Yoshua Bengio

Biographie

*Pour toute demande média, veuillez écrire à medias@mila.quebec.

Pour plus d’information, contactez Cassidy MacNeil, adjointe principale et responsable des opérations cassidy.macneil@mila.quebec.

Reconnu comme une sommité mondiale en intelligence artificielle, Yoshua Bengio s’est surtout distingué par son rôle de pionnier en apprentissage profond, ce qui lui a valu le prix A. M. Turing 2018, le « prix Nobel de l’informatique », avec Geoffrey Hinton et Yann LeCun. Il est professeur titulaire à l’Université de Montréal, fondateur et conseiller scientifique de Mila – Institut québécois d’intelligence artificielle, et codirige en tant que senior fellow le programme Apprentissage automatique, apprentissage biologique de l'Institut canadien de recherches avancées (CIFAR). Il occupe également la fonction de conseiller spécial et directeur scientifique fondateur d’IVADO.

En 2018, il a été l’informaticien qui a recueilli le plus grand nombre de nouvelles citations au monde. En 2019, il s’est vu décerner le prestigieux prix Killam. Depuis 2022, il détient le plus grand facteur d’impact (h-index) en informatique à l’échelle mondiale. Il est fellow de la Royal Society de Londres et de la Société royale du Canada, et officier de l’Ordre du Canada.

Soucieux des répercussions sociales de l’IA et de l’objectif que l’IA bénéficie à tous, il a contribué activement à la Déclaration de Montréal pour un développement responsable de l’intelligence artificielle.

Étudiants actuels

Jamal Abou Haibeh

Collaborateur·rice alumni - McGill

Mohammed Abukalam

Collaborateur·rice alumni - UdeM

Berkes Anaïs

Collaborateur·rice de recherche - Cambridge University

Superviseur⋅e principal⋅e :

Rim Assouel

Doctorat - UdeM

Stefan Bauer

Visiteur de recherche indépendant

Co-superviseur⋅e :

Guillaume Lajoie

Paul Bertin

Doctorat - UdeM

Joyce Chai

Visiteur de recherche indépendant

Superviseur⋅e principal⋅e :

Siva Reddy

Shahana Chatterjee

Collaborateur·rice de recherche - N/A

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Collaborateur·rice de recherche - KAIST

Collaborateur·rice alumni - UdeM

Doctorat - UdeM

Collaborateur·rice alumni - UdeM

Co-superviseur⋅e :

Loubna Benabbou

Desmond Elliott

Visiteur de recherche indépendant

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Co-superviseur⋅e :

Doctorat - UdeM

Doctorat - UdeM

Leo Feng

Doctorat - UdeM

Doctorat

Doctorat - UdeM

Edward Hu

Doctorat - UdeM

Moksh Jain

Doctorat - UdeM

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Collaborateur·rice alumni - UdeM

Hyeonah Kim

Postdoctorat - UdeM

Superviseur⋅e principal⋅e :

Alex Hernandez-Garcia

Salem Lahlou

Collaborateur·rice alumni - UdeM

Tabitha Edith Lee

Postdoctorat - UdeM

Superviseur⋅e principal⋅e :

Zhen Liu

Collaborateur·rice alumni - UdeM

Superviseur⋅e principal⋅e :

Liam Paull

Chenghao Liu

Collaborateur·rice alumni

Doctorat - UdeM

Collaborateur·rice alumni - UdeM

Cristian Dragos Manta

Doctorat - UdeM

Co-superviseur⋅e :

Dhanya Sridhar

Sarthak Mittal

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Postdoctorat - UdeM

Superviseur⋅e principal⋅e :

Visiteur de recherche indépendant - UdeM

Padideh Nouri

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Ali Parviz

Collaborateur·rice de recherche - Ying Wu Coll of Computing

Lena Podina

Collaborateur·rice de recherche - University of Waterloo

Superviseur⋅e principal⋅e :

David Rolnick

Nassim Rahaman

Collaborateur·rice alumni - Max-Planck-Institute for Intelligent Systems

Amine RAZIG

Collaborateur·rice de recherche - UdeM

Co-superviseur⋅e :

Doctorat - UdeM

Postdoctorat - UdeM

Visiteur de recherche indépendant - UdeM

Oli RICHARDSON

Postdoctorat - UdeM

Camille Rochefort-Boulanger

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Julie Hussin

Abhik Roychoudhury Roychoudhury

Visiteur de recherche indépendant

Superviseur⋅e principal⋅e :

Postdoctorat - UdeM

Collaborateur·rice alumni - UdeM

Marcin Sendera

Collaborateur·rice alumni - UdeM

Divya Sharma

Postdoctorat

Co-superviseur⋅e :

Alex Hernandez-Garcia

Mélisande Astrid Crystal Teng

Doctorat - UdeM

Co-superviseur⋅e :

Hugo Larochelle

Ivan Titov

Visiteur de recherche indépendant

Superviseur⋅e principal⋅e :

Siva Reddy

Alex Tong

Collaborateur·rice alumni - UdeM

Postdoctorat - UdeM

Co-superviseur⋅e :

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Collaborateur·rice de recherche

Collaborateur·rice de recherche - UdeM

Doctorat - UdeM

Doctorat - McGill

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Aaron Courville

Skipper : combiner l’abstraction spatiale et temporelle afin d’améliorer la généralisation

Harry Zhao

Collaborateur·rice alumni - McGill

Superviseur⋅e principal⋅e :

Billets de blogue

Generic thumbnail for Mila Blog articles.

22 février 2024

par

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Mise à l’échelle au service du raisonnement et de l’apprentissage automatique basé sur un modèle

Scaling in the service of reasoning & model-based ML

4 avril 2023

par

Yoshua Bengio

Edward J. Hu

Une collaboration entre Mila et Relation Therapeutics pour découvrir in vitro de nouvelles associations médicamenteuses synergiques

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

23 mars 2022

par

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

Les réseaux de flot génératifs

15 mars 2022

par

Yoshua Bengio

Publications

Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers

Alex Lamb

Anirudh Goyal

A. Slowik

Michael Curtis Mozer

Philippe Beaudoin

Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previ… (voir plus)ous layer. A downside to this approach is that each layer (or module, as multiple modules can operate in parallel) is tasked with processing the entire hidden state, rather than a particular part of the state which is most relevant for that module. Methods which only operate on a small number of input variables are an essential part of most programming languages, and they allow for improved modularity and code re-usability. Our proposed method, Neural Function Modules (NFM), aims to introduce the same structural capability into deep learning. Most of the work in the context of feed-forward networks combining top-down and bottom-up feedback is limited to classification problems. The key contribution of our work is to combine attention, sparsity, top-down and bottom-up feedback, in a flexible algorithm which, as we show, improves the results in standard classification, out-of-domain generalization, generative modeling, and learning representations in the context of reinforcement learning.

2021-03-18

Proceedings of The 24th International Conference on Artificial Intelligence and Statistics (publié)

proceedings.mlr.press

A Two-Stream Continual Learning System With Variational Domain-Agnostic Feature Replay

Qicheng Lao

Xiang Jiang

Mohammad Havaei

Learning in nonstationary environments is one of the biggest challenges in machine learning. Nonstationarity can be caused by either task dr… (voir plus)ift, i.e., the drift in the conditional distribution of labels given the input data, or the domain drift, i.e., the drift in the marginal distribution of the input data. This article aims to tackle this challenge with a modularized two-stream continual learning (CL) system, where the model is required to learn new tasks from a support stream and adapted to new domains in the query stream while maintaining previously learned knowledge. To deal with both drifts within and across the two streams, we propose a variational domain-agnostic feature replay-based approach that decouples the system into three modules: an inference module that filters the input data from the two streams into domain-agnostic representations, a generative module that facilitates the high-level knowledge transfer, and a solver module that applies the filtered and transferable knowledge to solve the queries. We demonstrate the effectiveness of our proposed approach in addressing the two fundamental scenarios and complex scenarios in two-stream CL.

2021-03-03

IEEE Transactions on Neural Networks and Learning Systems (publié)

Towards Causal Representation Learning

Bernhard Schölkopf

Francesco Locatello

Stefan Bauer

Nan Rosemary Ke

Nal Kalchbrenner

Anirudh Goyal

The two fields of machine learning and graphical causality arose and developed separately. However, there is now cross-pollination and incre… (voir plus)asing interest in both fields to benefit from the advances of the other. In the present paper, we review fundamental concepts of causal inference and relate them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research. This also applies in the opposite direction: we note that most work in causality starts from the premise that the causal variables are given. A central problem for AI and causality is, thus, causal representation learning, the discovery of high-level causal variables from low-level observations. Finally, we delineate some implications of causality for machine learning and propose key research areas at the intersection of both communities.

2021-02-22

ArXiv (prépublication)

Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing Its Gradient Estimator Bias

Axel Laborieux

Maxence Ernoult

Benjamin Scellier

Julie Grollier

Damien Querlioz

2021-02-18

Frontiers in Neuroscience (publié)

Structured Sparsity Inducing Adaptive Optimizers for Deep Learning

Tristan Deleu

The parameters of a neural network are naturally organized in groups, some of which might not contribute to its overall performance. To prun… (voir plus)e out unimportant groups of parameters, we can include some non-differentiable penalty to the objective function, and minimize it using proximal gradient methods. In this paper, we derive the weighted proximal operator, which is a necessary component of these proximal methods, of two structured sparsity inducing penalties. Moreover, they can be approximated efficiently with a numerical solver, and despite this approximation, we prove that existing convergence guarantees are preserved when these operators are integrated as part of a generic adaptive proximal method. Finally, we show that this adaptive method, together with the weighted proximal operators derived here, is indeed capable of finding solutions with structure in their sparsity patterns, on representative examples from computer vision and natural language processing.

2021-02-07

ArXiv (prépublication)

Using Artificial Intelligence to Visualize the Impacts of Climate Change

Alexandra Luccioni

Victor Schmidt

Vahe Vardanyan

T. Rhyne

Public awareness and concern about climate change often do not match the magnitude of its threat to humans and our environment. One reason f… (voir plus)or this disagreement is that it is difficult to mentally simulate the effects of a process as complex as climate change and to have a concrete representation of the impact that our individual actions will have on our own future, especially if the consequences are long term and abstract. To overcome these challenges, we propose to use cutting-edge artificial intelligence (AI) approaches to develop an interactive personalized visualization tool, the AI climate impact visualizer. It will allow a user to enter an address—be it their house, their school, or their workplace—-and it will provide them with an AI-imagined possible visualization of the future of this location in 2050 following the detrimental effects of climate change such as floods, storms, and wildfires. This image will be accompanied by accessible information regarding the science behind climate change, i.e., why extreme weather events are becoming more frequent and what kinds of changes are happening on a local and global scale.

2021-02-01

IEEE Computer Graphics and Applications (publié)

Training neural networks to recognize speech increased their correspondence to the human auditory pathway but did not yield a shared hierarchy of acoustic features

Jessica A.F. Thompson

Elia Formisano

Marc Schönwiesner

The correspondence between the activity of artificial neurons in convolutional neural networks (CNNs) trained to recognize objects in images… (voir plus) and neural activity collected throughout the primate visual system has been well documented. Shallower layers of CNNs are typically more similar to early visual areas and deeper layers tend to be more similar to later visual areas, providing evidence for a shared representational hierarchy. This phenomenon has not been thoroughly studied in the auditory domain. Here, we compared the representations of CNNs trained to recognize speech (triphone recognition) to 7-Tesla fMRI activity collected throughout the human auditory pathway, including subcortical and cortical regions, while participants listened to speech. We found no evidence for a shared representational hierarchy of acoustic speech features. Instead, all auditory regions of interest were most similar to a single layer of the CNNs: the first fully-connected layer. This layer sits at the boundary between the relatively task-general intermediate layers and the highly task-specific final layers. This suggests that alternative architectural designs and/or training objectives may be needed to achieve fine-grained layer-wise correspondence with the human auditory pathway. Highlights Trained CNNs more similar to auditory fMRI activity than untrained No evidence of a shared representational hierarchy for acoustic features All ROIs were most similar to the first fully-connected layer CNN performance on speech recognition task positively associated with fmri similarity

2021-01-27

bioRxiv (prépublication)

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

Manuel Wüthrich

Bernhard Schölkopf

Despite recent successes of reinforcement learning (RL), it remains a challenge for agents to transfer learned skills to related environment… (voir plus)s. To facilitate research addressing this problem, we proposeCausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer. Tasks consist of constructing 3D shapes from a set of blocks - inspired by how children learn to build complex structures. The key strength of CausalWorld is that it provides a combinatorial family of such tasks with common causal structure and underlying factors (including, e.g., robot and object masses, colors, sizes). The user (or the agent) may intervene on all causal variables, which allows for fine-grained control over how similar different tasks (or task distributions) are. One can thus easily define training and evaluation distributions of a desired difficulty level, targeting a specific form of generalization (e.g., only changes in appearance or object mass). Further, this common parametrization facilitates defining curricula by interpolating between an initial and a target task. While users may define their own task distributions, we present eight meaningful distributions as concrete benchmarks, ranging from simple to very challenging, all of which require long-horizon planning as well as precise low-level motor control. Finally, we provide baseline results for a subset of these tasks on distinct training curricula and corresponding evaluation protocols, verifying the feasibility of the tasks in this benchmark.

2021-01-12

ICLR.cc/2021/Conference (poster)

Predicting Infectiousness for Proactive Contact Tracing

Prateek Gupta

Nasim Rahaman

Pierre-Luc St-Charles

Hannah Alsdurf

gaetan caron

satya ortiz gagne

Bernhard Schölkopf … (voir 3 de plus)

Abhinav Sharma

Jian Tang

Andrew Robert Williams

The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdo… (voir plus)wns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing spread of the virus. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restrictions, and public health. The most common approach, binary contact tracing (BCT), models infection as a binary event, informed only by an individual's test results, with corresponding binary recommendations that either all or none of the individual's contacts quarantine. BCT ignores the inherent uncertainty in contacts and the infection process, which could be used to tailor messaging to high-risk individuals, and prompt proactive testing or earlier warnings. It also does not make use of observations such as symptoms or pre-existing medical conditions, which could be used to make more accurate infectiousness predictions. In this paper, we use a recently-proposed COVID-19 epidemiological simulator to develop and test methods that can be deployed to a smartphone to locally and proactively predict an individual's infectiousness (risk of infecting others) based on their contact history and other information, while respecting strong privacy constraints. Predictions are used to provide personalized recommendations to the individual via an app, as well as to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness, an approach we call proactive contact tracing (PCT). We find a deep-learning based PCT method which improves over BCT for equivalent average mobility, suggesting PCT could help in safe re-opening and second-wave prevention.

2021-01-12

ICLR.cc/2021/Conference (spotlight)

RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs

Meng Qu

Junkun Chen

Louis-Pascal Xhonneux

Jian Tang

This paper studies learning logic rules for reasoning on knowledge graphs. Logic rules provide interpretable explanations when used for pred… (voir plus)iction as well as being able to generalize to other tasks, and hence are critical to learn. Existing methods either suffer from the problem of searching in a large search space (e.g., neural logic programming) or ineffective optimization due to sparse rewards (e.g., techniques based on reinforcement learning). To address these limitations, this paper proposes a probabilistic model called RNNLogic. RNNLogic treats logic rules as a latent variable, and simultaneously trains a rule generator as well as a reasoning predictor with logic rules. We develop an EM-based algorithm for optimization. In each iteration, the reasoning predictor is updated to explore some generated logic rules for reasoning. Then in the E-step, we select a set of high-quality rules from all generated rules with both the rule generator and reasoning predictor via posterior inference; and in the M-step, the rule generator is updated with the rules selected in the E-step. Experiments on four datasets prove the effectiveness of RNNLogic.

2021-01-12

ICLR.cc/2021/Conference (poster)

Spatially Structured Recurrent Modules

Nasim Rahaman

Anirudh Goyal

Muhammad Waleed Gondal

Manuel Wüthrich

Stefan Bauer

Yash Sharma

Bernhard Schölkopf

Capturing the structure of a data-generating process by means of appropriate inductive biases can help in learning models that generalise we… (voir plus)ll and are robust to changes in the input distribution. While methods that harness spatial and temporal structures find broad application, recent work has demonstrated the potential of models that leverage sparse and modular structure using an ensemble of sparingly interacting modules. In this work, we take a step towards dynamic models that are capable of simultaneously exploiting both modular and spatiotemporal structures. To this end, we model the dynamical system as a collection of autonomous but sparsely interacting sub-systems that interact according to a learned topology which is informed by the spatial structure of the underlying system. This gives rise to a class of models that are well suited for capturing the dynamics of systems that only offer local views into their state, along with corresponding spatial locations of those views. On the tasks of video prediction from cropped frames and multi-agent world modelling from partial observations in the challenging Starcraft2 domain, we find our models to be more robust to the number of available views and better capable of generalisation to novel tasks without additional training than strong baselines that perform equally well or better on the training distribution.

2021-01-12

ICLR.cc/2021/Conference (poster)

Attention Based Pruning for Shift Networks

Ghouthi Boukli hacene

Carlos Lassance

Vincent Gripon

Matthieu Courbariaux

In many application domains such as computer vision, Convolutional Layers (CLs) are key to the accuracy of deep learning methods. However, i… (voir plus)t is often required to assemble a large number of CLs, each containing thousands of parameters, in order to reach state-of-the-art accuracy, thus resulting in complex and demanding systems that are poorly fitted to resource-limited devices. Recently, methods have been proposed to replace the generic convolution operator by the combination of a shift operation and a simpler

2021-01-10

2020 25th International Conference on Pattern Recognition (ICPR) (publié)