Portrait of Yoshua Bengio

Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research Department
Founder and Scientific Advisor, Leadership Team
Research Topics
Causality
Computational Neuroscience
Deep Learning
Generative Models
Graph Neural Networks
Machine Learning Theory
Medical Machine Learning
Molecular Modeling
Natural Language Processing
Probabilistic Models
Reasoning
Recurrent Neural Networks
Reinforcement Learning
Representation Learning

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Marie-Josée Beauchamp, Administrative Assistant at marie-josee.beauchamp@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Collaborating Alumni - McGill University
Collaborating Alumni - Université de Montréal
Collaborating researcher - Cambridge University
Principal supervisor :
PhD - Université de Montréal
Independent visiting researcher
Co-supervisor :
PhD - Université de Montréal
Independent visiting researcher
Principal supervisor :
Collaborating researcher - N/A
Principal supervisor :
PhD - Université de Montréal
Collaborating researcher - KAIST
Collaborating Alumni - Université de Montréal
PhD - Université de Montréal
Collaborating Alumni - Université de Montréal
Co-supervisor :
Independent visiting researcher
Principal supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Principal supervisor :
Collaborating Alumni
Collaborating Alumni - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Collaborating Alumni - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
Postdoctorate - Université de Montréal
Principal supervisor :
Independent visiting researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - Ying Wu Coll of Computing
Collaborating researcher - University of Waterloo
Principal supervisor :
Collaborating Alumni - Max-Planck-Institute for Intelligent Systems
Collaborating researcher - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Postdoctorate - Université de Montréal
Independent visiting researcher - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Independent visiting researcher
Principal supervisor :
Postdoctorate - Université de Montréal
Collaborating Alumni - Université de Montréal
Collaborating Alumni - Université de Montréal
Postdoctorate
Co-supervisor :
Independent visiting researcher - Technical University of Munich
PhD - Université de Montréal
Co-supervisor :
Independent visiting researcher
Principal supervisor :
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
PhD - McGill University
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
Collaborating Alumni - McGill University
Principal supervisor :

Publications

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning
Despite recent successes of reinforcement learning (RL), it remains a challenge for agents to transfer learned skills to related environment… (see more)s. To facilitate research addressing this problem, we proposeCausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer. Tasks consist of constructing 3D shapes from a set of blocks - inspired by how children learn to build complex structures. The key strength of CausalWorld is that it provides a combinatorial family of such tasks with common causal structure and underlying factors (including, e.g., robot and object masses, colors, sizes). The user (or the agent) may intervene on all causal variables, which allows for fine-grained control over how similar different tasks (or task distributions) are. One can thus easily define training and evaluation distributions of a desired difficulty level, targeting a specific form of generalization (e.g., only changes in appearance or object mass). Further, this common parametrization facilitates defining curricula by interpolating between an initial and a target task. While users may define their own task distributions, we present eight meaningful distributions as concrete benchmarks, ranging from simple to very challenging, all of which require long-horizon planning as well as precise low-level motor control. Finally, we provide baseline results for a subset of these tasks on distinct training curricula and corresponding evaluation protocols, verifying the feasibility of the tasks in this benchmark.
Predicting Infectiousness for Proactive Contact Tracing
Prateek Gupta
Nasim Rahaman
Hannah Alsdurf
gaetan caron
satya ortiz gagne
Bernhard Schölkopf … (see 3 more)
Abhinav Sharma
Andrew Robert Williams
The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdo… (see more)wns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing spread of the virus. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restrictions, and public health. The most common approach, binary contact tracing (BCT), models infection as a binary event, informed only by an individual's test results, with corresponding binary recommendations that either all or none of the individual's contacts quarantine. BCT ignores the inherent uncertainty in contacts and the infection process, which could be used to tailor messaging to high-risk individuals, and prompt proactive testing or earlier warnings. It also does not make use of observations such as symptoms or pre-existing medical conditions, which could be used to make more accurate infectiousness predictions. In this paper, we use a recently-proposed COVID-19 epidemiological simulator to develop and test methods that can be deployed to a smartphone to locally and proactively predict an individual's infectiousness (risk of infecting others) based on their contact history and other information, while respecting strong privacy constraints. Predictions are used to provide personalized recommendations to the individual via an app, as well as to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness, an approach we call proactive contact tracing (PCT). We find a deep-learning based PCT method which improves over BCT for equivalent average mobility, suggesting PCT could help in safe re-opening and second-wave prevention.
RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs
Louis-Pascal Xhonneux
This paper studies learning logic rules for reasoning on knowledge graphs. Logic rules provide interpretable explanations when used for pred… (see more)iction as well as being able to generalize to other tasks, and hence are critical to learn. Existing methods either suffer from the problem of searching in a large search space (e.g., neural logic programming) or ineffective optimization due to sparse rewards (e.g., techniques based on reinforcement learning). To address these limitations, this paper proposes a probabilistic model called RNNLogic. RNNLogic treats logic rules as a latent variable, and simultaneously trains a rule generator as well as a reasoning predictor with logic rules. We develop an EM-based algorithm for optimization. In each iteration, the reasoning predictor is updated to explore some generated logic rules for reasoning. Then in the E-step, we select a set of high-quality rules from all generated rules with both the rule generator and reasoning predictor via posterior inference; and in the M-step, the rule generator is updated with the rules selected in the E-step. Experiments on four datasets prove the effectiveness of RNNLogic.
Spatially Structured Recurrent Modules
Nasim Rahaman
Muhammad Waleed Gondal
Manuel Wüthrich
Yash Sharma
Bernhard Schölkopf
Capturing the structure of a data-generating process by means of appropriate inductive biases can help in learning models that generalise we… (see more)ll and are robust to changes in the input distribution. While methods that harness spatial and temporal structures find broad application, recent work has demonstrated the potential of models that leverage sparse and modular structure using an ensemble of sparingly interacting modules. In this work, we take a step towards dynamic models that are capable of simultaneously exploiting both modular and spatiotemporal structures. To this end, we model the dynamical system as a collection of autonomous but sparsely interacting sub-systems that interact according to a learned topology which is informed by the spatial structure of the underlying system. This gives rise to a class of models that are well suited for capturing the dynamics of systems that only offer local views into their state, along with corresponding spatial locations of those views. On the tasks of video prediction from cropped frames and multi-agent world modelling from partial observations in the challenging Starcraft2 domain, we find our models to be more robust to the number of available views and better capable of generalisation to novel tasks without additional training than strong baselines that perform equally well or better on the training distribution.
Attention Based Pruning for Shift Networks
In many application domains such as computer vision, Convolutional Layers (CLs) are key to the accuracy of deep learning methods. However, i… (see more)t is often required to assemble a large number of CLs, each containing thousands of parameters, in order to reach state-of-the-art accuracy, thus resulting in complex and demanding systems that are poorly fitted to resource-limited devices. Recently, methods have been proposed to replace the generic convolution operator by the combination of a shift operation and a simpler
An Analysis of the Adaptation Speed of Causal Models
C AUSAL R: Causal Reasoning over Natural Language Rulebases
Jason Weston
Sumit Chopra
Thomas Wolf
Lysandre Debut
Julien Victor Sanh
Clement Chaumond
Anthony Delangue
Pier-339 Moi
Tim ric Cistac
R´emi Rault
Morgan Louf
Funtow-900 Joe
Sam Davison
Patrick Shleifer
Von Platen
Clara Ma
Yacine Jernite
Julien Plu
Canwen Xu … (see 6 more)
Zhilin Yang
Peng Qi
William W Cohen
Russ Salakhutdinov
Transformers have been shown to be able to 001 perform deductive reasoning on a logical rule-002 base containing rules and statements writte… (see more)n 003 in natural language. Recent works show that 004 such models can also produce the reasoning 005 steps (i.e., the proof graph ) that emulate the 006 model’s logical reasoning process. But these 007 models behave as a black-box unit that emu-008 lates the reasoning process without any causal 009 constraints in the reasoning steps, thus ques-010 tioning the faithfulness. In this work, we frame 011 the deductive logical reasoning task as a causal 012 process by defining three modular components: 013 rule selection, fact selection, and knowledge 014 composition. The rule and fact selection steps 015 select the candidate rule and facts to be used 016 and then the knowledge composition combines 017 them to generate new inferences. This ensures 018 model faithfulness by assured causal relation 019 from the proof step to the inference reasoning. 020 To test our causal reasoning framework, we 021 propose C AUSAL R where the above three com-022 ponents are independently modeled by trans-023 formers. We observe that C AUSAL R is robust 024 to novel language perturbations, and is com-025 petitive with previous works on existing rea-026 soning datasets. Furthermore, the errors made 027 by C AUSAL R are more interpretable due to 028 the multi-modular approach compared to black-029 box generative models. 1 030
BabyAI 1.1
David Y. T. Hui
Maxime Chevalier-Boisvert
BabyAI 1.1
David Y. T. Hui
Maxime Chevalier-Boisvert
The BabyAI platform is designed to measure the sample efficiency of training an agent to follow grounded-language instructions. BabyAI 1.0 … (see more)presents baseline results of an agent trained by deep imitation or reinforcement learning. BabyAI 1.1 improves the agent’s architecture in three minor ways. This increases reinforcement learning sample efficiency by up to 3 × and improves imitation learning performance on the hardest level from 77% to 90 . 4% . We hope that these improvements increase the computational efficiency of BabyAI experiments and help users design better agents.
CAMAP: Artificial neural networks unveil the role of 1 codon arrangement in modulating MHC-I peptides 2 presentation
Tariq Daouda
Maude Dumont-Lagacé
Albert Feghaly
Yahya Benslimane
6. Rébecca
Panes
Mathieu Courcelles
Mohamed Benhammadi
Lea Harrington
Pierre Thibault
François Major
Étienne Gagnon
Claude Perreault
30 MHC-I associated peptides (MAPs) play a central role in the elimination of virus-infected and 31 neoplastic cells by CD8 T cells. However… (see more), accurately predicting the MAP repertoire remains 32 difficult, because only a fraction of the transcriptome generates MAPs. In this study, we 33 investigated whether codon arrangement (usage and placement) regulates MAP biogenesis. We 34 developed an artificial neural network called Codon Arrangement MAP Predictor (CAMAP), 35 predicting MAP presentation solely from mRNA sequences flanking the MAP-coding codons 36 (MCCs), while excluding the MCC per se . CAMAP predictions were significantly more accurate 37 when using original codon sequences than shuffled codon sequences which reflect amino acid 38 usage. Furthermore, predictions were independent of mRNA expression and MAP binding affinity 39 to MHC-I molecules and applied to several cell types and species. Combining MAP ligand scores, 40 transcript expression level and CAMAP scores was particularly useful to increaser MAP prediction 41 accuracy. Using an in vitro assay, we showed that varying the synonymous codons in the regions 42 flanking the MCCs (without changing the amino acid sequence) resulted in significant modulation 43 of MAP presentation at the cell surface. Taken together, our results demonstrate the role of codon 44 arrangement in the regulation of MAP presentation and support integration of both translational 45 and post-translational events in predictive algorithms to ameliorate modeling of the 46 immunopeptidome. 47 48 49 they modulated the levels of SIINFEKL presentation in both constructs, but enhanced translation efficiency could only be detected for OVA-RP. These data show that codon arrangement can modulate MAP presentation strength without any changes in the amino
CAMAP: Artificial neural networks unveil the role of 1 codon arrangement in modulating MHC-I peptides 2 presentation discovery of minor histocompatibility with
Tariq Daouda
Maude Dumont-Lagacé
Albert Feghaly
Yahya Benslimane
6. Rébecca
Panes
Mathieu Courcelles
Mohamed Benhammadi
Lea Harrington
Pierre Thibault
François Major
Étienne Gagnon
Claude Perreault
30 MHC-I associated peptides (MAPs) play a central role in the elimination of virus-infected and 31 neoplastic cells by CD8 T cells. However… (see more), accurately predicting the MAP repertoire remains 32 difficult, because only a fraction of the transcriptome generates MAPs. In this study, we 33 investigated whether codon arrangement (usage and placement) regulates MAP biogenesis. We 34 developed an artificial neural network called Codon Arrangement MAP Predictor (CAMAP), 35 predicting MAP presentation solely from mRNA sequences flanking the MAP-coding codons 36 (MCCs), while excluding the MCC per se . CAMAP predictions were significantly more accurate 37 when using original codon sequences than shuffled codon sequences which reflect amino acid 38 usage. Furthermore, predictions were independent of mRNA expression and MAP binding affinity 39 to MHC-I molecules and applied to several cell types and species. Combining MAP ligand scores, 40 transcript expression level and CAMAP scores was particularly useful to increaser MAP prediction 41 accuracy. Using an in vitro assay, we showed that varying the synonymous codons in the regions 42 flanking the MCCs (without changing the amino acid sequence) resulted in significant modulation 43 of MAP presentation at the cell surface. Taken together, our results demonstrate the role of codon 44 arrangement in the regulation of MAP presentation and support integration of both translational 45 and post-translational events in predictive algorithms to ameliorate modeling of the 46 immunopeptidome. 47 48 49 they modulated the levels of SIINFEKL presentation in both constructs, but enhanced translation efficiency could only be detected for OVA-RP. These data show that codon arrangement can modulate MAP presentation strength without any changes in the amino
A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning
We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state during plan… (see more)ning. The agent uses a bottleneck mechanism over a set-based representation to force the number of entities to which the agent attends at each planning step to be small. In experiments, we investigate the bottleneck mechanism with several sets of customized environments featuring different challenges. We consistently observe that the design allows the planning agents to generalize their learned task-solving abilities in compatible unseen environments by attending to the relevant objects, leading to better out-of-distribution generalization performance.