Portrait of Yoshua Bengio

Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research Department
Founder and Scientific Advisor, Leadership Team
Research Topics
Causality
Computational Neuroscience
Deep Learning
Generative Models
Graph Neural Networks
Machine Learning Theory
Medical Machine Learning
Molecular Modeling
Natural Language Processing
Probabilistic Models
Reasoning
Recurrent Neural Networks
Reinforcement Learning
Representation Learning

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Marie-Josée Beauchamp, Administrative Assistant at marie-josee.beauchamp@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Collaborating Alumni - McGill University
Collaborating Alumni - Université de Montréal
Collaborating researcher - Cambridge University
Principal supervisor :
PhD - Université de Montréal
Independent visiting researcher - KAIST
Independent visiting researcher
Co-supervisor :
PhD - Université de Montréal
Collaborating researcher - N/A
Principal supervisor :
PhD - Université de Montréal
Collaborating researcher - KAIST
PhD - Université de Montréal
PhD - Université de Montréal
Research Intern - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Principal supervisor :
Collaborating researcher - Université de Montréal
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Collaborating Alumni
Collaborating Alumni - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Collaborating Alumni - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
Postdoctorate - Université de Montréal
Principal supervisor :
Independent visiting researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - Ying Wu Coll of Computing
PhD - University of Waterloo
Principal supervisor :
Collaborating Alumni - Max-Planck-Institute for Intelligent Systems
Research Intern - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Postdoctorate - Université de Montréal
Independent visiting researcher - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Postdoctorate - Université de Montréal
Master's Research - Université de Montréal
Collaborating Alumni - Université de Montréal
Master's Research - Université de Montréal
Postdoctorate
Independent visiting researcher - Technical University of Munich
PhD - Université de Montréal
Co-supervisor :
Postdoctorate - Université de Montréal
Postdoctorate - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher
Research Intern - Université de Montréal
PhD - Université de Montréal
PhD - McGill University
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - McGill University
Principal supervisor :

Publications

(Private)-Retroactive Carbon Pricing [(P)ReCaP]: A Market-based Approach for Climate Finance and Risk Assessment
Prateek Arun Gupta
Dylan Radovic
M. Scholl
Andrew Robert Williams
C. S. D. Witt
Yang Zhang
RetroGNN: Fast Estimation of Synthesizability for Virtual Screening and De Novo Design by Learning from Slow Retrosynthesis Software
Cheng-Hao Liu
Maksym Korablyov
Stanisław Jastrzębski
Paweł Włodarczyk-Pruszyński
Marwin Segler
E VALUATING G ENERALIZATION IN GF LOW N ETS FOR M OLECULE D ESIGN
Andrei Cristian Nica
Moksh J. Jain
Cheng-Hao Liu
Maksym Korablyov
Michael M. Bronstein
Deep learning bears promise for drug discovery problems such as de novo molecular design. Generating data to train such models is a costly a… (see more)nd time-consuming process, given the need for wet-lab experiments or expensive simulations. This problem is compounded by the notorious data-hungriness of machine learning algorithms. In small molecule generation the recently proposed GFlowNet method has shown good performance in generating diverse high-scoring candidates, and has the interesting advantage of being an off-policy offline method. Finding an appropriate generalization evaluation metric for such models, one predictive of the desired search performance (i.e. finding high-scoring diverse candidates), will help guide online data collection for such an algorithm. In this work, we develop techniques for evaluating GFlowNet performance on a test set, and identify the most promising metric for predicting generalization. We present empirical results on several small-molecule design tasks in drug discovery, for several GFlowNet training setups, and we find a metric strongly correlated with diverse high-scoring batch generation. This metric should be used to identify the best generative model from which to sample batches of molecules to be evaluated.
Inductive Biases for Relational Tasks
Current deep learning approaches have shown good in-distribution performance but struggle in out-of-distribution settings. This is especiall… (see more)y true in the case of tasks involving abstract relations like recognizing rules in sequences, as required in many intelligence tests. In contrast, our brains are remarkably flexible at such tasks, an attribute that is likely linked to anatomical constraints on computations. Inspired by this, recent work has explored how enforcing that relational representations remain distinct from sensory representations can help artificial systems. Building on this work, we further explore and formalize the advantages afforded by ``partitioned'' representations of relations and sensory details. We investigate inductive biases that ensure abstract relations are learned and represented distinctly from sensory data across several neural network architectures and show that they outperform existing architectures on out-of-distribution generalization for various relational tasks. These results show that partitioning relational representations from other information streams may be a simple way to augment existing network architectures' robustness when performing relational computations.
Object-centric Compositional Imagination for Visual Abstract Reasoning
Pau Rodriguez
Perouz Taslakian
David Vazquez
Like humans devoid of imagination, current machine learning systems lack the ability to adapt to new, unexpected situations by foreseeing th… (see more)em, which makes them unable to solve new tasks by analogical reasoning. In this work, we introduce a new compositional imagination framework that improves a model's ability to generalize. One of the key components of our framework is object-centric inductive biases that enables models to perceive the environment as a series of objects, properties, and transformations. By composing these key ingredients, it is possible to generate new unseen tasks that, when used to train the model, improve generalization. Experiments on a simplified version of the Abstraction and Reasoning Corpus (ARC) demonstrate the effectiveness of our framework.
A New Era: Intelligent Tutoring Systems Will Transform Online Learning for Millions
Francois St-Hilaire
Dung D. Vu
Antoine Frau
Nathan J. Burns
Farid Faraji
Joseph Potochny
Stephane Robert
Arnaud Roussel
Selene Zheng
Taylor Glazier
Junfel Vincent Romano
Robert Belfer
Muhammad Shayan
Ariella Smofsky
Tommy Delarosbil
Seulmin Ahn
Simon Eden-Walker
Kritika Sony
Ansona Onyi Ching
Sabina Elkins … (see 11 more)
A. Stepanyan
Adela Matajova
Victor Chen
Hossein Sahraei
Robert Larson
N. Markova
Andrew Barkett
Iulian V. Serban
Ekaterina Kochmar
CACHE (Critical Assessment of Computational Hit-finding Experiments): A public–private partnership benchmarking initiative to enable the development of computational methods for hit-finding
Suzanne Ackloo
R. Al-Awar
Rommie Elizabeth Amaro
C. Arrowsmith
Hatylas F. Z. Azevedo
R. Batey
U. Betz
Cristian G. Bologa
J. Chodera
Wendy Cornell
Ian Dunham
G. Ecker
Kristina Edfeldt
A. Edwards
M. Gilson
Cláudia Regina Gordijo
G. Hessler
Alexander Hillisch
Anders C Hogner … (see 19 more)
John Joseph Irwin
J. Jansen
Daniel Kuhn
Andrew R. Leach
Alpha A. Lee
Uta F. Lessel
J. Moult
Ingo Muegge
Tudor I. Oprea
Ben Perry
Patrick F. Riley
K. Saikatendu
Vijayaratnam Santhakumar
Matthieu Schapira
Cora Scholten
M. Todd
Masoud Vedadi
Andrea Volkamer
T. Willson
RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds in vitro
Deepak Sharma
Thomas Gaudelet
Andrew Anighoro
Torsten Gross
Francisco Martínez-Peña
Eileen L. Tang
S. SurajM
Cristian Regep
Jeremy B.R. Hayter
Maksym Korablyov
N. Valiante
Almer M. van der Sloot
Mike Tyers
Charles E.S. Roberts
Michael M. Bronstein
Luke Lee Lairson
Jake P. Taylor-King
Tackling Climate Change with Machine Learning
Priya L. Donti
Lynn H. Kaack
Kelly Kochanski
Alexandre Lacoste
Kris Sankaran
Andrew Slavin Ross
Nikola Milojevic-Dupont
Natasha Jaques
Anna Waldman-Brown
Alexandra Luccioni
Evan David Sherwin
S. Karthik Mukkavilli
Konrad Paul Kording
Carla P. Gomes
Andrew Y. Ng
Demis Hassabis
John C. Platt
Felix Creutzig … (see 2 more)
Jennifer T Chayes
Climate change is one of the greatest challenges facing humanity, and we, as machine learning (ML) experts, may wonder how we can help. Here… (see more) we describe how ML can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by ML, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the ML community to join the global effort against climate change.
ClimateGAN: Raising Climate Change Awareness by Generating Images of Floods
Victor Schmidt
Alexandra Luccioni
Mélisande Teng
Alexia Reynaud
Sunand Raghupathi
Gautier Cosne
Adrien Juraver
Vahe Vardanyan
Alex Hernandez-Garcia
Climate change is a major threat to humanity, and the actions required to prevent its catastrophic consequences include changes in both poli… (see more)cy-making and individual behaviour. However, taking action requires understanding the effects of climate change, even though they may seem abstract and distant. Projecting the potential consequences of extreme climate events such as flooding in familiar places can help make the abstract impacts of climate change more concrete and encourage action. As part of a larger initiative to build a website that projects extreme climate events onto user-chosen photos, we present our solution to simulate photo-realistic floods on authentic images. To address this complex task in the absence of suitable training data, we propose ClimateGAN, a model that leverages both simulated and real data for unsupervised domain adaptation and conditional image generation. In this paper, we describe the details of our framework, thoroughly evaluate components of our architecture and demonstrate that our model is capable of robustly generating photo-realistic flooding.
Continuous-Time Meta-Learning with Forward Mode Differentiation
Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learni… (see more)ng (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field. Specifically, representations of the inputs are meta-learned such that a task-specific linear classifier is obtained as a solution of an ordinary differential equation (ODE). Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous, as opposed to a fixed and discrete number of gradient steps. As a consequence, we can optimize the amount of adaptation necessary to solve a new task using stochastic gradient descent, in addition to learning the initial conditions as is standard practice in gradient-based meta-learning. Importantly, in order to compute the exact meta-gradients required for the outer-loop updates, we devise an efficient algorithm based on forward mode differentiation, whose memory requirements do not scale with the length of the learning trajectory, thus allowing longer adaptation in constant memory. We provide analytical guarantees for the stability of COMLN, we show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
Coordination Among Neural Modules Through a Shared Global Workspace
Anirudh Goyal
Aniket Rajiv Didolkar
Alex Lamb
Kartikeya Badola
Nan Rosemary Ke
Nasim Rahaman
Jonathan Binas
Charles Blundell
Michael Curtis Mozer
Deep learning has seen a movement away from representing examples with a monolithic hidden state towards a richly structured state. For exam… (see more)ple, Transformers segment by position, and object-centric architectures decompose images into entities. In all these architectures, interactions between different elements are modeled via pairwise interactions: Transformers make use of self-attention to incorporate information from other positions and object-centric architectures make use of graph neural networks to model interactions among entities. We consider how to improve on pairwise interactions in terms of global coordination and a coherent, integrated representation that can be used for downstream tasks. In cognitive science, a global workspace architecture has been proposed in which functionally specialized components share information through a common, bandwidth-limited communication channel. We explore the use of such a communication channel in the context of deep learning for modeling the structure of complex environments. The proposed method includes a shared workspace through which communication among different specialist modules takes place but due to limits on the communication bandwidth, specialist modules must compete for access. We show that capacity limitations have a rational basis in that (1) they encourage specialization and compositionality and (2) they facilitate the synchronization of otherwise independent specialists.