Portrait of Yoshua Bengio

Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research Department
Founder and Scientific Advisor, Leadership Team
Research Topics
Causality
Computational Neuroscience
Deep Learning
Generative Models
Graph Neural Networks
Machine Learning Theory
Medical Machine Learning
Molecular Modeling
Natural Language Processing
Probabilistic Models
Reasoning
Recurrent Neural Networks
Reinforcement Learning
Representation Learning

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Marie-Josée Beauchamp, Administrative Assistant at marie-josee.beauchamp@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Collaborating Alumni - McGill University
Collaborating Alumni - Université de Montréal
Collaborating researcher - Cambridge University
Principal supervisor :
PhD - Université de Montréal
Independent visiting researcher
Co-supervisor :
PhD - Université de Montréal
Collaborating researcher - N/A
Principal supervisor :
PhD - Université de Montréal
Collaborating researcher - KAIST
PhD - Université de Montréal
PhD - Université de Montréal
Research Intern - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Principal supervisor :
Research Intern - Université de Montréal
Collaborating researcher - Université de Montréal
Collaborating Alumni - Université de Montréal
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Principal supervisor :
Collaborating Alumni
Collaborating Alumni - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Collaborating Alumni - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
Postdoctorate - Université de Montréal
Principal supervisor :
Independent visiting researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - Ying Wu Coll of Computing
PhD - University of Waterloo
Principal supervisor :
Collaborating Alumni - Max-Planck-Institute for Intelligent Systems
PhD - Université de Montréal
Postdoctorate - Université de Montréal
Independent visiting researcher - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Master's Research - Université de Montréal
Collaborating Alumni - Université de Montréal
Master's Research - Université de Montréal
Postdoctorate
Independent visiting researcher - Technical University of Munich
PhD - Université de Montréal
Co-supervisor :
Postdoctorate - Université de Montréal
Postdoctorate - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - Université de Montréal
Collaborating researcher
Collaborating researcher - KAIST
PhD - Université de Montréal
PhD - McGill University
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - McGill University
Principal supervisor :

Publications

Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL
Akram Erraqabi
Marlos C. Machado
Harry Zhao
Mingde Zhao
Sainbayar Sukhbaatar
Alessandro Lazaric
Ludovic Denoyer
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting, with applications ranging from… (see more) skill discovery to reward shaping. Recently, learning the Laplacian representation has been framed as the optimization of a temporally-contrastive objective to overcome its computational limitations in large (or continuous) state spaces. However, this approach requires uniform access to all states in the state space, overlooking the exploration problem that emerges during the representation learning process. In this work, we propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation. We do so by combining the representation learning with a skill-based covering policy, which provides a better training distribution to extend and refine the representation. We also show that a simple augmentation of the representation objective with the learned temporal abstractions improves dynamics-awareness and helps exploration. We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments. Finally, even if our method is not optimized for skill discovery, the learned skills can successfully solve difficult continuous navigation tasks with sparse rewards, where standard skill discovery approaches are no so effective.
FedILC: Weighted Geometric Mean and Invariant Gradient Covariance for Federated Learning on Non-IID Data
Mike He Zhu
Lena Nehale Ezzine
Dianbo Liu
(Private)-Retroactive Carbon Pricing [(P)ReCaP]: A Market-based Approach for Climate Finance and Risk Assessment
Prateek Arun Gupta
Dylan Radovic
M. Scholl
andrew williams
C. S. D. Witt
Tianyu Zhang
Yang Zhang
RetroGNN: Fast Estimation of Synthesizability for Virtual Screening and De Novo Design by Learning from Slow Retrosynthesis Software
Cheng-Hao Liu
Maksym Korablyov
Stanisław Jastrzębski
Paweł Włodarczyk-Pruszyński
Marwin Segler
E VALUATING G ENERALIZATION IN GF LOW N ETS FOR M OLECULE D ESIGN
Andrei Cristian Nica
Moksh J. Jain
Cheng-Hao Liu
Maksym Korablyov
Michael M. Bronstein
Deep learning bears promise for drug discovery problems such as de novo molecular design. Generating data to train such models is a costly a… (see more)nd time-consuming process, given the need for wet-lab experiments or expensive simulations. This problem is compounded by the notorious data-hungriness of machine learning algorithms. In small molecule generation the recently proposed GFlowNet method has shown good performance in generating diverse high-scoring candidates, and has the interesting advantage of being an off-policy offline method. Finding an appropriate generalization evaluation metric for such models, one predictive of the desired search performance (i.e. finding high-scoring diverse candidates), will help guide online data collection for such an algorithm. In this work, we develop techniques for evaluating GFlowNet performance on a test set, and identify the most promising metric for predicting generalization. We present empirical results on several small-molecule design tasks in drug discovery, for several GFlowNet training setups, and we find a metric strongly correlated with diverse high-scoring batch generation. This metric should be used to identify the best generative model from which to sample batches of molecules to be evaluated.
Inductive Biases for Relational Tasks
Current deep learning approaches have shown good in-distribution performance but struggle in out-of-distribution settings. This is especiall… (see more)y true in the case of tasks involving abstract relations like recognizing rules in sequences, as required in many intelligence tests. In contrast, our brains are remarkably flexible at such tasks, an attribute that is likely linked to anatomical constraints on computations. Inspired by this, recent work has explored how enforcing that relational representations remain distinct from sensory representations can help artificial systems. Building on this work, we further explore and formalize the advantages afforded by ``partitioned'' representations of relations and sensory details. We investigate inductive biases that ensure abstract relations are learned and represented distinctly from sensory data across several neural network architectures and show that they outperform existing architectures on out-of-distribution generalization for various relational tasks. These results show that partitioning relational representations from other information streams may be a simple way to augment existing network architectures' robustness when performing relational computations.
Object-centric Compositional Imagination for Visual Abstract Reasoning
Rim Assouel
Pau Rodriguez
Perouz Taslakian
David Vazquez
Like humans devoid of imagination, current machine learning systems lack the ability to adapt to new, unexpected situations by foreseeing th… (see more)em, which makes them unable to solve new tasks by analogical reasoning. In this work, we introduce a new compositional imagination framework that improves a model's ability to generalize. One of the key components of our framework is object-centric inductive biases that enables models to perceive the environment as a series of objects, properties, and transformations. By composing these key ingredients, it is possible to generate new unseen tasks that, when used to train the model, improve generalization. Experiments on a simplified version of the Abstraction and Reasoning Corpus (ARC) demonstrate the effectiveness of our framework.
A New Era: Intelligent Tutoring Systems Will Transform Online Learning for Millions
Francois St-Hilaire
Dung D. Vu
Antoine Frau
Nathan J. Burns
Farid Faraji
Joseph Potochny
Stephane Robert
Arnaud Roussel
Selene Zheng
Taylor Glazier
Junfel Vincent Romano
Robert Belfer
Muhammad Shayan
Ariella Smofsky
Tommy Delarosbil
Seulmin Ahn
Simon Eden-Walker
Kritika Sony
Ansona Onyi Ching
Sabina Elkins … (see 11 more)
A. Stepanyan
Adela Matajova
Victor Chen
Hossein Sahraei
Robert Larson
N. Markova
Andrew Barkett
Iulian V. Serban
Ekaterina Kochmar
CACHE (Critical Assessment of Computational Hit-finding Experiments): A public–private partnership benchmarking initiative to enable the development of computational methods for hit-finding
Suzanne Ackloo
R. Al-Awar
Rommie Elizabeth Amaro
C. Arrowsmith
Hatylas F. Z. Azevedo
R. Batey
U. Betz
Cristian G. Bologa
J. Chodera
Wendy Cornell
Ian Dunham
G. Ecker
Kristina Edfeldt
A. Edwards
M. Gilson
Cláudia Regina Gordijo
G. Hessler
Alexander Hillisch
Anders C Hogner … (see 19 more)
John Joseph Irwin
J. Jansen
Daniel Kuhn
Andrew R. Leach
Alpha A. Lee
Uta F. Lessel
J. Moult
Ingo Muegge
Tudor I. Oprea
Ben Perry
Patrick F. Riley
K. Saikatendu
Vijayaratnam Santhakumar
Matthieu Schapira
Cora Scholten
M. Todd
Masoud Vedadi
Andrea Volkamer
T. Willson
RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds in vitro
Paul Bertin
Jarrid Rector-Brooks
Deepak Sharma
Thomas Gaudelet
Andrew Anighoro
Torsten Gross
Francisco Martínez-Peña
Eileen L. Tang
S. SurajM
Cristian Regep
Jeremy B.R. Hayter
Maksym Korablyov
N. Valiante
Almer M. van der Sloot
Mike Tyers
Charles E.S. Roberts
Michael M. Bronstein
Luke Lee Lairson
Jake P. Taylor-King
Tackling Climate Change with Machine Learning
Priya L. Donti
Lynn H. Kaack
Kelly Kochanski
Alexandre Lacoste
Kris Sankaran
Andrew Slavin Ross
Nikola Milojevic-Dupont
Natasha Jaques
Anna Waldman-Brown
Alexandra Luccioni
Evan David Sherwin
S. Karthik Mukkavilli
Konrad Paul Kording
Carla P. Gomes
Andrew Y. Ng
Demis Hassabis
John C. Platt
Felix Creutzig … (see 2 more)
Jennifer T Chayes
Climate change is one of the greatest challenges facing humanity, and we, as machine learning (ML) experts, may wonder how we can help. Here… (see more) we describe how ML can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by ML, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the ML community to join the global effort against climate change.
ClimateGAN: Raising Climate Change Awareness by Generating Images of Floods
Victor Schmidt
Alexandra Luccioni
Mélisande Teng
Tianyu Zhang
Alexia Reynaud
Sunand Raghupathi
Gautier Cosne
Adrien Juraver
Vahe Vardanyan
Alex Hernandez-Garcia
Climate change is a major threat to humanity, and the actions required to prevent its catastrophic consequences include changes in both poli… (see more)cy-making and individual behaviour. However, taking action requires understanding the effects of climate change, even though they may seem abstract and distant. Projecting the potential consequences of extreme climate events such as flooding in familiar places can help make the abstract impacts of climate change more concrete and encourage action. As part of a larger initiative to build a website that projects extreme climate events onto user-chosen photos, we present our solution to simulate photo-realistic floods on authentic images. To address this complex task in the absence of suitable training data, we propose ClimateGAN, a model that leverages both simulated and real data for unsupervised domain adaptation and conditional image generation. In this paper, we describe the details of our framework, thoroughly evaluate components of our architecture and demonstrate that our model is capable of robustly generating photo-realistic flooding.