Portrait of Yoshua Bengio

Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research Department
Scientific Director, Leadership Team
Observer, Board of Directors, Mila
Research Topics
Causality
Computational Neuroscience
Deep Learning
Generative Models
Graph Neural Networks
Machine Learning Theory
Medical Machine Learning
Molecular Modeling
Natural Language Processing
Probabilistic Models
Reasoning
Recurrent Neural Networks
Reinforcement Learning
Representation Learning

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Julie Mongeau, executive assistant at julie.mongeau@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific director of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Research Intern - McGill University
Research Intern - Université de Montréal
PhD - Université de Montréal
Collaborating Alumni
Research Intern - Université du Québec à Rimouski
Independent visiting researcher
Co-supervisor :
PhD - Université de Montréal
Research Intern - UQAR
PhD - Université de Montréal
Independent visiting researcher - MIT
Collaborating researcher - N/A
Principal supervisor :
Postdoctorate - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Collaborating Alumni - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Collaborating researcher - Université Paris-Saclay
Principal supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
PhD - Massachusetts Institute of Technology
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Research Intern - Barcelona University
Research Intern - Université de Montréal
Collaborating researcher - Université de Montréal
Research Intern
Postdoctorate - Université de Montréal
Co-supervisor :
Independent visiting researcher - Technical University Munich (TUM)
PhD - Université de Montréal
Research Intern - Université de Montréal
Master's Research - Université de Montréal
Co-supervisor :
Research Intern - Université de Montréal
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Collaborating Alumni
Collaborating Alumni - Université de Montréal
Collaborating Alumni
PhD - Université de Montréal
Principal supervisor :
Research Intern - Imperial College London
PhD - Université de Montréal
Research Intern - Université de Montréal
Collaborating Alumni - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Postdoctorate - Université de Montréal
Collaborating Alumni
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
Postdoctorate - Université de Montréal
Principal supervisor :
Independent visiting researcher - Université de Montréal
Independent visiting researcher - Hong Kong University of Science and Technology (HKUST)
Collaborating researcher - Ying Wu Coll of Computing
PhD - University of Waterloo
Principal supervisor :
PhD - Max-Planck-Institute for Intelligent Systems
PhD - Université de Montréal
Co-supervisor :
Postdoctorate - Université de Montréal
Independent visiting researcher - Université de Montréal
Postdoctorate - Université de Montréal
Independent visiting researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Research Intern - Université de Montréal
Collaborating researcher
Principal supervisor :
PhD - Université de Montréal
Postdoctorate - Université de Montréal
Master's Research - Université de Montréal
Research Intern - Université de Montréal
Research Intern - Université de Montréal
Master's Research - Université de Montréal
Collaborating Alumni
Independent visiting researcher - Technical University of Munich
PhD - École Polytechnique Montréal Fédérale de Lausanne
Postdoctorate - Polytechnique Montréal
Co-supervisor :
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher
Principal supervisor :
Postdoctorate - Université de Montréal
Collaborating researcher - Valence
Principal supervisor :
Postdoctorate - Université de Montréal
Co-supervisor :
Collaborating researcher - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)
Principal supervisor :
PhD - Université de Montréal
Collaborating Alumni - Université de Montréal
Collaborating researcher - KAIST
Research Intern - Université de Montréal
PhD - McGill University
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - McGill University
Principal supervisor :

Publications

SatBird: a Dataset for Bird Species Distribution Modeling using Remote Sensing and Citizen Science Data
Mélisande Teng
Amna Elmustafa
Benjamin Akera
Hager Radi
Contrastive Retrospection: honing in on critical steps for rapid learning and generalization in RL
Chen Sun
Wannan Yang
Thomas Jiralerspong
Dane Malenfant
Benjamin Alsbury-Nealy
In real life, success is often contingent upon multiple critical steps that are distant in time from each other and from the final reward. T… (see more)hese critical steps are challenging to identify with traditional reinforcement learning (RL) methods that rely on the Bellman equation for credit assignment. Here, we present a new RL algorithm that uses offline contrastive learning to hone in on these critical steps. This algorithm, which we call Contrastive Retrospection (ConSpec), can be added to any existing RL algorithm. ConSpec learns a set of prototypes for the critical steps in a task by a novel contrastive loss and delivers an intrinsic reward when the current state matches one of the prototypes. The prototypes in ConSpec provide two key benefits for credit assignment: (i) They enable rapid identification of all the critical steps. (ii) They do so in a readily interpretable manner, enabling out-of-distribution generalization when sensory features are altered. Distinct from other contemporary RL approaches to credit assignment, ConSpec takes advantage of the fact that it is easier to retrospectively identify the small set of steps that success is contingent upon (and ignoring other states) than it is to prospectively predict reward at every taken step. ConSpec greatly improves learning in a diverse set of RL tasks. The code is available at the link: https://github.com/sunchipsster1/ConSpec
DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets
Lazar Atanackovic
Alexander Tong
Jason Hartford
Leo J Lee
Bo Wang
Improving *day-ahead* Solar Irradiance Time Series Forecasting by Leveraging Spatio-Temporal Context
Oussama Boussif
Ghait Boukachab
Dan Assouline
Stefano Massaroli
Tianle Yuan
Loubna Benabbou
Solar power harbors immense potential in mitigating climate change by substantially reducing CO…
Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network
Tristan Deleu
Mizu Nishikawa-Toomey
Jithendaraa Subramanian
Nikolay Malkin
Generative Flow Networks (GFlowNets), a class of generative models over discrete and structured sample spaces, have been previously applied … (see more)to the problem of inferring the marginal posterior distribution over the directed acyclic graph (DAG) of a Bayesian Network, given a dataset of observations. Based on recent advances extending this framework to non-discrete sample spaces, we propose in this paper to approximate the joint posterior over not only the structure of a Bayesian Network, but also the parameters of its conditional probability distributions. We use a single GFlowNet whose sampling policy follows a two-phase process: the DAG is first generated sequentially one edge at a time, and then the corresponding parameters are picked once the full structure is known. Since the parameters are included in the posterior distribution, this leaves more flexibility for the local probability models of the Bayesian Network, making our approach applicable even to non-linear models parametrized by neural networks. We show that our method, called JSP-GFN, offers an accurate approximation of the joint posterior, while comparing favorably against existing methods on both simulated and real data.
Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions
Stefano Massaroli
Michael Poli
Daniel Y Fu
Hermann Kumbong
Rom Nishijima Parnichkun
Aman Timalsina
David W. Romero
Quinn McIntyre
Beidi Chen
Atri Rudra
Ce Zhang
Christopher Re
Stefano Ermon
Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers… (see more). In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads -- naively requiring a full pass (or caching of activations) over the input sequence for each generated token -- similarly to attention-based models. In this paper, we seek to enable
Let the Flows Tell: Solving Graph Combinatorial Problems with GFlowNets
Dinghuai Zhang
Hanjun Dai
Nikolay Malkin
Ling Pan
Reusable Slotwise Mechanisms
Trang Nguyen
Amin Mansouri
Kanika Madan
Khuong N. Nguyen
Nguyen Duy Khuong
Kartik Ahuja
Dianbo Liu
Agents with the ability to comprehend and reason about the dynamics of objects would be expected to exhibit improved robustness and generali… (see more)zation in novel scenarios. However, achieving this capability necessitates not only an effective scene representation but also an understanding of the mechanisms governing interactions among object subsets. Recent studies have made significant progress in representing scenes using object slots. In this work, we introduce Reusable Slotwise Mechanisms, or RSM, a framework that models object dynamics by leveraging communication among slots along with a modular architecture capable of dynamically selecting reusable mechanisms for predicting the future states of each object slot. Crucially, RSM leverages the Central Contextual Information (CCI), enabling selected mechanisms to access the remaining slots through a bottleneck, effectively allowing for modeling of higher order and complex interactions that might require a sparse subset of objects. Experimental results demonstrate the superior performance of RSM compared to state-of-the-art methods across various future prediction and related downstream tasks, including Visual Question Answering and action planning. Furthermore, we showcase RSM's Out-of-Distribution generalization ability to handle scenes in intricate scenarios.
Neural Causal Structure Discovery from Interventions
Nan Rosemary Ke
Olexa Bilaniuk
Anirudh Goyal
Stefan Bauer
Bernhard Schölkopf
Michael Curtis Mozer
Recent promising results have generated a surge of interest in continuous optimization methods for causal discovery from observational data.… (see more) However, there are theoretical limitations on the identifiability of underlying structures obtained solely from observational data. Interventional data, on the other hand, provides richer information about the underlying data-generating process. Nevertheless, extending and applying methods designed for observational data to include interventions is a challenging problem. To address this issue, we propose a general framework based on neural networks to develop models that incorporate both observational and interventional data. Notably, our method can handle the challenging and realistic scenario where the identity of the intervened upon variable is unknown. We evaluate our proposed approach in the context of graph recovery, both de novo and from a partially-known edge set. Our method achieves strong benchmark results on various structure learning tasks, including structure recovery of synthetic graphs as well as standard graphs from the Bayesian Network Repository.
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
Patrick Mark Butlin
R. Long
Eric Elmoznino
Jonathan C. P. Birch
Axel Constant
George Deane
S. Fleming
C. Frith
Xuanxiu Ji
Ryota Kanai
C. Klein
Grace W. Lindsay
Matthias Michel
Liad Mudrik
Megan A. K. Peters
Eric Schwitzgebel
Jonathan Simon
Rufin Vanrullen
Scientific discovery in the age of artificial intelligence
Hanchen Wang
Tianfan Fu
Yuanqi Du
Wenhao Gao
Kexin Huang
Ziming Liu
Payal Chandak
Shengchao Liu
Peter Van Katwyk
Andreea Deac
Animashree Anandkumar
K. Bergen
Carla P. Gomes
Shirley Ho
Pushmeet Kohli
Joan Lasenby
Jure Leskovec
Tie-Yan Liu
A. Manrai
Debora Susan Marks … (see 10 more)
Bharath Ramsundar
Le Song
Jimeng Sun
Petar Veličković
Max Welling
Linfeng Zhang
Connor Wilson. Coley
Marinka Žitnik
What if We Enrich day-ahead Solar Irradiance Time Series Forecasting with Spatio-Temporal Context?
Oussama Boussif
Ghait Boukachab
Dan Assouline
Stefano Massaroli
Tianle Yuan
Loubna Benabbou