Portrait of Yoshua Bengio

Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research Department
Founder and Scientific Advisor, Leadership Team
Research Topics
Causality
Computational Neuroscience
Deep Learning
Generative Models
Graph Neural Networks
Machine Learning Theory
Medical Machine Learning
Molecular Modeling
Natural Language Processing
Probabilistic Models
Reasoning
Recurrent Neural Networks
Reinforcement Learning
Representation Learning

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Marie-Josée Beauchamp, Administrative Assistant at marie-josee.beauchamp@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Collaborating Alumni - McGill University
Collaborating Alumni - Université de Montréal
Collaborating researcher - Cambridge University
Principal supervisor :
PhD - Université de Montréal
Independent visiting researcher
Co-supervisor :
PhD - Université de Montréal
Independent visiting researcher
Principal supervisor :
Collaborating researcher - N/A
Principal supervisor :
PhD - Université de Montréal
Collaborating researcher - KAIST
Collaborating Alumni - Université de Montréal
PhD - Université de Montréal
Collaborating Alumni - Université de Montréal
Co-supervisor :
Independent visiting researcher
Principal supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Principal supervisor :
Collaborating Alumni
PhD - Université de Montréal
Collaborating Alumni - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
Postdoctorate - Université de Montréal
Principal supervisor :
Independent visiting researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - Ying Wu Coll of Computing
Collaborating researcher - University of Waterloo
Principal supervisor :
Collaborating Alumni - Max-Planck-Institute for Intelligent Systems
Collaborating researcher - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Postdoctorate - Université de Montréal
Independent visiting researcher - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Independent visiting researcher
Principal supervisor :
Postdoctorate - Université de Montréal
Collaborating Alumni - Université de Montréal
Collaborating Alumni - Université de Montréal
Postdoctorate
Co-supervisor :
Independent visiting researcher - Technical University of Munich
PhD - Université de Montréal
Co-supervisor :
Independent visiting researcher
Principal supervisor :
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher
Collaborating researcher - Université de Montréal
PhD - McGill University
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Collaborating Alumni - McGill University
Principal supervisor :

Publications

Learning Decision Trees as Amortized Structure Inference
Learning Decision Trees as Amortized Structure Inference
Building predictive models for tabular data presents fundamental challenges, notably in scaling consistently, i.e., more resources translati… (see more)ng to better performance, and generalizing systematically beyond the training data distribution. Designing decision tree models remains especially challenging given the intractably large search space, and most existing methods rely on greedy heuristics, while deep learning inductive biases expect a temporal or spatial structure not naturally present in tabular data. We propose a hybrid amortized structure inference approach to learn predictive decision tree ensembles given data, formulating decision tree construction as a sequential planning problem. We train a deep reinforcement learning (GFlowNet) policy to solve this problem, yielding a generative model that samples decision trees from the Bayesian posterior. We show that our approach, DT-GFN, outperforms state-of-the-art decision tree and deep learning methods on standard classification benchmarks derived from real-world data, robustness to distribution shifts, and anomaly detection, all while yielding interpretable models with shorter description lengths. Samples from the trained DT-GFN model can be ensembled to construct a random forest, and we further show that the performance of scales consistently in ensemble size, yielding ensembles of predictors that continue to generalize systematically.
Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles
Alexander Rubinstein
Damien Teney
Seong Joon Oh
Armand Mihai Nicolicioiu
Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to a phenomenon known as shortcut lea… (see more)rning, where a model relies on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose
Shaping Inductive Bias in Diffusion Models through Frequency-Based Noise Control
Berton Earnshaw
Jason Hartford
Diffusion Probabilistic Models (DPMs) are powerful generative models that have achieved unparalleled success in a number of generative tasks… (see more). In this work, we aim to build inductive biases into the training and sampling of diffusion models to better accommodate the target distribution of the data to model. For topologically structured data, we devise a frequency-based noising operator to purposefully manipulate, and set, these inductive biases. We first show that appropriate manipulations of the noising forward process can lead DPMs to focus on particular aspects of the distribution to learn. We show that different datasets necessitate different inductive biases, and that appropriate frequency-based noise control induces increased generative performance compared to standard diffusion. Finally, we demonstrate the possibility of ignoring information at particular frequencies while learning. We show this in an image corruption and recovery task, where we train a DPM to recover the original target distribution after severe noise corruption.
Solving Bayesian inverse problems with diffusion priors and off-policy RL
This paper presents a practical application of Relative Trajectory Balance (RTB), a recently introduced off-policy reinforcement learning (R… (see more)L) objective that can asymptotically solve Bayesian inverse problems optimally. We extend the original work by using RTB to train conditional diffusion model posteriors from pretrained unconditional priors for challenging linear and non-linear inverse problems in vision, and science. We use the objective alongside techniques such as off-policy backtracking exploration to improve training. Importantly, our results show that existing training-free diffusion posterior methods struggle to perform effective posterior inference in latent space due to inherent biases.
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Juan A. Rodriguez
Chao Wang
Akshay Kalkunte Suresh
Xiangru Jian
Pierre-Andre Noel
Sathwik Tejaswi Madhusudhan
Enamul Hoque
Issam Hadj Laradji
David Vazquez
Sai Rajeswar
Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models hinges… (see more) on having a good connector that maps visual features generated by a vision encoder to a shared embedding space with the LLM while preserving semantic similarity. Existing connectors, such as multilayer perceptrons (MLPs), often produce out-of-distribution or noisy inputs, leading to misalignment between the modalities. In this work, we propose a novel vision-text alignment method, AlignVLM, that maps visual features to a weighted average of LLM text embeddings. Our approach leverages the linguistic priors encoded by the LLM to ensure that visual features are mapped to regions of the space that the LLM can effectively interpret. AlignVLM is particularly effective for document understanding tasks, where scanned document images must be accurately mapped to their textual content. Our extensive experiments show that AlignVLM achieves state-of-the-art performance compared to prior alignment methods. We provide further analysis demonstrating improved vision-text feature alignment and robustness to noise.
Learning Decision Trees as Amortized Structure Inference
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision
Diego Velazquez
Pau Rodriguez
Sergio Alonso
Josep M. Gonfaus
Jordi Gonzalez
Gerardo Richarte
Javier Marin
Alexandre Lacoste
This paper presents EarthView, a comprehensive dataset specifically designed for self-supervision on remote sensing data, intended to enhanc… (see more)e deep learning applications on Earth monitoring tasks. The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic. Our dataset provides a wide spectrum of image data with varying resolutions, harnessed from different sensors and organized coherently into an accessible HuggingFace dataset in parquet format. This data spans five years, from 2017 to 2022. Accompanying the dataset, we introduce EarthMAE, a tailored Masked Autoencoder, developed to tackle the distinct challenges of remote sensing data. Trained in a self-supervised fashion, EarthMAE effectively processes different data modalities such as hyperspectral, multispectral, topographical data, segmentation maps, and temporal structure. This model helps us show that pre-training on Satellogic data improves performance on downstream tasks. While there is still a gap to fill in MAE for heterogeneous data, we regard this innovative combination of an expansive, diverse dataset and a versatile model adapted for self-supervised learning as a stride forward in deep learning for Earth monitoring.
A physics-based data-driven model for CO$_2$ gas diffusion electrodes to drive automated laboratories
Abhishek Soni
Karry Ocean
Kevan Dettelbach
Ribwar Ahmadi
Mehrdad Mokhtari
Curtis P. Berlinguette
The electrochemical reduction of atmospheric CO…
OBELiX: A Curated Dataset of Crystal Structures and Experimentally Measured Ionic Conductivities for Lithium Solid-State Electrolytes
Rhiannon Hendley
Sun Sun
Alain Tchagang
Jiang Su
Hongyu Guo
Homin Shin
Solid-state electrolyte batteries are expected to replace liquid electrolyte lithium-ion batteries in the near future thanks to their higher… (see more) theoretical energy density and improved safety. However, their adoption is currently hindered by their lower effective ionic conductivity, a quantity that governs charge and discharge rates. Identifying highly ion-conductive materials using conventional theoretical calculations and experimental validation is both time-consuming and resource-intensive. While machine learning holds the promise to expedite this process, relevant ionic conductivity and structural data is scarce. Here, we present OBELiX, a domain-expert-curated database of
OBELiX: A Curated Dataset of Crystal Structures and Experimentally Measured Ionic Conductivities for Lithium Solid-State Electrolytes
Rhiannon Hendley
Sun Sun
Alain Tchagang
Jiang Su
Hongyu Guo
Homin Shin
Solid-state electrolyte batteries are expected to replace liquid electrolyte lithium-ion batteries in the near future thanks to their higher… (see more) theoretical energy density and improved safety. However, their adoption is currently hindered by their lower effective ionic conductivity, a quantity that governs charge and discharge rates. Identifying highly ion-conductive materials using conventional theoretical calculations and experimental validation is both time-consuming and resource-intensive. While machine learning holds the promise to expedite this process, relevant ionic conductivity and structural data is scarce. Here, we present OBELiX, a database of
OBELiX: A Curated Dataset of Crystal Structures and Experimentally Measured Ionic Conductivities for Lithium Solid-State Electrolytes
F'elix Therrien
Rhiannon Hendley
Alex Hern'andez-Garc'ia
Sun Sun
Alain Tchagang
Jiang Su
Hongyu Guo
Homin Shin
Solid-state electrolyte batteries are expected to replace liquid electrolyte lithium-ion batteries in the near future thanks to their higher… (see more) theoretical energy density and improved safety. However, their adoption is currently hindered by their lower effective ionic conductivity, a quantity that governs charge and discharge rates. Identifying highly ion-conductive materials using conventional theoretical calculations and experimental validation is both time-consuming and resource-intensive. While machine learning holds the promise to expedite this process, relevant ionic conductivity and structural data is scarce. Here, we present OBELiX, a domain-expert-curated database of