Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Scientific Director, Leadership Team
Observer, Board of Directors, Mila

Biography

For media requests, please write to medias@mila.quebec.

For more information, please contact Julie Mongeau, executive assistant, at julie.mongeau@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” shared with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific director of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize, and in 2022 he was the most-cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, a Fellow of the Royal Society of Canada, a Knight of the Legion of Honor of France, and an Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Research Intern - Université de Montréal
PhD - Université de Montréal
Research Intern - Université du Québec à Rimouski
Professional Master's - Université de Montréal
Independent visiting researcher
Co-supervisor :
Independent visiting researcher - UQAR
PhD - Université de Montréal
Independent visiting researcher - MIT
PhD - Université de Montréal
Postdoctorate - Université de Montréal
Co-supervisor :
Professional Master's - Université de Montréal
Professional Master's - Université de Montréal
Collaborating Alumni - Université de Montréal
Collaborating researcher - Université Paris-Saclay
Principal supervisor :
PhD - Université de Montréal
PhD - Massachusetts Institute of Technology
PhD - Université de Montréal
PhD - Université de Montréal
Professional Master's - Université de Montréal
Professional Master's - Université de Montréal
Professional Master's - Université de Montréal
Collaborating researcher
Postdoctorate - Université de Montréal
Co-supervisor :
Independent visiting researcher - Technical University of Munich (TUM)
PhD - Université de Montréal
Research Intern - Université de Montréal
Master's Research - Université de Montréal
Co-supervisor :
Research Intern - Université de Montréal
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Collaborating Alumni
Research Intern - Université de Montréal
Professional Master's - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Research Intern - McGill University
Research Intern - Imperial College London
PhD - Université de Montréal
Research Intern - Université de Montréal
Collaborating Alumni - Université de Montréal
DESS - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Postdoctorate - Université de Montréal
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
Professional Master's - Université de Montréal
Independent visiting researcher - Université de Montréal
Independent visiting researcher - Hong Kong University of Science and Technology (HKUST)
Collaborating researcher - Ying Wu College of Computing
Professional Master's - Université de Montréal
Undergraduate - Université de Montréal
PhD - Max-Planck-Institute for Intelligent Systems
Professional Master's - Université de Montréal
Independent visiting researcher - Université de Montréal
Independent visiting researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher
Principal supervisor :
Postdoctorate - Université de Montréal
Master's Research - Université de Montréal
Research Intern - Université de Montréal
Master's Research - Université de Montréal
Professional Master's - Université de Montréal
Independent visiting researcher - Technical University of Munich
PhD - École Polytechnique Montréal Fédérale de Lausanne
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher
Principal supervisor :
Postdoctorate - Université de Montréal
Collaborating researcher - Valence
Principal supervisor :
Postdoctorate - Université de Montréal
Co-supervisor :
Collaborating researcher - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)
Principal supervisor :
PhD - Université de Montréal
Professional Master's - Université de Montréal
Collaborating Alumni - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - McGill University
Principal supervisor :
PhD - McGill University
Principal supervisor :

Publications

FAENet: Frame Averaging Equivariant GNN for Materials Modeling
Alexandre AGM Duval
Victor Schmidt
Alex Hernandez-Garcia
Santiago Miret
Fragkiskos D. Malliaros
Applications of machine learning techniques for materials modeling typically involve functions known to be equivariant or invariant to specific symmetries. While graph neural networks (GNNs) have proven successful in such tasks, they enforce symmetries via the model architecture, which often reduces their expressivity, scalability and comprehensibility. In this paper, we introduce (1) a flexible framework relying on stochastic frame-averaging (SFA) to make any model E(3)-equivariant or invariant through data transformations, and (2) FAENet, a simple, fast and expressive GNN, optimized for SFA, that processes geometric information without any symmetry-preserving design constraints. We prove the validity of our method theoretically and empirically demonstrate its superior accuracy and computational scalability in materials modeling on the OC20 dataset (S2EF, IS2RE) as well as common molecular modeling tasks (QM9, QM7-X). A package implementation is available at https://faenet.readthedocs.io.
GFlowNet-EM for Learning Compositional Latent Variable Models
Edward J Hu
Nikolay Malkin
Moksh J. Jain
Katie E Everett
Alexandros Graikos
Latent variable models (LVMs) with discrete compositional latents are an important but challenging setting due to a combinatorially large number of possible configurations of the latents. A key tradeoff in modeling the posteriors over latents is between expressivity and tractable optimization. For algorithms based on expectation-maximization (EM), the E-step is often intractable without restrictive approximations to the posterior. We propose the use of GFlowNets, algorithms for sampling from an unnormalized density by learning a stochastic policy for sequential construction of samples, for this intractable E-step. By training GFlowNets to sample from the posterior over latents, we take advantage of their strengths as amortized variational inference algorithms for complex distributions over discrete structures. Our approach, GFlowNet-EM, enables the training of expressive LVMs with discrete compositional latents, as shown by experiments on non-context-free grammar induction and on images using discrete variational autoencoders (VAEs) without conditional independence enforced in the encoder.
Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization
Dianbo Liu
Alex Lamb
Xu Ji
Pascal Notsawo
Michael Curtis Mozer
Kenji Kawaguchi
The Effect of Diversity in Meta-Learning
Ramnath Kumar
Tristan Deleu
Few-shot learning aims to learn representations that can tackle novel tasks given a small number of examples. Recent studies show that task distribution plays a vital role in the performance of the model. Conventional wisdom is that task diversity should improve the performance of meta-learning. In this work, we find evidence to the contrary; we study different task distributions on a myriad of models and datasets to evaluate the effect of task diversity on meta-learning algorithms. For this experiment, we train on multiple datasets, and with three broad classes of meta-learning models - Metric-based (i.e., Protonet, Matching Networks), Optimization-based (i.e., MAML, Reptile, and MetaOptNet), and Bayesian meta-learning models (i.e., CNAPs). Our experiments demonstrate that the effect of task diversity on all these algorithms follows a similar trend, and task diversity does not seem to offer any benefits to the learning of the model. Furthermore, we also demonstrate that even a handful of tasks, repeated over multiple batches, would be sufficient to achieve a performance similar to uniform sampling and draws into question the need for additional tasks to create better models.
Constant Memory Attention Block
Leo Feng
Frederick Tung
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
BatchGFN: Generative Flow Networks for Batch Active Learning
Shreshth A Malik
Salem Lahlou
Andrew Jesson
Moksh J. Jain
Nikolay Malkin
Tristan Deleu
Yarin Gal
We introduce BatchGFN, a novel approach for pool-based active learning that uses generative flow networks to sample sets of data points proportional to a batch reward. With an appropriate reward function to quantify the utility of acquiring a batch, such as the joint mutual information between the batch and the model parameters, BatchGFN is able to construct highly informative batches for active learning in a principled way. We show our approach enables sampling near-optimal utility batches at inference time with a single forward pass per point in the batch in toy regression problems. This alleviates the computational complexity of batch-aware algorithms and removes the need for greedy approximations to find maximizers for the batch reward. We also present early results for amortizing training across acquisition steps, which will enable scaling to real-world tasks.
Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation
Chris Emezue
Tristan Deleu
Stefan Bauer
GFlowNets for Causal Discovery: an Overview
Dragos Cristian Manta
Edward J Hu
Simulation-Free Schrödinger Bridges via Score and Flow Matching
Alexander Tong
Nikolay Malkin
Kilian FATRAS
Lazar Atanackovic
Yanlei Zhang
Guillaume Huguet
We present simulation-free score and flow matching ([SF]…
Thompson Sampling for Improved Exploration in GFlowNets
Jarrid Rector-Brooks
Kanika Madan
Moksh J. Jain
Maksym Korablyov
Cheng-Hao Liu
Nikolay Malkin
Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.
Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior
Ayush K Chakravarthy
Trang M. Nguyen
Anirudh Goyal
Michael Curtis Mozer
Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets
Dinghuai Zhang
Hanjun Dai
Nikolay Malkin
Ling Pan
Combinatorial optimization (CO) problems are often NP-hard and thus out of reach for exact algorithms, making them a tempting domain to apply machine learning methods. The highly structured constraints in these problems can hinder either optimization or sampling directly in the solution space. On the other hand, GFlowNets have recently emerged as a powerful machinery to efficiently sample from composite unnormalized densities sequentially and have the potential to amortize such solution-searching processes in CO, as well as generate diverse solution candidates. In this paper, we design Markov decision processes (MDPs) for different combinatorial problems and propose to train conditional GFlowNets to sample from the solution space. Efficient training techniques are also developed to benefit long-range credit assignment. Through extensive experiments on a variety of different CO tasks with synthetic and realistic data, we demonstrate that GFlowNet policies can efficiently find high-quality solutions. Our implementation is open-sourced at https://github.com/zdhNarsil/GFlowNet-CombOpt.