Yoshua Bengio

Biography

For media requests, please write to medias@mila.quebec.

For more information please contact Cassidy MacNeil, Senior Assistant and Operation Lead at cassidy.macneil@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize, and in 2022 he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, a Fellow of the Royal Society of Canada, a Knight of the Legion of Honor of France and an Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Jamal Abou Haibeh

Collaborating Alumni - McGill University

Berkes Anaïs

Collaborating researcher - Cambridge University

Principal supervisor :

Rim Assouel

PhD - Université de Montréal

Stefan Bauer

Independent visiting researcher

Co-supervisor :

Guillaume Lajoie

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor :

David Rolnick

Xiaoyin Chen

PhD - Université de Montréal

Sanghyeok Choi

Collaborating researcher - KAIST

PhD - Université de Montréal

Collaborating Alumni - Université de Montréal

Co-supervisor :

Loubna Benabbou

Desmond Elliott

Independent visiting researcher

Principal supervisor :

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Jean-Pierre Falet

PhD - Université de Montréal

PhD

PhD - Université de Montréal

Moksh Jain

PhD - Université de Montréal

PhD - Université de Montréal

Principal supervisor :

Collaborating Alumni - Université de Montréal

Hyeonah Kim

Postdoctorate - Université de Montréal

Principal supervisor :

Alex Hernandez-Garcia

Tabitha Edith Lee

Postdoctorate - Université de Montréal

Principal supervisor :

Collaborating Alumni

Collaborating Alumni - Université de Montréal

Cristian Dragos Manta

PhD - Université de Montréal

Co-supervisor :

Dhanya Sridhar

Sarthak Mittal

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Independent visiting researcher - Université de Montréal

Padideh Nouri

PhD - Université de Montréal

Principal supervisor :

Ali Parviz

Collaborating researcher - Ying Wu College of Computing

Lena Podina

Collaborating researcher - University of Waterloo

Principal supervisor :

David Rolnick

Nasim Rahaman

Collaborating Alumni - Max-Planck-Institute for Intelligent Systems

Amine Razig

Collaborating researcher - Université de Montréal

Co-supervisor :

Loubna Benabbou

Jarrid Rector-Brooks

PhD - Université de Montréal

Danyal Rehman

Postdoctorate - Université de Montréal

James Requeima

Independent visiting researcher - Université de Montréal

Oli Richardson

Postdoctorate - Université de Montréal

Camille Rochefort-Boulanger

PhD - Université de Montréal

Principal supervisor :

Julie Hussin

Dragos Secrieru

Collaborating Alumni - Université de Montréal

Divya Sharma

Postdoctorate

Co-supervisor :

Alex Hernandez-Garcia

Mélisande Astrid Crystal Teng

Vincent Taboga

Collaborating Alumni - Polytechnique Montréal

Co-supervisor :

PhD - Université de Montréal

Co-supervisor :

Hugo Larochelle

Ivan Titov

Collaborating researcher

Principal supervisor :

Siva Reddy

Alex Tong

Collaborating Alumni - Université de Montréal

Collaborating Alumni - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher

Collaborating researcher - Université de Montréal

Dinghuai Zhang

PhD - Université de Montréal

Principal supervisor :

Aaron Courville

Tianyu Zhang

PhD - Université de Montréal

PhD - McGill University

Principal supervisor :

Harry Zhao

Collaborating Alumni - McGill University

Principal supervisor :

Blog Posts

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the Service of Reasoning & Model-Based ML

April 4, 2023

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

Generative Flow Networks

March 15, 2022

Yoshua Bengio

Publications

Revisiting Fundamentals of Experience Replay

William Fedus

Prajit Ramachandran

Rishabh Agarwal

Hugo Larochelle

Mark Rowland

Will Dabney

Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding. We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (replay ratio). Our additive and ablative studies upend conventional wisdom around experience replay -- greater capacity is found to substantially increase the performance of certain algorithms, while leaving others unaffected. Counterintuitively, we show that theoretically ungrounded, uncorrected n-step returns are uniquely beneficial while other techniques confer limited benefit for sifting through larger memory. Separately, by directly controlling the replay ratio we contextualize previous observations in the literature and empirically measure its importance across a variety of deep RL algorithms. Finally, we conclude by testing a set of hypotheses on the nature of these performance benefits.

2020-11-21

Proceedings of the 37th International Conference on Machine Learning (published)

proceedings.mlr.press

Gradient Starvation: A Learning Proclivity in Neural Networks

Sékou-Oumar Kaba

We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks. Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task, despite the presence of other predictive features that fail to be discovered. This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks. Using tools from Dynamical Systems theory, we identify simple properties of learning dynamics during gradient descent that lead to this imbalance, and prove that such a situation can be expected given certain statistical structure in training data. Based on our proposed formalism, we develop guarantees for a novel regularization method aimed at decoupling feature learning dynamics, improving accuracy and robustness in cases hindered by gradient starvation. We illustrate our findings with simple and real-world out-of-distribution (OOD) generalization experiments.

2020-11-18

ArXiv (preprint)

DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning

Timo Milbich

Samarth Sinha

Björn Ommer

2020-11-07

Computer Vision – ECCV 2020 (published)

Experience Grounds Language

Yonatan Bisk

Ari Holtzman

Jesse D. Thomason

Jacob Andreas

Joyce Yue Chai

Mirella Lapata

Angeliki Lazaridou

Jonathan May

Aleksandr Nisnevich

Nicolas Pinto

Joseph Turian

2020-11-01

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (published)

The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach

Iulian V. Serban

Chinnadhurai Sankar

Michael Pieper

Joelle Pineau

Deep reinforcement learning has recently shown many impressive successes. However, one major obstacle towards applying such methods to real-world problems is their lack of data-efficiency. To this end, we propose the Bottleneck Simulator: a model-based reinforcement learning method which combines a learned, factorized transition model of the environment with rollout simulations to learn an effective policy from few examples. The learned transition model employs an abstract, discrete (bottleneck) state, which increases sample efficiency by reducing the number of model parameters and by exploiting structural properties of the environment. We provide a mathematical analysis of the Bottleneck Simulator in terms of fixed points of the learned policy, which reveals how performance is affected by four distinct sources of error: an error related to the abstract space structure, an error related to the transition model estimation variance, an error related to the transition model estimation bias, and an error related to the transition model class bias. Finally, we evaluate the Bottleneck Simulator on two natural language processing tasks: a text adventure game and a real-world, complex dialogue response selection task. On both tasks, the Bottleneck Simulator yields excellent performance beating competing approaches.

2020-10-27

Journal of Artificial Intelligence Research (published)

NU-GAN: High resolution neural upsampling with GAN

In this paper, we propose NU-GAN, a new method for resampling audio from lower to higher sampling rates (upsampling). Audio upsampling is an important problem since productionizing generative speech technology requires operating at high sampling rates. Such applications use audio at a resolution of 44.1 kHz or 48 kHz, whereas current speech synthesis methods are equipped to handle a maximum of 24 kHz resolution. NU-GAN takes a leap towards solving audio upsampling as a separate component in the text-to-speech (TTS) pipeline by leveraging techniques for audio generation using GANs. ABX preference tests indicate that our NU-GAN resampler is capable of resampling 22 kHz to 44.1 kHz audio that is distinguishable from original audio only 7.4% higher than random chance for a single-speaker dataset, and 10.8% higher than chance for a multi-speaker dataset.

2020-10-22

ArXiv (preprint)

Cross-Modal Information Maximization for Medical Imaging: CMIM

Tristan Sylvain

Francis Dutil

Tess Berthier

Lisa Di Jorio

Margaux Luck

(Rex) Devon Hjelm

2020-10-20

ArXiv (preprint)

Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers

Alex Lamb

Anirudh Goyal

A. Slowik

Michael Curtis Mozer

Philippe Beaudoin

Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previous layer. A downside to this approach is that each layer (or module, as multiple modules can operate in parallel) is tasked with processing the entire hidden state, rather than a particular part of the state which is most relevant for that module. Methods which only operate on a small number of input variables are an essential part of most programming languages, and they allow for improved modularity and code re-usability. Our proposed method, Neural Function Modules (NFM), aims to introduce the same structural capability into deep learning. Most of the work in the context of feed-forward networks combining top-down and bottom-up feedback is limited to classification problems. The key contribution of our work is to combine attention, sparsity, top-down and bottom-up feedback, in a flexible algorithm which, as we show, improves the results in standard classification, out-of-domain generalization, generative modeling, and learning representations in the context of reinforcement learning.

2020-10-15

ArXiv (preprint)

GraphMix: Improved Training of GNNs for Semi-Supervised Learning

Juho Kannala

We present GraphMix, a regularization method for Graph Neural Network based semi-supervised object classification, whereby we propose to train a fully-connected network jointly with the graph neural network via parameter sharing and interpolation-based regularization. Further, we provide a theoretical analysis of how GraphMix improves the generalization bounds of the underlying graph neural network, without making any assumptions about the "aggregation" layer or the depth of the graph neural networks. We experimentally validate this analysis by applying GraphMix to various architectures such as Graph Convolutional Networks, Graph Attention Networks and Graph-U-Net. Despite its simplicity, we demonstrate that GraphMix can consistently improve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets: Cora-Full, Co-author-CS and Co-author-Physics.

2020-10-11

AAAI Conference on Artificial Intelligence (published)

COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact Tracing

Prateek Gupta

Tegan Maharaj

Martin Weiss

Nasim Rahaman

Hannah Alsdurf

Abhinav Sharma

Nanor Minoyan

Soren Harnois-Leblanc

Victor Schmidt

Pierre-Luc St-Charles

Tristan Deleu

Andrew Robert Williams

Akshay Patel

Gaetan Caron

Satya Ortiz Gagne

David Buckeridge

Joumana Ghosn

Yang Zhang

Bernhard Schölkopf

Joanna Merckx

2020-10-02

OpenReview.net/Anonymous_Preprint (unknown)

openreview.net

Generating Multiscale Amorphous Molecular Structures Using Deep Learning: A Study in 2D

Michael Kilgour

Nicolas Gastellu

David Y. T. Hui

Lena Simine

Amorphous molecular assemblies appear in a vast array of systems: from living cells to chemical plants and from everyday items to new devices. The absence of long-range order in amorphous materials implies that precise knowledge of their underlying structures throughout is needed to rationalize and control their properties at the mesoscale. Standard computational simulations suffer from exponentially unfavorable scaling of the required compute with system size. We present a method based on deep learning that leverages the finite range of structural correlations for an autoregressive generation of disordered molecular aggregates up to arbitrary size from small-scale computational or experimental samples. We benchmark performance on self-assembled nanoparticle aggregates and proceed to simulate monolayer amorphous carbon with atomistic resolution. This method bridges the gap between the nanoscale and mesoscale simulations of amorphous molecular systems.

2020-09-24

Journal of Physical Chemistry Letters (published)

A learning-based algorithm to quickly compute good primal solutions for Stochastic Integer Programs

Emma Frejinger

Andrea Lodi

Rahul Anuj Patel

Sriram Sankaranarayanan

2020-09-19

Integration of Constraint Programming, Artificial Intelligence, and Operations Research (published)