Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Scientific Director, Mila
Observer, Board of Directors, Mila

Biography

For media requests, please write to medias@mila.quebec.

For more information, please contact Julie Mongeau, executive assistant, at julie.mongeau@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” shared with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal and the founder and scientific director of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR, where he co-directs the Learning in Machines & Brains program, serves as scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize, and in 2022 he became the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, a Fellow of the Royal Society of Canada, a Knight of the Legion of Honor of France, and an Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

(Roster of supervised and co-supervised students and researchers, including Professional Master's, Master's Research, PhD, postdoctoral, research intern, and collaborating or visiting researcher positions, primarily at Université de Montréal, with others at McGill University, MIT, École Polytechnique Fédérale de Lausanne, the Technical University of Munich, HKUST, RWTH Aachen University, Université Paris-Saclay, Université du Québec à Rimouski, the Max Planck Institute for Intelligent Systems, and Imperial College London; individual names omitted.)

Publications

VIM: Variational Independent Modules for Video Prediction
Rim Assouel
Lluis Castrejon
Nicolas Ballas
We introduce a variational inference model called VIM, for Variational Independent Modules, for sequential data that learns and infers latent representations as a set of objects and discovers modular causal mechanisms over these objects. These mechanisms - which we call modules - are independently parametrized, define the stochastic transitions of entities and are shared across entities. At each time step, our model infers from a low-level input sequence a high-level sequence of categorical latent variables to select which transition modules to apply to which high-level object. We evaluate this model in video prediction tasks where the goal is to predict multi-modal future events given previous observations. We demonstrate empirically that VIM can model 2D visual sequences in an interpretable way and is able to identify the underlying dynamically instantiated mechanisms of the generation process. We additionally show that the learnt modules can be composed at test time to generalize to out-of-distribution observations.
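To make the selection mechanism concrete, here is a minimal PyTorch sketch (illustrative only, not the authors' code; all names, layer choices, and dimensions are assumptions): each object samples a categorical latent that chooses which shared, independently parametrized transition module updates its state.

```python
import torch
import torch.nn as nn

class ModuleSelector(nn.Module):
    """Toy version of the selection step: each object picks one of several
    shared transition modules via a categorical latent variable."""
    def __init__(self, obj_dim: int, n_modules: int):
        super().__init__()
        # Independently parametrized transition modules, shared across objects.
        self.transitions = nn.ModuleList(
            nn.Sequential(nn.Linear(obj_dim, obj_dim), nn.Tanh())
            for _ in range(n_modules)
        )
        self.selector = nn.Linear(obj_dim, n_modules)  # per-object logits

    def forward(self, objects: torch.Tensor) -> torch.Tensor:
        # objects: (batch, n_objects, obj_dim)
        logits = self.selector(objects)                               # (B, N, M)
        choice = torch.distributions.Categorical(logits=logits).sample()
        # Apply every module, then gather each object's chosen output.
        # (Training would need a relaxed or variational estimator;
        # hard sampling alone is not differentiable.)
        outs = torch.stack([m(objects) for m in self.transitions], dim=2)
        idx = choice.unsqueeze(-1).unsqueeze(-1).expand(-1, -1, 1, objects.size(-1))
        return outs.gather(2, idx).squeeze(2)                         # (B, N, D)

x = torch.randn(4, 3, 16)  # 4 sequences, 3 objects, 16-dim object states
print(ModuleSelector(obj_dim=16, n_modules=5)(x).shape)  # torch.Size([4, 3, 16])
```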
On Neural Architecture Inductive Biases for Relational Tasks
Current deep learning approaches have shown good in-distribution generalization performance, but struggle with out-of-distribution generalization. This is especially true in the case of tasks involving abstract relations like recognizing rules in sequences, as we find in many intelligence tests. Recent work has explored how forcing relational representations to remain distinct from sensory representations, as seems to be the case in the brain, can help artificial systems. Building on this work, we further explore and formalize the advantages afforded by 'partitioned' representations of relations and sensory details, and how this inductive bias can help recompose learned relational structure in newly encountered settings. We introduce a simple architecture based on similarity scores which we name Compositional Relational Network (CoRelNet). Using this model, we investigate a series of inductive biases that ensure abstract relations are learned and represented distinctly from sensory data, and explore their effects on out-of-distribution generalization for a series of relational psychophysics tasks. We find that simple architectural choices can outperform existing models in out-of-distribution generalization. Together, these results show that partitioning relational representations from other information streams may be a simple way to augment existing network architectures' robustness when performing out-of-distribution relational computations.
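As a rough illustration of the similarity-score idea, the sketch below (assumed shapes and layer sizes; not the paper's exact model) lets the classifier see only the pairwise similarity matrix of object embeddings, keeping the relational pathway partitioned from sensory features.

```python
import torch
import torch.nn as nn

class CoRelNetSketch(nn.Module):
    """Decisions are made from the (N x N) similarity matrix alone, so the
    relational pathway never sees raw sensory features directly."""
    def __init__(self, in_dim: int, emb_dim: int, n_objects: int, n_classes: int):
        super().__init__()
        self.encoder = nn.Linear(in_dim, emb_dim)          # sensory embedding
        self.head = nn.Sequential(
            nn.Linear(n_objects * n_objects, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_objects, in_dim)
        z = self.encoder(x)
        sim = torch.softmax(z @ z.transpose(1, 2), dim=-1)  # (B, N, N) relations
        return self.head(sim.flatten(1))                    # relations only

model = CoRelNetSketch(in_dim=32, emb_dim=16, n_objects=5, n_classes=2)
print(model(torch.randn(8, 5, 32)).shape)  # torch.Size([8, 2])
```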
Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL
Akram Erraqabi
Marlos C. Machado
Harry Zhao
Mingde Zhao
Sainbayar Sukhbaatar
Alessandro Lazaric
Ludovic Denoyer
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting, with applications ranging from skill discovery to reward shaping. Recently, learning the Laplacian representation has been framed as the optimization of a temporally-contrastive objective to overcome its computational limitations in large (or continuous) state spaces. However, this approach requires uniform access to all states in the state space, overlooking the exploration problem that emerges during the representation learning process. In this work, we propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation. We do so by combining the representation learning with a skill-based covering policy, which provides a better training distribution to extend and refine the representation. We also show that a simple augmentation of the representation objective with the learned temporal abstractions improves dynamics-awareness and helps exploration. We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments. Finally, even if our method is not optimized for skill discovery, the learned skills can successfully solve difficult continuous navigation tasks with sparse rewards, where standard skill discovery approaches are not as effective.
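A hedged sketch of the kind of temporally-contrastive objective described here (the paper's exact loss may differ): embeddings of consecutive states are pulled together, while an orthonormality penalty over the batch plays the repulsive role of the Laplacian eigenvector constraints.

```python
import torch
import torch.nn as nn

def temporal_contrastive_loss(phi_s: torch.Tensor,
                              phi_next: torch.Tensor,
                              beta: float = 1.0) -> torch.Tensor:
    """Attract embeddings of consecutive states; push the batch Gram matrix
    toward the identity so features stay decorrelated (assumed form)."""
    attract = (phi_s - phi_next).pow(2).sum(dim=1).mean()
    gram = phi_s.T @ phi_s / phi_s.size(0)
    repel = (gram - torch.eye(phi_s.size(1))).pow(2).sum()
    return attract + beta * repel

encoder = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 8))
s = torch.randn(128, 4)        # states sampled from trajectories
s_next = torch.randn(128, 4)   # their successors (placeholder data)
loss = temporal_contrastive_loss(encoder(s), encoder(s_next))
loss.backward()
```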
Evaluating Generalization in GFlowNets for Molecule Design
Andrei Cristian Nica
Moksh J. Jain
Emmanuel Bengio
Cheng-Hao Liu
Maksym Korablyov
Michael M. Bronstein
Deep learning bears promise for drug discovery problems such as de novo molecular design. Generating data to train such models is a costly and time-consuming process, given the need for wet-lab experiments or expensive simulations. This problem is compounded by the notorious data-hungriness of machine learning algorithms. In small molecule generation, the recently proposed GFlowNet method has shown good performance in generating diverse high-scoring candidates, and has the interesting advantage of being an off-policy offline method. Finding an appropriate generalization evaluation metric for such models, one predictive of the desired search performance (i.e. finding high-scoring diverse candidates), will help guide online data collection for such an algorithm. In this work, we develop techniques for evaluating GFlowNet performance on a test set, and identify the most promising metric for predicting generalization. We present empirical results on several small-molecule design tasks in drug discovery, for several GFlowNet training setups, and we find a metric strongly correlated with diverse high-scoring batch generation. This metric should be used to identify the best generative model from which to sample batches of molecules to be evaluated.
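The evaluation logic reduces to a correlation check across training runs or checkpoints. A minimal sketch with made-up numbers (the paper's actual metrics and scores are not reproduced here):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-checkpoint records: a candidate test-set metric and the
# search performance we actually care about (diverse high-scoring batches).
test_metric = np.array([0.62, 0.71, 0.55, 0.80, 0.67])
search_perf = np.array([0.40, 0.52, 0.31, 0.66, 0.47])

rho, pval = spearmanr(test_metric, search_perf)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")
# A metric with consistently high rank correlation could be used to pick
# the checkpoint from which to sample molecule batches for evaluation.
```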
Inductive Biases for Relational Tasks
Current deep learning approaches have shown good in-distribution performance but struggle in out-of-distribution settings. This is especially true in the case of tasks involving abstract relations like recognizing rules in sequences, as required in many intelligence tests. In contrast, our brains are remarkably flexible at such tasks, an attribute that is likely linked to anatomical constraints on computations. Inspired by this, recent work has explored how enforcing that relational representations remain distinct from sensory representations can help artificial systems. Building on this work, we further explore and formalize the advantages afforded by 'partitioned' representations of relations and sensory details. We investigate inductive biases that ensure abstract relations are learned and represented distinctly from sensory data across several neural network architectures and show that they outperform existing architectures on out-of-distribution generalization for various relational tasks. These results show that partitioning relational representations from other information streams may be a simple way to augment existing network architectures' robustness when performing relational computations.
A New Era: Intelligent Tutoring Systems Will Transform Online Learning for Millions
Francois St-Hilaire
Dung D. Vu
Antoine Frau
Nathan J. Burns
Farid Faraji
Joseph Potochny
Stephane Robert
Arnaud Roussel
Selene Zheng
Taylor Glazier
Junfel Vincent Romano
Robert Belfer
Muhammad Shayan
Ariella Smofsky
Tommy Delarosbil
Seulmin Ahn
Simon Eden-Walker
Kritika Sony
Ansona Onyi Ching
Sabina Elkins
A. Stepanyan
Adela Matajova
Victor Chen
Hossein Sahraei
Robert Larson
N. Markova
Andrew Barkett
Iulian V. Serban
Ekaterina Kochmar
Tackling Climate Change with Machine Learning
Priya L. Donti
Lynn H. Kaack
Kelly Kochanski
Alexandre Lacoste
Kris Sankaran
Andrew Slavin Ross
Nikola Milojevic-Dupont
Natasha Jaques
Anna Waldman-Brown
Alexandra Luccioni
Evan David Sherwin
S. Karthik Mukkavilli
Konrad Paul Kording
Carla P. Gomes
Andrew Y. Ng
Demis Hassabis
John C. Platt
Felix Creutzig
Jennifer T. Chayes
Climate change is one of the greatest challenges facing humanity, and we, as machine learning (ML) experts, may wonder how we can help. Here we describe how ML can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by ML, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the ML community to join the global effort against climate change.
Compositional Attention: Disentangling Search and Retrieval
Sarthak Mittal
Sharath Chandra Raparthy
Multi-head, key-value attention is the backbone of transformer-like model architectures which have proven to be widely successful in recent years. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search - selection of a relevant entity from a set via query-key interaction, and (2) retrieval - extraction of relevant features from the selected entity via a value matrix. Standard attention heads learn a rigid mapping between search and retrieval. In this work, we first highlight how this static nature of the pairing can potentially: (a) lead to learning of redundant parameters in certain tasks, and (b) hinder generalization. To alleviate this problem, we propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure. The proposed mechanism disentangles search and retrieval and composes them in a dynamic, flexible and context-dependent manner. Through a series of numerical experiments, we show that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings. Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed. Our proposed mechanism generalizes multi-head attention, allows independent scaling of search and retrieval and is easy to implement in a variety of established network architectures.
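The sketch below conveys the disentangling idea in PyTorch. It is a coarse simplification under assumed shapes, not the paper's implementation: here a retrieval is softly selected per query token from its own features, rather than via the paper's search-conditioned retrieval queries.

```python
import torch
import torch.nn as nn

class CompositionalAttentionSketch(nn.Module):
    """Simplified search/retrieval disentangling: S query-key searches are
    dynamically mixed with R value retrievals via a second softmax, instead
    of the rigid one-search-one-value pairing of standard heads."""
    def __init__(self, dim: int, n_search: int, n_retrieve: int):
        super().__init__()
        self.S, self.R, self.d = n_search, n_retrieve, dim
        self.q = nn.Linear(dim, n_search * dim)
        self.k = nn.Linear(dim, n_search * dim)
        self.v = nn.Linear(dim, n_retrieve * dim)
        self.score = nn.Linear(dim, n_retrieve)   # retrieval logits per token

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, _ = x.shape
        q = self.q(x).view(B, N, self.S, self.d).transpose(1, 2)   # (B,S,N,d)
        k = self.k(x).view(B, N, self.S, self.d).transpose(1, 2)
        v = self.v(x).view(B, N, self.R, self.d).transpose(1, 2)   # (B,R,N,d)
        attn = torch.softmax(q @ k.transpose(-1, -2) / self.d ** 0.5, dim=-1)
        cand = attn.unsqueeze(2) @ v.unsqueeze(1)                  # (B,S,R,N,d)
        w = torch.softmax(self.score(x), dim=-1)                   # (B,N,R)
        w = w.permute(0, 2, 1).unsqueeze(1).unsqueeze(-1)          # (B,1,R,N,1)
        out = (w * cand).sum(dim=2)                                # (B,S,N,d)
        return out.transpose(1, 2).reshape(B, N, self.S * self.d)

x = torch.randn(2, 7, 16)
print(CompositionalAttentionSketch(16, n_search=4, n_retrieve=3)(x).shape)
# torch.Size([2, 7, 64])
```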
Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information
Eric Larsen
Sébastien Lachapelle
This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict expected tactical descriptions of operational solutions (TDOSs). The problem we address occurs in the context of two-stage stochastic programming, where the second stage is demanding computationally. We aim to predict at a high speed the expected TDOS associated with the second-stage problem, conditionally on the first-stage variables. This may be used in support of the solution to the overall two-stage problem by avoiding the online generation of multiple second-stage scenarios and solutions. We formulate the tactical prediction problem as a stochastic optimal prediction program, whose solution we approximate with supervised machine learning. The training data set consists of a large number of deterministic operational problems generated by controlled probabilistic sampling. The labels are computed based on solutions to these problems (solved independently and offline), employing appropriate aggregation and subselection methods to address uncertainty. Results on our motivating application on load planning for rail transportation show that deep learning models produce accurate predictions in very short computing time (milliseconds or less). The predictive accuracy is close to the lower bounds calculated based on sample average approximation of the stochastic prediction programs.
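Stripped of the stochastic-programming machinery, the prediction step is plain supervised regression from instance features to aggregated solution descriptors. A self-contained sketch with synthetic placeholder data (all dimensions and names are assumptions):

```python
import torch
import torch.nn as nn

# Synthetic stand-ins: instance features (first-stage information) and
# tactical solution descriptors whose labels would be computed offline
# from independently solved operational problems.
X = torch.randn(1000, 20)
Y = torch.randn(1000, 5)

net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):                      # ordinary regression training loop
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(X), Y)
    loss.backward()
    opt.step()

# At deployment, one forward pass replaces generating and solving many
# second-stage scenarios, which is why predictions take milliseconds.
with torch.no_grad():
    print(net(torch.randn(1, 20)))
```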
From Machine Learning to Robotics: Challenges and Opportunities for Embodied Intelligence
Nicholas Roy
Ingmar Posner
T. Barfoot
Philippe Beaudoin
Jeannette Bohg
Oliver Brock
Isabelle Depatie
Dieter Fox
D. Koditschek
Tomás Lozano-Pérez
Vikash K. Mansinghka
Dorsa Sadigh
Stefan Schaal
G. Sukhatme
Denis Therien
Marc Emile Toussaint
Michiel van de Panne
Comparative Study of Learning Outcomes for Online Learning Platforms
Francois St-Hilaire
Nathan J. Burns
Robert Belfer
Muhammad Shayan
Ariella Smofsky
Dung D. Vu
Antoine Frau
Joseph Potochny
Farid Faraji
Vincent Pavero
Neroli Ko
Ansona Onyi Ching
Sabina Elkins
A. Stepanyan
Adela Matajova
Iulian V. Serban
Ekaterina Kochmar
Meta-learning framework with applications to zero-shot time-series forecasting
Boris Oreshkin
Dmitri Carpov
Can meta-learning discover generic ways of processing time series (TS) from a diverse dataset so as to greatly improve generalization on new TS coming from different datasets? This work provides positive evidence for this using a broad meta-learning framework which we show subsumes many existing meta-learning algorithms. Our theoretical analysis suggests that residual connections act as a meta-learning adaptation mechanism, generating a subset of task-specific parameters based on a given TS input, thus gradually expanding the expressive power of the architecture on-the-fly. The same mechanism is shown via linearization analysis to have the interpretation of a sequential update of the final linear layer. Our empirical results on a wide range of data emphasize the importance of the identified meta-learning mechanisms for successful zero-shot univariate forecasting, suggesting that it is viable to train a neural network on a source TS dataset and deploy it on a different target TS dataset without retraining, resulting in performance that is at least as good as that of state-of-practice univariate forecasting models.
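The residual-connection mechanism highlighted by the analysis can be illustrated with a tiny N-BEATS-flavoured stack (a sketch under assumed dimensions, not the paper's model): each block consumes the backcast residual left by its predecessors, so the computation adapts to the input series on the fly.

```python
import torch
import torch.nn as nn

class ResidualForecaster(nn.Module):
    """Each block explains part of the input window (backcast) and emits a
    partial forecast; later blocks see only the remaining residual."""
    def __init__(self, lookback: int, horizon: int, n_blocks: int = 3):
        super().__init__()
        self.backcasts = nn.ModuleList(
            nn.Linear(lookback, lookback) for _ in range(n_blocks))
        self.forecasts = nn.ModuleList(
            nn.Linear(lookback, horizon) for _ in range(n_blocks))

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        residual, forecast = window, 0.0
        for back, fore in zip(self.backcasts, self.forecasts):
            residual = residual - torch.relu(back(residual))  # remove explained part
            forecast = forecast + fore(residual)              # accumulate forecast
        return forecast

model = ResidualForecaster(lookback=24, horizon=6)
print(model(torch.randn(32, 24)).shape)  # torch.Size([32, 6])
```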