Yoshua Bengio

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Cassidy MacNeil, Senior Assistant and Operation Lead at cassidy.macneil@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” shared with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize, and in 2022 he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, a Fellow of the Royal Society of Canada, a Knight of the Legion of Honor of France, and an Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Jamal Abou Haibeh

Collaborating Alumni - McGill University

Mohammed Abukalam

Collaborating Alumni - Université de Montréal

Berkes Anaïs

Collaborating researcher - Cambridge University

Principal supervisor :

Rim Assouel

PhD - Université de Montréal

Stefan Bauer

Independent visiting researcher

Co-supervisor :

Guillaume Lajoie

Paul Bertin

PhD - Université de Montréal

Joyce Chai

Independent visiting researcher

Principal supervisor :

Siva Reddy

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor :

David Rolnick

Xiaoyin Chen

PhD - Université de Montréal

Sanghyeok Choi

Collaborating researcher - KAIST

Collaborating Alumni - Université de Montréal

PhD - Université de Montréal

Collaborating Alumni - Université de Montréal

Co-supervisor :

Loubna Benabbou

Desmond Elliott

Independent visiting researcher

Principal supervisor :

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Jean-Pierre Falet

PhD - Université de Montréal

Leo Feng

PhD - Université de Montréal

PhD

PhD - Université de Montréal

Edward Hu

PhD - Université de Montréal

Moksh Jain

PhD - Université de Montréal

PhD - Université de Montréal

Principal supervisor :

Collaborating Alumni - Université de Montréal

Hyeonah Kim

Postdoctorate - Université de Montréal

Principal supervisor :

Alex Hernandez-Garcia

Salem Lahlou

Collaborating Alumni - Université de Montréal

Tabitha Edith Lee

Postdoctorate - Université de Montréal

Principal supervisor :

Collaborating Alumni

Zhen Liu

Collaborating Alumni - Université de Montréal

Principal supervisor :

Liam Paull

Kanika Madan

PhD - Université de Montréal

Nikolay Malkin

Collaborating Alumni - Université de Montréal

Cristian Dragos Manta

PhD - Université de Montréal

Co-supervisor :

Dhanya Sridhar

Sarthak Mittal

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Postdoctorate - Université de Montréal

Principal supervisor :

Independent visiting researcher - Université de Montréal

Padideh Nouri

PhD - Université de Montréal

Principal supervisor :

Ali Parviz

Collaborating researcher - Ying Wu College of Computing

Lena Podina

Collaborating researcher - University of Waterloo

Principal supervisor :

David Rolnick

Nassim Rahaman

Collaborating Alumni - Max-Planck-Institute for Intelligent Systems

Amine Razig

Collaborating researcher - Université de Montréal

Co-supervisor :

Loubna Benabbou

Jarrid Rector-Brooks

PhD - Université de Montréal

Danyal Rehman

Postdoctorate - Université de Montréal

James Requeima

Independent visiting researcher - Université de Montréal

Oli Richardson

Postdoctorate - Université de Montréal

Camille Rochefort-Boulanger

PhD - Université de Montréal

Principal supervisor :

Julie Hussin

Abhik Roychoudhury

Independent visiting researcher

Principal supervisor :

Siva Reddy

Luca Scimeca

Postdoctorate - Université de Montréal

Collaborating Alumni - Université de Montréal

Marcin Sendera

Collaborating Alumni - Université de Montréal

Divya Sharma

Postdoctorate

Co-supervisor :

Alex Hernandez-Garcia

Mélisande Astrid Crystal Teng

PhD - Université de Montréal

Co-supervisor :

Hugo Larochelle

Ivan Titov

Independent visiting researcher

Principal supervisor :

Siva Reddy

Alex Tong

Collaborating Alumni - Université de Montréal

Postdoctorate - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher

Collaborating researcher - Université de Montréal

Tianyu Zhang

PhD - Université de Montréal

PhD - McGill University

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Aaron Courville

Harry Zhao

Collaborating Alumni - McGill University

Principal supervisor :

Blog Posts

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the Service of Reasoning & Model-Based ML

April 4, 2023

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

March 15, 2022

Generative Flow Networks

Yoshua Bengio

Publications

CACHE (Critical Assessment of Computational Hit-finding Experiments): A public–private partnership benchmarking initiative to enable the development of computational methods for hit-finding

Suzanne Ackloo

R. Al-Awar

Rommie Elizabeth Amaro

C. Arrowsmith

Hatylas F. Z. Azevedo

R. Batey

U. Betz

Cristian G. Bologa

J. Chodera

Wendy Cornell

Ian Dunham

G. Ecker

Kristina Edfeldt

A. Edwards

M. Gilson

Cláudia Regina Gordijo

G. Hessler

Alexander Hillisch

Anders C Hogner

John Joseph Irwin

J. Jansen

Daniel Kuhn

Andrew R. Leach

Alpha A. Lee

Uta F. Lessel

J. Moult

Ingo Muegge

Tudor I. Oprea

Ben Perry

Patrick F. Riley

K. Saikatendu

Vijayaratnam Santhakumar

Matthieu Schapira

Cora Scholten

M. Todd

Masoud Vedadi

Andrea Volkamer

T. Willson

2022-02-15

Nature Reviews Chemistry (published)

doi.org

RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds in vitro

Paul Bertin

Jarrid Rector-Brooks

Deepak Sharma

Thomas Gaudelet

Andrew Anighoro

Torsten Gross

Francisco Martínez-Peña

Eileen L. Tang

S. SurajM

Cristian Regep

Jeremy B.R. Hayter

Maksym Korablyov

N. Valiante

Almer M. van der Sloot

Mike Tyers

Charles E.S. Roberts

Michael M. Bronstein

Luke Lee Lairson

Jake P. Taylor-King

2022-02-07

ArXiv (preprint)

arxiv.org

Tackling Climate Change with Machine Learning

David Rolnick

Priya L. Donti

Lynn H. Kaack

Kelly Kochanski

Alexandre Lacoste

Kris Sankaran

Andrew Slavin Ross

Nikola Milojevic-Dupont

Natasha Jaques

Anna Waldman-Brown

Alexandra Luccioni

Tegan Maharaj

Evan David Sherwin

S. Karthik Mukkavilli

Konrad Paul Kording

Carla P. Gomes

Andrew Y. Ng

Demis Hassabis

John C. Platt

Felix Creutzig

Jennifer T Chayes

Climate change is one of the greatest challenges facing humanity, and we, as machine learning (ML) experts, may wonder how we can help. Here we describe how ML can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by ML, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the ML community to join the global effort against climate change.

2022-02-07

ACM Computing Surveys (published)

doi.org

arxiv.org

ClimateGAN: Raising Climate Change Awareness by Generating Images of Floods

Victor Schmidt

Alexandra Luccioni

Mélisande Teng

Tianyu Zhang

Alexia Reynaud

Sunand Raghupathi

Gautier Cosne

Adrien Juraver

Vahe Vardanyan

Alex Hernandez-Garcia

Climate change is a major threat to humanity, and the actions required to prevent its catastrophic consequences include changes in both policy-making and individual behaviour. However, taking action requires understanding the effects of climate change, even though they may seem abstract and distant. Projecting the potential consequences of extreme climate events such as flooding in familiar places can help make the abstract impacts of climate change more concrete and encourage action. As part of a larger initiative to build a website that projects extreme climate events onto user-chosen photos, we present our solution to simulate photo-realistic floods on authentic images. To address this complex task in the absence of suitable training data, we propose ClimateGAN, a model that leverages both simulated and real data for unsupervised domain adaptation and conditional image generation. In this paper, we describe the details of our framework, thoroughly evaluate components of our architecture and demonstrate that our model is capable of robustly generating photo-realistic flooding.

2022-01-28

ICLR.cc/2022/Conference (poster)

Continuous-Time Meta-Learning with Forward Mode Differentiation

Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field. Specifically, representations of the inputs are meta-learned such that a task-specific linear classifier is obtained as a solution of an ordinary differential equation (ODE). Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous, as opposed to a fixed and discrete number of gradient steps. As a consequence, we can optimize the amount of adaptation necessary to solve a new task using stochastic gradient descent, in addition to learning the initial conditions as is standard practice in gradient-based meta-learning. Importantly, in order to compute the exact meta-gradients required for the outer-loop updates, we devise an efficient algorithm based on forward mode differentiation, whose memory requirements do not scale with the length of the learning trajectory, thus allowing longer adaptation in constant memory. We provide analytical guarantees for the stability of COMLN, we show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.

2022-01-28

ICLR.cc/2022/Conference (spotlight)

doi.org

Coordination Among Neural Modules Through a Shared Global Workspace

Anirudh Goyal

Aniket Rajiv Didolkar

Alex Lamb

Kartikeya Badola

Nan Rosemary Ke

Nasim Rahaman

Jonathan Binas

Charles Blundell

Michael Curtis Mozer

Deep learning has seen a movement away from representing examples with a monolithic hidden state towards a richly structured state. For example, Transformers segment by position, and object-centric architectures decompose images into entities. In all these architectures, interactions between different elements are modeled via pairwise interactions: Transformers make use of self-attention to incorporate information from other positions and object-centric architectures make use of graph neural networks to model interactions among entities. We consider how to improve on pairwise interactions in terms of global coordination and a coherent, integrated representation that can be used for downstream tasks. In cognitive science, a global workspace architecture has been proposed in which functionally specialized components share information through a common, bandwidth-limited communication channel. We explore the use of such a communication channel in the context of deep learning for modeling the structure of complex environments. The proposed method includes a shared workspace through which communication among different specialist modules takes place but due to limits on the communication bandwidth, specialist modules must compete for access. We show that capacity limitations have a rational basis in that (1) they encourage specialization and compositionality and (2) they facilitate the synchronization of otherwise independent specialists.

2022-01-28

ICLR.cc/2022/Conference (oral)

Graph Neural Networks with Learnable Structural and Positional Representations

Vijay Prakash Dwivedi

Anh Tuan Luu

Thomas Laurent

Xavier Bresson

Graph neural networks (GNNs) have become the standard learning architectures for graphs. GNNs have been applied to numerous domains ranging from quantum chemistry, recommender systems to knowledge graphs and natural language processing. A major issue with arbitrary graphs is the absence of canonical positional information of nodes, which decreases the representation power of GNNs to distinguish e.g. isomorphic nodes and other graph symmetries. An approach to tackle this issue is to introduce Positional Encoding (PE) of nodes, and inject it into the input layer, like in Transformers. Possible graph PE are Laplacian eigenvectors. In this work, we propose to decouple structural and positional representations to make it easy for the network to learn these two essential properties. We introduce a novel generic architecture which we call LSPE (Learnable Structural and Positional Encodings). We investigate several sparse and fully-connected (Transformer-like) GNNs, and observe a performance increase for molecular datasets, from 1.79% up to 64.14% when considering learnable PE for both GNN classes.

2022-01-28

ICLR.cc/2022/Conference (poster)
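
The LSPE abstract above names Laplacian eigenvectors as one possible graph positional encoding. As a rough illustration only, and not the authors' implementation, the sketch below computes such an encoding with NumPy for a small undirected graph. The function name `laplacian_positional_encoding`, the use of the symmetric normalized Laplacian, and the 4-node path graph are all assumptions made for this example.

```python
import numpy as np


def laplacian_positional_encoding(adj: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: k-dimensional positional encoding per node.

    Uses eigenvectors of the symmetric normalized Laplacian
    L = I - D^{-1/2} A D^{-1/2}. Assumes an undirected, connected graph,
    so the first eigenvector (eigenvalue 0) carries no positional signal
    and is skipped.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)

    # Inverse square root of the degrees, leaving isolated nodes at 0.
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5

    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(lap)
    return eigvecs[:, 1 : k + 1]


# Assumed toy input: a path graph 0 - 1 - 2 - 3.
path = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
)
pe = laplacian_positional_encoding(path, k=2)
print(pe.shape)
```

Each row of `pe` is the encoding for one node. Eigenvectors are only defined up to sign, so the values can flip between runs or libraries; this ambiguity is one reason an encoding of this kind is usually treated as an input feature that the network refines rather than as a fixed coordinate system.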

Properties from mechanisms: an equivariance perspective on identifiable representation learning

Kartik Ahuja

Jason Hartford

A key goal of unsupervised representation learning is “inverting” a data generating process to recover its latent properties. Existing work that provably achieves this goal relies on strong assumptions on relationships between the latent variables (e.g., independence conditional on auxiliary information). In this paper, we take a very different perspective on the problem and ask, “Can we instead identify latent properties by leveraging knowledge of the mechanisms that govern their evolution?” We provide a complete characterization of the sources of non-identifiability as we vary knowledge about a set of possible mechanisms. In particular, we prove that if we know the exact mechanisms under which the latent properties evolve, then identification can be achieved up to any equivariances that are shared by the underlying mechanisms. We generalize this characterization to settings where we only know some hypothesis class over possible mechanisms, as well as settings where the mechanisms are stochastic. We demonstrate the power of this mechanism-based perspective by showing that we can leverage our results to generalize existing identifiable representation learning results. These results suggest that by exploiting inductive biases on mechanisms, it is possible to design a range of new identifiable representation learning approaches.

2022-01-28

ICLR.cc/2022/Conference (spotlight)

Boosting Exploration in Multi-Task Reinforcement Learning using Adversarial Networks

Ramnath Kumar

Tristan Deleu

2022-01-27

ArXiv (preprint)

arxiv.org

Biasly: a machine learning based platform for automatic racial discrimination detection in online texts

David Bamman

Chris Dyer

Noah A. Smith

Steven Bird

Ewan Klein

Edward Loper

Nat-527

Jacob Devlin

Ming-Wei Chang

Kenton Lee

Kristina Toutanova

Bert

Samuel Gehman

Suchin Gururangan

Maarten Sap

Dan Hendrycks

Kevin Gimpel

Gaussian

Alex Lamb

Di He

Anirudh Goyal

Guolin Ke

Feng Liao

Mirco Ravanelli

Zhenzhong Lan

Mingda Chen

Sebastian Goodman

Yann Lecun

Bernhard E. Boser

J. Denker

Donnie Henderson

Robin Howard

Wayne Hubbard

Yinhan Liu

Myle Ott

Naman Goyal

Jingfei Du

Mandar Joshi

Danqi Chen

Omer Levy

Mike Lewis

Warning: this paper contains content that may be offensive or upsetting. Detecting hateful, toxic, and otherwise racist or sexist language in user-generated online content has become an increasingly important task in recent years. Indeed, the anonymity, the transience, the size of messages, and the difficulty of management facilitate the diffusion of racist or hateful messages across the Internet. The critical influence of this cyber-racism is no longer limited to social media, but also has a significant effect on our society: corporate business operation, users’ health, crimes, etc. Traditional racist speech reporting channels have proven inadequate due to the enormous explosion of information, so there is an urgent need for a method to automatically and promptly detect texts with racial discrimination. In this work, we propose a machine learning-based approach to enable automatic detection of racist text content over the internet. State-of-the-art machine learning models that are able to grasp language structures are adapted in this study. Our main contributions include 1) a large scale racial discrimination data set collected from three distinct sources and annotated according to a guideline developed by specialists, 2) a set of machine learning models with various architectures for racial discrimination detection, and 3) a web-browser-based software that assists users to debias their texts when using the internet. All these resources are made publicly available.

Chunked Autoregressive GAN for Conditional Waveform Synthesis

Max Morrison

Prem Seetharaman

2022-01-01

International Conference on Learning Representations (published)

Compositional Attention: Disentangling Search and Retrieval

Sarthak Mittal

Sharath Chandra Raparthy

Irina Rish

Guillaume Lajoie

Multi-head, key-value attention is the backbone of transformer-like model architectures which have proven to be widely successful in recent years. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search - selection of a relevant entity from a set via query-key interaction, and (2) retrieval - extraction of relevant features from the selected entity via a value matrix. Standard attention heads learn a rigid mapping between search and retrieval. In this work, we first highlight how this static nature of the pairing can potentially: (a) lead to learning of redundant parameters in certain tasks, and (b) hinder generalization. To alleviate this problem, we propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure. The proposed mechanism disentangles search and retrieval and composes them in a dynamic, flexible and context-dependent manner. Through a series of numerical experiments, we show that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings. Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed. Our proposed mechanism generalizes multi-head attention, allows independent scaling of search and retrieval and is easy to implement in a variety of established network architectures.

2022-01-01

International Conference on Learning Representations (published)
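
The Compositional Attention abstract above describes a standard attention head as two computations: a search step (query-key interaction) and a retrieval step (a value matrix). The sketch below illustrates only that baseline single head in NumPy, with the two steps labelled; it does not implement the proposed Compositional Attention mechanism. The function names, tensor shapes, and random inputs are assumptions chosen for illustration.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_head(x, w_q, w_k, w_v):
    """One standard scaled dot-product attention head (hypothetical helper).

    In a standard head, the search weights (w_q, w_k) and the retrieval
    weights (w_v) are tied to the same head, which is the rigid pairing
    the abstract refers to.
    """
    # Search: decide which elements to attend to via query-key interaction.
    queries, keys = x @ w_q, x @ w_k
    scores = softmax(queries @ keys.T / np.sqrt(keys.shape[-1]))

    # Retrieval: extract features from the selected elements via the value matrix.
    values = x @ w_v
    return scores @ values, scores


# Assumed toy input: 5 elements with 8 features, projected to 4 dimensions.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, scores = attention_head(x, w_q, w_k, w_v)
print(out.shape, scores.shape)
```

Each row of `scores` is a probability distribution over the input elements, and `out` holds the retrieved features for each element.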