Yoshua Bengio

Biography

For media requests, please write to medias@mila.quebec.

For more information please contact Cassidy MacNeil, Senior Assistant and Operation Lead at cassidy.macneil@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize, and in 2022 he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Jamal Abou Haibeh

Collaborating Alumni - McGill University

Berkes Anaïs

Collaborating researcher - Cambridge University

Principal supervisor :

Rim Assouel

PhD - Université de Montréal

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor :

PhD - Université de Montréal

Sanghyeok Choi

Collaborating researcher - KAIST

PhD - Université de Montréal

Independent visiting researcher

Principal supervisor :

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Jean-Pierre Falet

PhD - Université de Montréal

PhD

PhD - Université de Montréal

Moksh Jain

PhD - Université de Montréal

PhD - Université de Montréal

Principal supervisor :

Collaborating Alumni - Université de Montréal

Minsu Kim

Collaborating researcher - Université de Montréal

Hyeonah Kim

Postdoctorate - Université de Montréal

Principal supervisor :

Alex Hernández-García

Tabitha Edith Lee

Postdoctorate - Université de Montréal

Principal supervisor :

Collaborating Alumni

Song LIU

Collaborating researcher - N/A

Cristian Dragos Manta

PhD - Université de Montréal

Co-supervisor :

Dhanya Sridhar

Sarthak Mittal

PhD - Université de Montréal

Principal supervisor :

Independent visiting researcher - Université de Montréal

Padideh Nouri

PhD - Université de Montréal

Principal supervisor :

Ali Parviz

Collaborating researcher - Ying Wu College of Computing

Lena Podina

Collaborating researcher - University of Waterloo

Principal supervisor :

David Rolnick

Jarrid Rector-Brooks

PhD - Université de Montréal

Danyal REHMAN

Postdoctorate - Université de Montréal

Oli RICHARDSON

Postdoctorate - Université de Montréal

Camille Rochefort-Boulanger

PhD - Université de Montréal

Principal supervisor :

Julie Hussin

Divya Sharma

Postdoctorate

Co-supervisor :

Alex Hernández-García

Mélisande Astrid Crystal Teng

Collaborating Alumni - Université de Montréal

Co-supervisor :

Hugo Larochelle

Ivan Titov

Collaborating researcher

Principal supervisor :

Siva Reddy

Alex Tong

Collaborating Alumni - Université de Montréal

Collaborating Alumni - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher - Université de Montréal

Collaborating researcher

Collaborating researcher - Université de Montréal

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Tianyu Zhang

PhD - Université de Montréal

PhD - McGill University

Principal supervisor :

Harry Zhao

Collaborating Alumni - McGill University

Principal supervisor :

Blog Posts

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the Service of Reasoning & Model-Based ML

April 4, 2023

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

March 15, 2022

Generative Flow Networks

Yoshua Bengio

Publications

RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds in vitro

Paul Bertin

Jarrid Rector-Brooks

Deepak Sharma

Thomas Gaudelet

Andrew Anighoro

Torsten Gross

Francisco Martínez-Peña

Eileen L. Tang

Suraj M S

Cristian Regep

Jeremy B. R. Hayter

Maksym Korablyov

Nicholas Valiante

Almer Van Der Sloot

Mike Tyers

Charles Roberts

Michael M. Bronstein

Luke L. Lairson

Jake P. Taylor-King

2022-02-06

arXiv (preprint)

Tackling Climate Change with Machine Learning

David Rolnick

Priya L. Donti

Lynn H. Kaack

Kelly Kochanski

Alexandre Lacoste

Kris Sankaran

Andrew Slavin Ross

Nikola Milojevic-Dupont

Natasha Jaques

Anna Waldman-Brown

Alexandra Luccioni

Tegan Maharaj

Evan D. Sherwin

S. Karthik Mukkavilli

Konrad P. Kording

Carla Gomes

Andrew Y. Ng

Demis Hassabis

John C. Platt

Felix Creutzig

Jennifer Chayes

Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help. Here we describe how machine learning can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by machine learning, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the machine learning community to join the global effort against climate change.

2022-02-06

ACM Computing Surveys (published)

ClimateGAN: Raising Climate Change Awareness by Generating Images of Floods

Victor Schmidt

Alexandra Luccioni

Mélisande Teng

Tianyu Zhang

Alexia Reynaud

Sunand Raghupathi

Gautier Cosne

Adrien Juraver

Vahe Vardanyan

Alex Hernández-García

Climate change is a major threat to humanity, and the actions required to prevent its catastrophic consequences include changes in both policy-making and individual behaviour. However, taking action requires understanding the effects of climate change, even though they may seem abstract and distant. Projecting the potential consequences of extreme climate events such as flooding in familiar places can help make the abstract impacts of climate change more concrete and encourage action. As part of a larger initiative to build a website that projects extreme climate events onto user-chosen photos, we present our solution to simulate photo-realistic floods on authentic images. To address this complex task in the absence of suitable training data, we propose ClimateGAN, a model that leverages both simulated and real data for unsupervised domain adaptation and conditional image generation. In this paper, we describe the details of our framework, thoroughly evaluate components of our architecture and demonstrate that our model is capable of robustly generating photo-realistic flooding.

2022-01-27

ICLR.cc/2022/Conference (poster)

Continuous-Time Meta-Learning with Forward Mode Differentiation

Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field. Specifically, representations of the inputs are meta-learned such that a task-specific linear classifier is obtained as a solution of an ordinary differential equation (ODE). Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous, as opposed to a fixed and discrete number of gradient steps. As a consequence, we can optimize the amount of adaptation necessary to solve a new task using stochastic gradient descent, in addition to learning the initial conditions as is standard practice in gradient-based meta-learning. Importantly, in order to compute the exact meta-gradients required for the outer-loop updates, we devise an efficient algorithm based on forward mode differentiation, whose memory requirements do not scale with the length of the learning trajectory, thus allowing longer adaptation in constant memory. We provide analytical guarantees for the stability of COMLN, we show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.

2022-01-27

ICLR.cc/2022/Conference (spotlight)
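
The idea of adaptation that follows a gradient vector field for a continuous amount of time can be illustrated with a toy example. The sketch below is not the COMLN algorithm: it leaves out meta-learning and forward mode differentiation entirely, and only integrates dw/dt = -∇L(w) for a one-parameter linear model with explicit Euler steps. The function names (`loss_grad`, `adapt`, `closed_form`) and the data are hypothetical, chosen for illustration.

```python
import math


def loss_grad(w, xs, ys):
    """Gradient in w of the loss L(w) = 0.5 * mean((w * x - y) ** 2)."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def adapt(w0, xs, ys, horizon, dt=1e-3):
    """Integrate dw/dt = -grad L(w) from t = 0 to t = horizon (explicit Euler).

    `horizon` is a continuous quantity: it plays the role of the
    "amount of adaptation" rather than a fixed number of gradient steps.
    """
    n_steps = max(1, math.ceil(horizon / dt))
    step = horizon / n_steps  # equal steps that end exactly at the horizon
    w = w0
    for _ in range(n_steps):
        w -= step * loss_grad(w, xs, ys)
    return w


def closed_form(w0, xs, ys, horizon):
    """Exact solution of the same ODE, available because the loss is quadratic."""
    a = sum(x * x for x in xs) / len(xs)           # curvature of the loss
    w_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return w_star + (w0 - w_star) * math.exp(-a * horizon)


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]      # toy data with y = 2x
    for horizon in (0.1, 0.5, 2.0):
        print(horizon, adapt(0.0, xs, ys, horizon), closed_form(0.0, xs, ys, horizon))
```

Because the horizon enters the solution smoothly, it could in principle be tuned by gradient descent like any other parameter, which is the property the abstract highlights.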

Coordination Among Neural Modules Through a Shared Global Workspace

Nan Rosemary Ke

Nasim Rahaman

Jonathan Binas

Charles Blundell

Michael Mozer

Deep learning has seen a movement away from representing examples with a monolithic hidden state towards a richly structured state. For example, Transformers segment by position, and object-centric architectures decompose images into entities. In all these architectures, interactions between different elements are modeled via pairwise interactions: Transformers make use of self-attention to incorporate information from other positions; object-centric architectures make use of graph neural networks to model interactions among entities. However, pairwise interactions may not achieve global coordination or a coherent, integrated representation that can be used for downstream tasks. In cognitive science, a global workspace architecture has been proposed in which functionally specialized components share information through a common, bandwidth-limited communication channel. We explore the use of such a communication channel in the context of deep learning for modeling the structure of complex environments. The proposed method includes a shared workspace through which communication among different specialist modules takes place but due to limits on the communication bandwidth, specialist modules must compete for access. We show that capacity limitations have a rational basis in that (1) they encourage specialization and compositionality and (2) they facilitate the synchronization of otherwise independent specialists.

2022-01-27

ICLR.cc/2022/Conference (oral)

Graph Neural Networks with Learnable Structural and Positional Representations

Vijay Prakash Dwivedi

Anh Tuan Luu

Thomas Laurent

Xavier Bresson

Graph neural networks (GNNs) have become the standard learning architectures for graphs. GNNs have been applied to numerous domains ranging from quantum chemistry, recommender systems to knowledge graphs and natural language processing. A major issue with arbitrary graphs is the absence of canonical positional information of nodes, which decreases the representation power of GNNs to distinguish e.g. isomorphic nodes and other graph symmetries. An approach to tackle this issue is to introduce Positional Encoding (PE) of nodes, and inject it into the input layer, like in Transformers. Possible graph PE are Laplacian eigenvectors. In this work, we propose to decouple structural and positional representations to make it easy for the network to learn these two essential properties. We introduce a novel generic architecture which we call LSPE (Learnable Structural and Positional Encodings). We investigate several sparse and fully-connected (Transformer-like) GNNs, and observe a performance increase for molecular datasets, from 1.79% up to 64.14% when considering learnable PE for both GNN classes.

2022-01-27

ICLR.cc/2022/Conference (poster)

Properties from Mechanisms: An Equivariance Perspective on Identifiable Representation Learning

Kartik Ahuja

Jason Hartford

A key goal of unsupervised representation learning is "inverting" a data generating process to recover its latent properties. Existing work that provably achieves this goal relies on strong assumptions on relationships between the latent variables (e.g., independence conditional on auxiliary information). In this paper, we take a very different perspective on the problem and ask, "Can we instead identify latent properties by leveraging knowledge of the mechanisms that govern their evolution?" We provide a complete characterization of the sources of non-identifiability as we vary knowledge about a set of possible mechanisms. In particular, we prove that if we know the exact mechanisms under which the latent properties evolve, then identification can be achieved up to any equivariances that are shared by the underlying mechanisms. We generalize this characterization to settings where we only know some hypothesis class over possible mechanisms, as well as settings where the mechanisms are stochastic. We demonstrate the power of this mechanism-based perspective by showing that we can leverage our results to generalize existing identifiable representation learning results. These results suggest that by exploiting inductive biases on mechanisms, it is possible to design a range of new identifiable representation learning approaches.

2022-01-27

ICLR.cc/2022/Conference (spotlight)

Unifying Likelihood-Free Inference with Black-Box Optimization and Beyond

Black-box optimization formulations for biological sequence design have drawn recent attention due to their promising potential impact on the pharmaceutical industry. In this work, we propose to unify two seemingly distinct worlds: likelihood-free inference and black-box optimization, under one probabilistic framework. In tandem, we provide a recipe for constructing various sequence design methods based on this framework. We show how previous optimization approaches can be "reinvented" in our framework, and further propose new probabilistic black-box optimization algorithms. Extensive experiments on sequence design application illustrate the benefits of the proposed methodology.

2022-01-27

International Conference on Learning Representations (spotlight)

Boosting Exploration in Multi-Task Reinforcement Learning using Adversarial Networks

Ramnath Kumar

Tristan Deleu

2022-01-26

arXiv (preprint)

Biasly: a machine learning based platform for automatic racial discrimination detection in online texts

David Bamman

Chris Dyer

Noah A. Smith

Steven Bird

Ewan Klein

Edward Loper

Jacob Devlin

Ming-Wei Chang

Kenton Lee

Kristina Toutanova

Samuel Gehman

Suchin Gururangan

Maarten Sap

Dan Hendrycks

Kevin Gimpel

Alex Lamb

Di He

Anirudh Goyal

Guolin Ke

Feng-Ju Liao

Mirco Ravanelli

Zhenzhong Lan

Mingda Chen

Sebastian Goodman

Yann LeCun

Bernhard E. Boser

J. Denker

Donnie Henderson

Robin Howard

Wayne Hubbard

Yinhan Liu

Myle Ott

Naman Goyal

Jingfei Du

Mandar Joshi

Danqi Chen

Omer Levy

Mike Lewis

Warning: this paper contains content that may be offensive or upsetting. Detecting hateful, toxic, and otherwise racist or sexist language in user-generated online content has become an increasingly important task in recent years. Indeed, the anonymity, the transience, the size of messages, and the difficulty of management facilitate the diffusion of racist or hateful messages across the Internet. The critical influence of this cyber-racism is no longer limited to social media, but also has a significant effect on our society: corporate business operation, users’ health, crimes, etc. Traditional racist speech reporting channels have proven inadequate due to the enormous explosion of information, so there is an urgent need for a method to automatically and promptly detect texts with racial discrimination. We propose in this work a machine learning-based approach to enable automatic detection of racist text content over the internet. State-of-the-art machine learning models that are able to grasp language structures are adapted in this study. Our main contributions include 1) a large-scale racial discrimination data set collected from three distinct sources and annotated according to a guideline developed by specialists, 2) a set of machine learning models with various architectures for racial discrimination detection, and 3) a web-browser-based software that assists users to debias their texts when using the internet. All these resources are made publicly available.

2021-12-31

(published)

www.semanticscholar.org

Catalyzing next-generation Artificial Intelligence through NeuroAI

Anthony Zador

Blake Aaron Richards

Bence Ölveczky

Sean Escola

Kwabena Boahen

Matthew Botvinick

Dmitri Chklovskii

Anne Churchland

Claudia Clopath

James DiCarlo

Surya Ganguli

Jeff Hawkins

Konrad Paul Kording

Alexei Koulakov

Yann LeCun

Timothy P Lillicrap

Adam Marblestone

Bruno Olshausen

Alexandre Pouget

Cristina Savin

Terrence Sejnowski

Eero Simoncelli

Sara Solla

David Sussillo

Andreas S. Tolias

Doris Tsao

2021-12-31

arXiv.org (preprint)

Contrastive introspection (ConSpec) to rapidly identify invariant prototypes for success in RL

Chen Sun

Mila

Wannan Yang

Benjamin Alsbury-Nealy

Thomas Jiralerspong