Yoshua Bengio

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Cassidy MacNeil, Senior Assistant and Operation Lead at cassidy.macneil@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Jamal Abou Haibeh

Collaborating Alumni - McGill University

Berkes Anaïs

Collaborating researcher - Cambridge University

Principal supervisor :

Rim Assouel

PhD - Université de Montréal

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor :

PhD - Université de Montréal

Sanghyeok Choi

Collaborating researcher - KAIST

PhD - Université de Montréal

Independent visiting researcher

Principal supervisor :

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Jean-Pierre Falet

PhD - Université de Montréal

PhD

PhD - Université de Montréal

Moksh Jain

PhD - Université de Montréal

PhD - Université de Montréal

Principal supervisor :

Collaborating Alumni - Université de Montréal

Hyeonah Kim

Postdoctorate - Université de Montréal

Principal supervisor :

Alex Hernández-García

Minsu Kim

Collaborating researcher - Université de Montréal

Postdoctorate - Université de Montréal

Principal supervisor :

Collaborating Alumni

Song LIU

Collaborating researcher - s.o.

Cristian Dragos Manta

PhD - Université de Montréal

Co-supervisor :

Dhanya Sridhar

Sarthak Mittal

PhD - Université de Montréal

Principal supervisor :

Independent visiting researcher - Université de Montréal

Padideh Nouri

PhD - Université de Montréal

Principal supervisor :

Ali Parviz

Collaborating researcher - Ying Wu Coll of Computing

Lena Podina

Collaborating researcher - University of Waterloo

Principal supervisor :

David Rolnick

Jarrid Rector-Brooks

PhD - Université de Montréal

Danyal REHMAN

Postdoctorate - Université de Montréal

Oli RICHARDSON

Postdoctorate - Université de Montréal

Camille Rochefort-Boulanger

PhD - Université de Montréal

Principal supervisor :

Julie Hussin

Divya Sharma

Postdoctorate

Co-supervisor :

Alex Hernández-García

Mélisande Astrid Crystal Teng

Collaborating Alumni - Université de Montréal

Co-supervisor :

Hugo Larochelle

Ivan Titov

Collaborating researcher

Principal supervisor :

Siva Reddy

Alex Tong

Collaborating Alumni - Université de Montréal

Collaborating Alumni - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher - Université de Montréal

Collaborating researcher

Collaborating researcher - Université de Montréal

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Tianyu Zhang

PhD - Université de Montréal

PhD - McGill University

Principal supervisor :

Harry Zhao

Collaborating Alumni - McGill University

Principal supervisor :

Blog Posts

Generic thumbnail for Mila Blog articles.

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the Service of Reasoning & Model-Based ML

April 4, 2023

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

March 15, 2022

Generative Flow Networks

Yoshua Bengio

Publications

Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions

Stefano Massaroli

Michael Poli

Daniel Y Fu

Hermann Kumbong

Rom Nishijima Parnichkun

Aman Timalsina

David W. Romero

Quinn McIntyre

Beidi Chen

Atri Rudra

Ce Zhang

Christopher Re

Stefano Ermon

Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers… (see more). In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads -- naively requiring a full pass (or caching of activations) over the input sequence for each generated token -- similarly to attention-based models. In this paper, we seek to enable

2023-09-20

NeurIPS.cc/2023/Conference (poster)

Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets

Dinghuai Zhang

Hanjun Dai

Nikolay Malkin

Aaron Courville

Ling Pan

2023-09-20

NeurIPS.cc/2023/Conference (spotlight)

Reusable Slotwise Mechanisms

Trang Nguyen

Amin Mansouri

Kanika Madan

Khuong Nguyen

Nguyen Duy Khuong

Kartik Ahuja

Dianbo Liu

Agents with the ability to comprehend and reason about the dynamics of objects would be expected to exhibit improved robustness and generali… (see more)zation in novel scenarios. However, achieving this capability necessitates not only an effective scene representation but also an understanding of the mechanisms governing interactions among object subsets. Recent studies have made significant progress in representing scenes using object slots. In this work, we introduce Reusable Slotwise Mechanisms, or RSM, a framework that models object dynamics by leveraging communication among slots along with a modular architecture capable of dynamically selecting reusable mechanisms for predicting the future states of each object slot. Crucially, RSM leverages the Central Contextual Information (CCI), enabling selected mechanisms to access the remaining slots through a bottleneck, effectively allowing for modeling of higher order and complex interactions that might require a sparse subset of objects. Experimental results demonstrate the superior performance of RSM compared to state-of-the-art methods across various future prediction and related downstream tasks, including Visual Question Answering and action planning. Furthermore, we showcase RSM's Out-of-Distribution generalization ability to handle scenes in intricate scenarios.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

Neural Causal Structure Discovery from Interventions

Nan Rosemary Ke

Bernhard Schölkopf

Michael Curtis Mozer

Christopher Pal

Recent promising results have generated a surge of interest in continuous optimization methods for causal discovery from observational data.… (see more) However, there are theoretical limitations on the identifiability of underlying structures obtained solely from observational data. Interventional data, on the other hand, provides richer information about the underlying data-generating process. Nevertheless, extending and applying methods designed for observational data to include interventions is a challenging problem. To address this issue, we propose a general framework based on neural networks to develop models that incorporate both observational and interventional data. Notably, our method can handle the challenging and realistic scenario where the identity of the intervened upon variable is unknown. We evaluate our proposed approach in the context of graph recovery, both de novo and from a partially-known edge set. Our method achieves strong benchmark results on various structure learning tasks, including structure recovery of synthetic graphs as well as standard graphs from the Bayesian Network Repository.

2023-09-09

TMLR (accepted)

Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

Patrick Mark Butlin

Run Long

Eric Elmoznino

Jonathan C. P. Birch

Axel Constant

George Deane

S. Fleming

C. Frith

Xuanxiu Ji

Ryota Kanai

C. Klein

Grace W. Lindsay

Matthias Michel

Liad Mudrik

Megan A. K. Peters

Eric Schwitzgebel

Jonathan Simon

Rufin Vanrullen

2023-08-16

ArXiv (preprint)

Scientific discovery in the age of artificial intelligence

Hanchen Wang

Tianfan Fu

Yuanqi Du

Wenhao Gao

Kexin Huang

Ziming Liu

Payal Chandak

Shengchao Liu

Peter Van Katwyk

Andreea Deac

Animashree Anandkumar

K. Bergen

Carla P. Gomes

Shirley Ho

Pushmeet Kohli

Joan Lasenby

Jure Leskovec

Tie-Yan Liu

A. Manrai

Debora Susan Marks … (see 10 more)

Bharath Ramsundar

Le Song

Jimeng Sun

Jian Tang

Petar Veličković

MAX WELLING

Linfeng Zhang

Connor Wilson. Coley

Marinka Žitnik

2023-07-31

Nature (published)

AI For Global Climate Cooperation 2023 Competition Proceedings

Prateek Arun Gupta

Li Li

Soham Rajesh Phade

Sunil Srinivasa

Andrew Robert Williams

Tianyu Zhang

Yang Zhang

Stephan Zheng

The international community must collaborate to mitigate climate change and sustain economic growth. However, collaboration is hard to achie… (see more)ve, partly because no global authority can ensure compliance with international climate agreements. Combining AI with climate-economic simulations offers a promising solution to design international frameworks, including negotiation protocols and climate agreements, that promote and incentivize collaboration. In addition, these frameworks should also have policy goals fulfillment, and sustained commitment, taking into account climate-economic dynamics and strategic behaviors. These challenges require an interdisciplinary approach across machine learning, economics, climate science, law, policy, ethics, and other fields. Towards this objective, we organized AI for Global Climate Cooperation, a Mila competition in which teams submitted proposals and analyses of international frameworks, based on (modifications of) RICE-N, an AI-driven integrated assessment model (IAM). In particular, RICE-N supports modeling regional decision-making using AI agents. Furthermore, the IAM then models the climate-economic impact of those decisions into the future. Whereas the first track focused only on performance metrics, the proposals submitted to the second track were evaluated both quantitatively and qualitatively. The quantitative evaluation focused on a combination of (i) the degree of mitigation of global temperature rise and (ii) the increase in economic productivity. On the other hand, an interdisciplinary panel of human experts in law, policy, sociology, economics and environmental science, evaluated the solutions qualitatively. In particular, the panel considered the effectiveness, simplicity, feasibility, ethics, and notions of climate justice of the protocols. In the third track, the participants were asked to critique and improve RICE-N.

2023-07-09

ArXiv (preprint)

International Institutions for Advanced AI

Lewis Ho

Joslyn N. Barnhart

Robert Trager

Miles Brundage

Allison Sovey Carnegie

Rumman Chowdhury

Allan Dafoe

Gillian K. Hadfield

Margaret Levi

D. Snidal

2023-07-09

ArXiv (preprint)

Generative Flow Networks: a Markov Chain Perspective

Tristan Deleu

2023-07-03

ArXiv (preprint)

Better Training of GFlowNets with Local Credit and Incomplete Trajectories

Ling Pan

Nikolay Malkin

Dinghuai Zhang

Generative Flow Networks or GFlowNets are related to Monte-Carlo Markov chain methods (as they sample from a distribution specified by an en… (see more)ergy function), reinforcement learning (as they learn a policy to sample composed objects through a sequence of steps), generative models (as they learn to represent and sample from a distribution) and amortized variational methods (as they can be used to learn to approximate and sample from an otherwise intractable posterior, given a prior and a likelihood). They are trained to generate an object

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)

proceedings.mlr.press

Discrete Key-Value Bottleneck

Frederik Träuble

Anirudh Goyal

Nasim Rahaman

Michael Mozer

Kenji Kawaguchi

Bernhard Schölkopf

Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant. Challenges emerge with… (see more) non-stationary training data streams such as continual learning. One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning. Given a new task, however, updating the weights of these encoders is challenging as a large number of weights needs to be fine-tuned, and as a result, they forget information about the previous tasks. In the present work, we propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes. Our paradigm will be to encode; process the representation via a discrete bottleneck; and decode. Here, the input is fed to the pre-trained encoder, the output of the encoder is used to select the nearest keys, and the corresponding values are fed to the decoder to solve the current task. The model can only fetch and re-use a sparse number of these key-value pairs during inference, enabling localized and context-dependent model updates. We theoretically investigate the ability of the discrete key-value bottleneck to minimize the effect of learning under distribution shifts and show that it reduces the complexity of the hypothesis class. We empirically verify the proposed method under challenging class-incremental learning scenarios and show that the proposed model - without any task boundaries - reduces catastrophic forgetting across a wide variety of pre-trained models, outperforming relevant baselines on this task.

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)

proceedings.mlr.press

Equivariance with Learned Canonicalization Functions

Sékou-Oumar Kaba

Arnab Kumar Mondal

Yan Zhang

Siamak Ravanbakhsh

Symmetry-based neural networks often constrain the architecture in order to achieve invariance or equivariance to a group of transformations… (see more). In this paper, we propose an alternative that avoids this architectural constraint by learning to produce canonical representations of the data. These canonicalization functions can readily be plugged into non-equivariant backbone architectures. We offer explicit ways to implement them for some groups of interest. We show that this approach enjoys universality while providing interpretable insights. Our main hypothesis, supported by our empirical results, is that learning a small neural network to perform canonicalization is better than using predefined heuristics. Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks, including image classification,

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)