Yoshua Bengio

Biography

For media requests, please write to medias@mila.quebec.

For more information, please contact Julie Mongeau, executive assistant, at julie.mongeau@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” shared with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific director of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize, and in 2022 he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, a Fellow of the Royal Society of Canada, a Knight of the Legion of Honor of France and an Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Jamal Abou Haibeh

Research Intern - McGill University

Mohammed Abukalam

Research Intern - Université de Montréal

Rim Assouel

PhD - Université de Montréal

Collaborating Alumni

Research Intern - Université du Québec à Rimouski

Stefan Bauer

Independent visiting researcher

Co-supervisor :

Guillaume Lajoie

Paul Bertin

PhD - Université de Montréal

Ghait Boukachab

Research Intern - UQAR

Oussama Boussif

PhD - Université de Montréal

Independent visiting researcher - MIT

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor :

Chen Chen

Postdoctorate - Université de Montréal

Co-supervisor :

Blake Richards

Xiaoyin Chen

PhD - Université de Montréal

Pierre-Paul De Breuck

Collaborating Alumni - Université de Montréal

PhD - Université de Montréal

PhD - Université de Montréal

Collaborating researcher - Université Paris-Saclay

Principal supervisor :

Eric Elmoznino

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Katie Everett

PhD - Massachusetts Institute of Technology

Léna Nehale Ezzine

PhD - Université de Montréal

Jean-Pierre Falet

PhD - Université de Montréal

Co-supervisor :

Leo Feng

PhD - Université de Montréal

Research Intern - Barcelona University

Piotr Gainski

Research Intern - Université de Montréal

Ivan Grega

Collaborating researcher - Université de Montréal

Pietro Greiner

Research Intern

Mohsin Hasan

PhD - Université de Montréal

mohsin.hasan@mila.quebec

Alex Hernandez-Garcia

Postdoctorate - Université de Montréal

Co-supervisor :

Leon Hetzel

Independent visiting researcher - Technical University Munich (TUM)

Edward Hu

PhD - Université de Montréal

Moksh Jain

PhD - Université de Montréal

moksh.jain@mila.quebec

Research Intern - Université de Montréal

Master's Research - Université de Montréal

Co-supervisor :

Research Intern - Université de Montréal

Minsu Kim

Collaborating researcher - Université de Montréal

PhD - Université de Montréal

Michał Koziarski

Postdoctorate - Université de Montréal

Salem Lahlou

PhD - Université de Montréal

Hae-Beom Lee

Collaborating Alumni

Seanie Lee

Collaborating Alumni - Université de Montréal

Collaborating Alumni

Zhen Liu

PhD - Université de Montréal

Principal supervisor :

Liam Paull

Matt MacDermott

Research Intern - Imperial College London

PhD - Université de Montréal

Mohammed Mahfoud

Research Intern - Université de Montréal

Nikolay Malkin

Collaborating Alumni - Université de Montréal

Cristian Dragos Manta

PhD - Université de Montréal

Co-supervisor :

Dhanya Sridhar

Stefano Massaroli

Postdoctorate - Université de Montréal

Collaborating Alumni

Collaborating researcher - Université de Montréal

Sarthak Mittal

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Postdoctorate - Université de Montréal

Principal supervisor :

Independent visiting researcher - Université de Montréal

Ling Pan

Independent visiting researcher - Hong Kong University of Science and Technology (HKUST)

Ali Parviz

Collaborating researcher - Ying Wu Coll of Computing

Lena Podina

PhD - University of Waterloo

Principal supervisor :

Nassim Rahaman

PhD - Max-Planck-Institute for Intelligent Systems

Jarrid Rector-Brooks

PhD - Université de Montréal

Co-supervisor :

Sarath Chandar

Danyal Rehman

Postdoctorate - Université de Montréal

James Requeima

Independent visiting researcher - Université de Montréal

Oli Richardson

Postdoctorate - Université de Montréal

Jessie Richter-Powell

Independent visiting researcher - Université de Montréal

Camille Rochefort-Boulanger

PhD - Université de Montréal

Principal supervisor :

Julie Hussin

agassoussisalwane2@gmail.com

Salwane Salwane

Research Intern - Université de Montréal

Theo Saulus

Collaborating researcher

Principal supervisor :

Victor Schmidt

PhD - Université de Montréal

Postdoctorate - Université de Montréal

Master's Research - Université de Montréal

Marcin Sendera

Research Intern - Université de Montréal

Dounia Shaaban Kabakibo

Research Intern - Université de Montréal

Vedant Shah

Master's Research - Université de Montréal

Collaborating Alumni

Marco Stock

Independent visiting researcher - Technical University of Munich

marco.stock@tum.de

Anja Surina

PhD - École Polytechnique Fédérale de Lausanne

Vincent Taboga

Postdoctorate - Polytechnique Montréal

Co-supervisor :

Pierre-Luc Bacon

Mélisande Astrid Crystal Teng

PhD - Université de Montréal

Co-supervisor :

Collaborating researcher

Principal supervisor :

alexander.tong@mila.quebec

Alex Tong

Postdoctorate - Université de Montréal

Collaborating researcher - Valence

Principal supervisor :

Dominique Beaini

Donna Vakalis

Postdoctorate - Université de Montréal

Co-supervisor :

Viktor Todosijevic

Collaborating researcher - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)

Principal supervisor :

Sasha Volokhova

PhD - Université de Montréal

Zichao Yan

Collaborating Alumni - Université de Montréal

Kyle Yun

Collaborating researcher - KAIST

Elmimouni Zakaria

Research Intern - Université de Montréal

Nicole Zhang

PhD - McGill University

Principal supervisor :

Mathieu Blanchette

Dinghuai Zhang

PhD - Université de Montréal

Principal supervisor :

Aaron Courville

Ruixiang Zhang

PhD - Université de Montréal

Principal supervisor :

Liam Paull

Tianyu Zhang

PhD - Université de Montréal

Harry Zhao

PhD - McGill University

Principal supervisor :

Blog Posts

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the Service of Reasoning & Model-Based ML

April 4, 2023

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

Generative Flow Networks

March 15, 2022

Yoshua Bengio

Publications

SatBird: a Dataset for Bird Species Distribution Modeling using Remote Sensing and Citizen Science Data

Mélisande Teng

Amna Elmustafa

Benjamin Akera

Hager Radi

Contrastive Retrospection: honing in on critical steps for rapid learning and generalization in RL

Chen Sun

Wannan Yang

Thomas Jiralerspong

Dane Malenfant

Benjamin Alsbury-Nealy

Blake Richards

In real life, success is often contingent upon multiple critical steps that are distant in time from each other and from the final reward. These critical steps are challenging to identify with traditional reinforcement learning (RL) methods that rely on the Bellman equation for credit assignment. Here, we present a new RL algorithm that uses offline contrastive learning to hone in on these critical steps. This algorithm, which we call Contrastive Retrospection (ConSpec), can be added to any existing RL algorithm. ConSpec learns a set of prototypes for the critical steps in a task by a novel contrastive loss and delivers an intrinsic reward when the current state matches one of the prototypes. The prototypes in ConSpec provide two key benefits for credit assignment: (i) They enable rapid identification of all the critical steps. (ii) They do so in a readily interpretable manner, enabling out-of-distribution generalization when sensory features are altered. Distinct from other contemporary RL approaches to credit assignment, ConSpec takes advantage of the fact that it is easier to retrospectively identify the small set of steps that success is contingent upon (and ignoring other states) than it is to prospectively predict reward at every taken step. ConSpec greatly improves learning in a diverse set of RL tasks. The code is available at the link: https://github.com/sunchipsster1/ConSpec
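The prototype-matching step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that): the cosine-similarity match, the `threshold` and `bonus` parameters, and the hand-written prototype vectors are all assumptions made for illustration. In ConSpec the prototypes are learned with a contrastive loss, which is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def intrinsic_reward(state_embedding, prototypes, threshold=0.6, bonus=1.0):
    """Return `bonus` if the state is close enough to any prototype, else 0.0.

    Assumption: "matching" means cosine similarity >= threshold.
    """
    best = max(
        (cosine_similarity(state_embedding, p) for p in prototypes),
        default=-1.0,  # no prototypes means nothing can match
    )
    return bonus if best >= threshold else 0.0


# Hypothetical prototypes; in ConSpec these would be learned, not hand-written.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
near = intrinsic_reward([0.9, 0.1], prototypes)    # close to the first prototype
far = intrinsic_reward([-1.0, -1.0], prototypes)   # far from both prototypes
```

In a full agent, a bonus of this kind would be added to the environment reward before the usual RL update, which is how an intrinsic reward can sit on top of an existing algorithm.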

DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets

Lazar Atanackovic

Alexander Tong

Jason Hartford

Leo J Lee

Bo Wang

Improving *day-ahead* Solar Irradiance Time Series Forecasting by Leveraging Spatio-Temporal Context

Oussama Boussif

Ghait Boukachab

Dan Assouline

Stefano Massaroli

Tianle Yuan

Loubna Benabbou

Solar power harbors immense potential in mitigating climate change by substantially reducing CO…

Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network

Tristan Deleu

Mizu Nishikawa-Toomey

Jithendaraa Subramanian

Nikolay Malkin

Laurent Charlin

Generative Flow Networks (GFlowNets), a class of generative models over discrete and structured sample spaces, have been previously applied to the problem of inferring the marginal posterior distribution over the directed acyclic graph (DAG) of a Bayesian Network, given a dataset of observations. Based on recent advances extending this framework to non-discrete sample spaces, we propose in this paper to approximate the joint posterior over not only the structure of a Bayesian Network, but also the parameters of its conditional probability distributions. We use a single GFlowNet whose sampling policy follows a two-phase process: the DAG is first generated sequentially one edge at a time, and then the corresponding parameters are picked once the full structure is known. Since the parameters are included in the posterior distribution, this leaves more flexibility for the local probability models of the Bayesian Network, making our approach applicable even to non-linear models parametrized by neural networks. We show that our method, called JSP-GFN, offers an accurate approximation of the joint posterior, while comparing favorably against existing methods on both simulated and real data.
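The two-phase sampling order described in the abstract above (structure first, one edge at a time, then parameters) can be sketched as follows. This is only a toy illustration of the control flow, not JSP-GFN itself: a uniform random choice stands in for the learned GFlowNet policy, and the `stop_prob` parameter and the standard-normal edge weights are assumptions introduced here.

```python
import random


def creates_cycle(edges, src, dst):
    """Adding src -> dst creates a cycle iff src is already reachable from dst."""
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(v for (u, v) in edges if u == node)
    return False


def sample_dag_then_params(n, rng, stop_prob=0.3):
    """Phase 1: grow a DAG edge by edge. Phase 2: draw one parameter per edge.

    Assumption: uniform choices replace the learned sampling policy.
    """
    edges = set()
    while True:
        candidates = [
            (u, v)
            for u in range(n)
            for v in range(n)
            if u != v and (u, v) not in edges and not creates_cycle(edges, u, v)
        ]
        # Stop when no legal edge remains or the (hypothetical) stop action fires.
        if not candidates or rng.random() < stop_prob:
            break
        edges.add(rng.choice(candidates))
    # Parameters are picked only once the full structure is known.
    params = {edge: rng.gauss(0.0, 1.0) for edge in sorted(edges)}
    return edges, params


edges, params = sample_dag_then_params(4, random.Random(0))
```

Keeping the acyclicity check inside the candidate filter means every intermediate graph is itself a valid DAG, which matches the sequential construction the abstract describes.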

Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions

Stefano Massaroli

Michael Poli

Daniel Y Fu

Hermann Kumbong

Rom Nishijima Parnichkun

Aman Timalsina

David W. Romero

Quinn McIntyre

Beidi Chen

Atri Rudra

Ce Zhang

Christopher Re

Stefano Ermon

Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads -- naively requiring a full pass (or caching of activations) over the input sequence for each generated token -- similarly to attention-based models. In this paper, we seek to enable

Let the Flows Tell: Solving Graph Combinatorial Problems with GFlowNets

Dinghuai Zhang

Hanjun Dai

Nikolay Malkin

Aaron Courville

Ling Pan

Reusable Slotwise Mechanisms

Trang Nguyen

Amin Mansouri

Kanika Madan

Khuong N. Nguyen

Nguyen Duy Khuong

Kartik Ahuja

Dianbo Liu

Agents with the ability to comprehend and reason about the dynamics of objects would be expected to exhibit improved robustness and generalization in novel scenarios. However, achieving this capability necessitates not only an effective scene representation but also an understanding of the mechanisms governing interactions among object subsets. Recent studies have made significant progress in representing scenes using object slots. In this work, we introduce Reusable Slotwise Mechanisms, or RSM, a framework that models object dynamics by leveraging communication among slots along with a modular architecture capable of dynamically selecting reusable mechanisms for predicting the future states of each object slot. Crucially, RSM leverages the Central Contextual Information (CCI), enabling selected mechanisms to access the remaining slots through a bottleneck, effectively allowing for modeling of higher order and complex interactions that might require a sparse subset of objects. Experimental results demonstrate the superior performance of RSM compared to state-of-the-art methods across various future prediction and related downstream tasks, including Visual Question Answering and action planning. Furthermore, we showcase RSM's Out-of-Distribution generalization ability to handle scenes in intricate scenarios.

Neural Causal Structure Discovery from Interventions

Nan Rosemary Ke

Olexa Bilaniuk

Anirudh Goyal

Stefan Bauer

Hugo Larochelle

Bernhard Schölkopf

Michael Curtis Mozer

Chris Pal

Recent promising results have generated a surge of interest in continuous optimization methods for causal discovery from observational data. However, there are theoretical limitations on the identifiability of underlying structures obtained solely from observational data. Interventional data, on the other hand, provides richer information about the underlying data-generating process. Nevertheless, extending and applying methods designed for observational data to include interventions is a challenging problem. To address this issue, we propose a general framework based on neural networks to develop models that incorporate both observational and interventional data. Notably, our method can handle the challenging and realistic scenario where the identity of the intervened upon variable is unknown. We evaluate our proposed approach in the context of graph recovery, both de novo and from a partially-known edge set. Our method achieves strong benchmark results on various structure learning tasks, including structure recovery of synthetic graphs as well as standard graphs from the Bayesian Network Repository.

2023-09-10

TMLR (accepted)

Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

Patrick Mark Butlin

R. Long

Eric Elmoznino

Jonathan C. P. Birch

Axel Constant

George Deane

S. Fleming

C. Frith

Xuanxiu Ji

Ryota Kanai

C. Klein

Grace W. Lindsay

Matthias Michel

Liad Mudrik

Megan A. K. Peters

Eric Schwitzgebel

Jonathan Simon

Rufin Vanrullen

2023-08-17

ArXiv (preprint)


Scientific discovery in the age of artificial intelligence

Hanchen Wang

Tianfan Fu

Yuanqi Du

Wenhao Gao

Kexin Huang

Ziming Liu

Payal Chandak

Shengchao Liu

Peter Van Katwyk

Andreea Deac

Animashree Anandkumar

K. Bergen

Carla P. Gomes

Shirley Ho

Pushmeet Kohli

Joan Lasenby

Jure Leskovec

Tie-Yan Liu

A. Manrai

Debora Susan Marks

Bharath Ramsundar

Le Song

Jimeng Sun

Jian Tang

Petar Veličković

Max Welling

Linfeng Zhang

Connor Wilson Coley

Marinka Žitnik

2023-08-01

Nature (published)


What if We Enrich day-ahead Solar Irradiance Time Series Forecasting with Spatio-Temporal Context?

Oussama Boussif

Ghait Boukachab

Dan Assouline

Stefano Massaroli

Tianle Yuan

Loubna Benabbou

2023-07-28

ICML.cc/2023/Workshop/SynS_and_ML (published)
