Yoshua Bengio

Biography

For media requests, please write to medias@mila.quebec.

For more information please contact Cassidy MacNeil, Senior Assistant and Operation Lead at cassidy.macneil@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize, and in 2022 he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, a Fellow of the Royal Society of Canada, a Knight of the Legion of Honor of France and an Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Jamal Abou Haibeh

Collaborating Alumni - McGill University

Berkes Anaïs

Collaborating researcher - Cambridge University

Principal supervisor :

Rim Assouel

PhD - Université de Montréal

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor :

PhD - Université de Montréal

Sanghyeok Choi

Collaborating researcher - KAIST

PhD - Université de Montréal

Independent visiting researcher

Principal supervisor :

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Jean-Pierre Falet

PhD - Université de Montréal

PhD

PhD - Université de Montréal

Moksh Jain

PhD - Université de Montréal

PhD - Université de Montréal

Principal supervisor :

Collaborating Alumni - Université de Montréal

Hyeonah Kim

Postdoctorate - Université de Montréal

Principal supervisor :

Alex Hernández-García

Minsu Kim

Collaborating researcher - Université de Montréal

Postdoctorate - Université de Montréal

Principal supervisor :

Collaborating Alumni

Song LIU

Collaborating researcher - N/A

Cristian Dragos Manta

PhD - Université de Montréal

Co-supervisor :

Dhanya Sridhar

Sarthak Mittal

PhD - Université de Montréal

Principal supervisor :

Independent visiting researcher - Université de Montréal

Padideh Nouri

PhD - Université de Montréal

Principal supervisor :

Ali Parviz

Collaborating researcher - Ying Wu Coll of Computing

Lena Podina

Collaborating researcher - University of Waterloo

Principal supervisor :

David Rolnick

Jarrid Rector-Brooks

PhD - Université de Montréal

Danyal REHMAN

Postdoctorate - Université de Montréal

Oli RICHARDSON

Postdoctorate - Université de Montréal

Camille Rochefort-Boulanger

PhD - Université de Montréal

Principal supervisor :

Julie Hussin

Divya Sharma

Postdoctorate

Co-supervisor :

Alex Hernández-García

Mélisande Astrid Crystal Teng

Collaborating Alumni - Université de Montréal

Co-supervisor :

Hugo Larochelle

Ivan Titov

Collaborating researcher

Principal supervisor :

Siva Reddy

Alex Tong

Collaborating Alumni - Université de Montréal

Collaborating Alumni - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher - Université de Montréal

Collaborating researcher

Collaborating researcher - Université de Montréal

Tianyu Zhang

PhD - Université de Montréal

PhD - McGill University

Principal supervisor :

Harry Zhao

Collaborating Alumni - McGill University

Principal supervisor :

Blog Posts

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the Service of Reasoning & Model-Based ML

April 4, 2023

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

March 15, 2022

Generative Flow Networks

Yoshua Bengio

Publications

Open Problems in Technical AI Governance

Anka Reuel

Benjamin Bucknall

Stephen Casper

Timothy Fist

Lisa Soder

Onni Aarne

Lewis Hammond

Lujain Ibrahim

Alan Chan

Peter Wills

Markus Anderljung

Ben Garfinkel

Lennart Heim

Andrew Trask

Gabriel Mukobi

Rylan Schaeffer

Mauricio Baker

Sara Hooker

Irene Solaiman

Sasha Luccioni

Alexandra Luccioni

Nitarshan Rajkumar

Nicolas Moës

Jeffrey Ladish

David Bau

Paul Bricman

Neel Guha

Jessica Newman

Tobin South

Alex Pentland

Sanmi Koyejo

Mykel Kochenderfer

Robert Trager

AI progress is creating a growing range of risks and opportunities, but it is often unclear how they should be navigated. In many cases, the barriers and uncertainties faced are at least partly technical. Technical AI governance, referring to technical analysis and tools for supporting the effective governance of AI, seeks to address such challenges. It can help to (a) identify areas where intervention is needed, (b) identify and assess the efficacy of potential governance actions, and (c) enhance governance options by designing mechanisms for enforcement, incentivization, or compliance. In this paper, we explain what technical AI governance is, why it is important, and present a taxonomy and incomplete catalog of its open problems. This paper is intended as a resource for technical researchers or research funders looking to contribute to AI governance.

2025-04-13

Transactions on Machine Learning Research (accepted)

Assessing SAM for Tree Crown Instance Segmentation from Drone Imagery

Mélisande Teng

2025-03-25

ArXiv (preprint)

Extendable Planning via Multiscale Diffusion

Chang Chen

Hany Hamed

Doojin Baek

Taegu Kang

Sungjin Ahn

Long-horizon planning is crucial in complex environments, but diffusion-based planners like Diffuser are limited by the trajectory lengths observed during training. This creates a dilemma: long trajectories are needed for effective planning, yet they degrade model performance. In this paper, we introduce this extendable long-horizon planning challenge and propose a two-phase solution. First, Progressive Trajectory Extension incrementally constructs longer trajectories through multi-round compositional stitching. Second, the Hierarchical Multiscale Diffuser enables efficient training and inference over long horizons by reasoning across temporal scales. To avoid the need for multiple separate models, we propose Adaptive Plan Pondering and the Recursive HM-Diffuser, which unify hierarchical planning within a single model. Experiments show our approach yields strong performance gains, advancing scalable and efficient decision-making over long horizons.

2025-03-24

ArXiv (preprint)

A scalable gene network model of regulatory dynamics in single cells

Paul Bertin

Joseph D Viviano

Alejandro Tejada-Lapuerta

Weixu Wang

Stefan Bauer

Fabian J. Theis

2025-03-24

ArXiv (preprint)

Offline Model-Based Optimization: Comprehensive Review

Jiayao Gu

Zixuan Liu

Can Chen

2025-03-20

ArXiv (preprint)

What makes a theory of consciousness unscientific?

IIT-Concerned

Derek H. Arnold

Mark G. Baxter

Tristan A. Bekinschtein

James W. Bisley

Jacob Browning

Dean V. Buonomano

David Carmel

Marisa Carrasco

Peter Carruthers

Olivia Carter

Dorita H. F. Chang

Ian Charest

Mouslim Cherkaoui

Axel Cleeremans

Michael A. Cohen

Philip R. Corlett

Kalina Christoff

Sam Cumming

Cody A. Cushing

Beatrice de Gelder

Felipe De Brigard

Daniel C. Dennett

Nadine Dijkstra

Adrien Doerig

Paul E. Dux

Stephen M. Fleming

Keith Frankish

Chris Frith

Sarah Garfinkel

Melvyn A. Goodale

Jacqueline Gottlieb

Jake R. Hanson

Ran R. Hassin

Michael H. Herzog

Cecilia Heyes

Po‐Jang Hsieh

Shao‐Min Hung

Robert W. Kentridge

Tomas Knapen

Nikos Konstantinou

Konrad P. Kording

Timo L. Kvamme

Sze Chai Kwok

Renzo C. Lanfranco

Hakwan Lau

Joseph E. LeDoux

Alan Lee

Camilo Libedinsky

Matthew D. Lieberman

Ying-Tung Lin

Kayuet Liu

Maro G. Machizawa

Julio Martínez-Trujillo

Janet Metcalfe

Matthias Michel

Kenneth D. Miller

Partha P. Mitra

Dean Mobbs

Robert M. Mok

Jorge Morales

Myrto Mylopoulos

Brian Odegaard

Charles C.-F. Or

Adrian M. Owen

David Pereplyotchik

Franco Pestilli

Megan A. K. Peters

Ian Phillips

Rosanne L. Rademaker

Dobromir Rahnev

Geraint Rees

Dario L. Ringach

Adina L. Roskies

Daniela Schiller

Aaron Schurger

D. Samuel Schwarzkopf

R. B. Y. Scott

Aaron R. Seitz

Joshua Shepherd

Juha Silvanto

Heleen A. Slagter

Barry Smith

Guillermo Solovey

David Soto

Hugo J. Spiers

Timo Stein

Vincent Taschereau‐Dumouchel

Frank Tong

Peter U. Tse

Jonas Vibell

Sebastian Watzl

Taylor W. Webb

Josh Weisberg

Thalia Wheatley

Michał Wierzchoń

Martijn E. Wokke

Karen Yan

Michał Klincewicz

2025-03-09

Nature Neuroscience (unknown)

Shaping Inductive Bias in Diffusion Models through Frequency-Based Noise Control

Thomas Jiralerspong

Berton Earnshaw

Jason Hartford

Luca Scimeca

Diffusion Probabilistic Models (DPMs) are powerful generative models that have achieved unparalleled success in a number of generative tasks. In this work, we aim to build inductive biases into the training and sampling of diffusion models to better accommodate the target distribution of the data to model. For topologically structured data, we devise a frequency-based noising operator to purposefully manipulate, and set, these inductive biases. We first show that appropriate manipulations of the noising forward process can lead DPMs to focus on particular aspects of the distribution to learn. We show that different datasets necessitate different inductive biases, and that appropriate frequency-based noise control induces increased generative performance compared to standard diffusion. Finally, we demonstrate the possibility of ignoring information at particular frequencies while learning. We show this in an image corruption and recovery task, where we train a DPM to recover the original target distribution after severe noise corruption.

2025-03-05

ICLR.cc/2025/Workshop/DeLTa (poster)

Laurence Perreault-Levasseur

Solving Bayesian Inverse Problems with Diffusion Priors and Off-Policy RL

Glen Berseth

Nikolay Malkin

This paper presents a practical application of Relative Trajectory Balance (RTB), a recently introduced off-policy reinforcement learning (RL) objective that can asymptotically solve Bayesian inverse problems optimally. We extend the original work by using RTB to train conditional diffusion model posteriors from pretrained unconditional priors for challenging linear and non-linear inverse problems in vision and science. We use the objective alongside techniques such as off-policy backtracking exploration to improve training. Importantly, our results show that existing training-free diffusion posterior methods struggle to perform effective posterior inference in latent space due to inherent biases.

2025-03-05

ICLR.cc/2025/Workshop/DeLTa (poster)

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding

Juan A. Rodriguez

Chao Wang

Akshay Kalkunte Suresh

Abhay Puri

Xiangru Jian

Pierre-Andre Noel

Sathwik Tejaswi Madhusudhan

Enamul Hoque

Christopher Pal

Issam H. Laradji

David Vázquez

Perouz Taslakian

Spandana Gella

Sai Rajeswar

Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models hinges on having a good connector that maps visual features generated by a vision encoder to a shared embedding space with the LLM while preserving semantic similarity. Existing connectors, such as multilayer perceptrons (MLPs), lack inductive bias to constrain visual features within the linguistic structure of the LLM's embedding space, making them data-hungry and prone to cross-modal misalignment. In this work, we propose a novel vision-text alignment method, AlignVLM, that maps visual features to a weighted average of LLM text embeddings. Our approach leverages the linguistic priors encoded by the LLM to ensure that visual features are mapped to regions of the space that the LLM can effectively interpret. AlignVLM is particularly effective for document understanding tasks, where visual and textual modalities are highly correlated. Our extensive experiments show that AlignVLM achieves state-of-the-art performance compared to prior alignment methods, with larger gains on document understanding tasks and under low-resource setups. We provide further analysis demonstrating its efficiency and robustness to noise.

2025-03-04

ICLR.cc/2025/Workshop/Re-Align (poster)

Learning Decision Trees as Amortized Structure Inference

Mohammed Mahfoud

Ghait Boukachab

Michał Koziarski

Alex Hernández-García

Stefan Bauer

Nikolay Malkin

2025-03-04

ICLR.cc/2025/Workshop/FPI (poster)

EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision

Diego Velazquez

Pau Rodríguez

Sergio Alonso

Josep M. Gonfaus

Jordi Gonzalez

Gerardo Richarte

Javier Marin

Alexandre Lacoste

This paper presents EarthView, a comprehensive dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks. The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic. Our dataset provides a wide spectrum of image data with varying resolutions, harnessed from different sensors and organized coherently into an accessible HuggingFace dataset in parquet format. This data spans five years, from 2017 to 2022. Accompanying the dataset, we introduce EarthMAE, a tailored Masked Autoencoder, developed to tackle the distinct challenges of remote sensing data. Trained in a self-supervised fashion, EarthMAE effectively processes different data modalities such as hyperspectral, multispectral, topographical data, segmentation maps, and temporal structure. This model helps us show that pre-training on Satellogic data improves performance on downstream tasks. While there is still a gap to fill in MAE for heterogeneous data, we regard this innovative combination of an expansive, diverse dataset and a versatile model adapted for self-supervised learning as a stride forward in deep learning for Earth monitoring.

2025-03-03

2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) (published)

A physics-based data-driven model for CO$_2$ gas diffusion electrodes to drive automated laboratories

Ivan Grega

Félix Therrien

Abhishek Soni

Karry Ocean

Kevan Dettelbach

Ribwar Ahmadi

Mehrdad Mokhtari

Curtis P. Berlinguette

The electrochemical reduction of atmospheric CO…

2025-03-02

ICLR.cc/2025/Workshop/AI4MAT (poster)