Yoshua Bengio

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Marie-Josée Beauchamp, Administrative Assistant at marie-josee.beauchamp@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Jamal Abou Haibeh

Collaborating Alumni - McGill University

Mohammed Abukalam

Collaborating Alumni - Université de Montréal

Berkes Anaïs

Collaborating researcher - Cambridge University

Principal supervisor :

Rim Assouel

PhD - Université de Montréal

Junyeob BAEK

Independent visiting researcher - KAIST

Independent visiting researcher

Co-supervisor :

Guillaume Lajoie

Paul Bertin

PhD - Université de Montréal

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor :

David Rolnick

Xiaoyin Chen

PhD - Université de Montréal

Sanghyeok Choi

Collaborating researcher - KAIST

PhD - Université de Montréal

PhD - Université de Montréal

Research Intern - Université de Montréal

Co-supervisor :

Loubna Benabbou

Eric Elmoznino

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Jean-Pierre Falet

PhD - Université de Montréal

Co-supervisor :

Leo Feng

PhD - Université de Montréal

leo.feng@mila.quebec

Ivan Grega

Research Intern - Université de Montréal

PhD

PhD - Université de Montréal

mohsin.hasan@mila.quebec

Edward Hu

PhD - Université de Montréal

Moksh Jain

PhD - Université de Montréal

moksh.jain@mila.quebec

PhD - Université de Montréal

Principal supervisor :

Collaborating Alumni - Université de Montréal

Hyeonah Kim

Postdoctorate - Université de Montréal

Principal supervisor :

Alex Hernandez

Yaroslav KIVVA

Collaborating researcher - Université de Montréal

Salem Lahlou

Collaborating Alumni - Université de Montréal

Tabitha Edith Lee

Postdoctorate - Université de Montréal

Principal supervisor :

Seanie Lee

Collaborating Alumni - Université de Montréal

Collaborating Alumni

Zhen Liu

Collaborating Alumni - Université de Montréal

Principal supervisor :

Liam Paull

Kanika Madan

PhD - Université de Montréal

Nikolay Malkin

Collaborating Alumni - Université de Montréal

Cristian Dragos Manta

PhD - Université de Montréal

Co-supervisor :

Dhanya Sridhar

Sören Mindermann

Collaborating researcher - Université de Montréal

Sarthak Mittal

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Postdoctorate - Université de Montréal

Principal supervisor :

Independent visiting researcher - Université de Montréal

Padideh Nouri

PhD - Université de Montréal

Principal supervisor :

Ali Parviz

Collaborating researcher - Ying Wu Coll of Computing

Lena Podina

PhD - University of Waterloo

Principal supervisor :

Collaborating Alumni - Max-Planck-Institute for Intelligent Systems

Amine RAZIG

Research Intern - Université de Montréal

Co-supervisor :

Loubna Benabbou

Jarrid Rector-Brooks

PhD - Université de Montréal

Danyal REHMAN

Postdoctorate - Université de Montréal

James Requeima

Independent visiting researcher - Université de Montréal

Oli RICHARDSON

Postdoctorate - Université de Montréal

Camille Rochefort-Boulanger

PhD - Université de Montréal

Principal supervisor :

Julie Hussin

Victor Schmidt

Collaborating Alumni - Université de Montréal

Postdoctorate - Université de Montréal

Master's Research - Université de Montréal

Marcin Sendera

Collaborating Alumni - Université de Montréal

Vedant Shah

Master's Research - Université de Montréal

Postdoctorate

Marco Stock

Independent visiting researcher - Technical University of Munich

marco.stock@tum.de

Mélisande Astrid Crystal Teng

PhD - Université de Montréal

Co-supervisor :

Hugo Larochelle

alexander.tong@mila.quebec

Alex Tong

Postdoctorate - Université de Montréal

Postdoctorate - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher - Université de Montréal

Omar G. Younis

Collaborating researcher

Research Intern - Université de Montréal

Tianyu Zhang

PhD - Université de Montréal

PhD - McGill University

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Aaron Courville

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Harry Zhao

PhD - McGill University

Principal supervisor :

Blog Posts

Generic thumbnail for Mila Blog articles.

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the Service of Reasoning & Model-Based ML

April 4, 2023

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

March 15, 2022

Generative Flow Networks

Yoshua Bengio

Publications

Extendable Long-Horizon Planning via Hierarchical Multiscale Diffusion

Chang Chen

Hany Hamed

Doojin Baek

Taegu Kang

Sungjin Ahn

2025-03-25

ArXiv (preprint)

Extendable Long-Horizon Planning via Hierarchical Multiscale Diffusion

Chang Chen

Hany Hamed

Doojin Baek

Taegu Kang

Sungjin Ahn

2025-03-25

ArXiv (preprint)

A scalable gene network model of regulatory dynamics in single cells

Paul Bertin

Joseph D Viviano

Alejandro Tejada-Lapuerta

Weixu Wang

Stefan Bauer

Fabian J. Theis

2025-03-25

ArXiv (preprint)

A scalable gene network model of regulatory dynamics in single cells

Paul Bertin

Joseph D Viviano

Alejandro Tejada-Lapuerta

Weixu Wang

Stefan Bauer

Fabian J. Theis

2025-03-25

ArXiv (preprint)

Offline Model-Based Optimization: Comprehensive Review

Minsu Kim

Jiayao Gu

Ye Yuan

Taeyoung Yun

Zixuan Liu

Can Chen

2025-03-21

ArXiv (preprint)

Offline Model-Based Optimization: Comprehensive Review

Minsu Kim

Jiayao Gu

Ye Yuan

Taeyoung Yun

Zixuan Liu

Can Chen

2025-03-21

ArXiv (preprint)

Learning Decision Trees as Amortized Structure Inference

Mohammed Mahfoud

Ghait Boukachab

Michał Koziarski

Alex Hernandez-Garcia

Stefan Bauer

Nikolay Malkin

2025-03-10

ArXiv (preprint)

Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles

Luca Scimeca

Alexander Rubinstein

Damien Teney

Seong Joon Oh

Armand Mihai Nicolicioiu

Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to a phenomenon known as shortcut lea… (see more)rning, where a model relies on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose

2025-03-06

ICLR.cc/2025/Workshop/SCSL (published)

Shaping Inductive Bias in Diffusion Models through Frequency-Based Noise Control

Thomas Jiralerspong

Berton Earnshaw

Jason Hartford

Luca Scimeca

Diffusion Probabilistic Models (DPMs) are powerful generative models that have achieved unparalleled success in a number of generative tasks… (see more). In this work, we aim to build inductive biases into the training and sampling of diffusion models to better accommodate the target distribution of the data to model. For topologically structured data, we devise a frequency-based noising operator to purposefully manipulate, and set, these inductive biases. We first show that appropriate manipulations of the noising forward process can lead DPMs to focus on particular aspects of the distribution to learn. We show that different datasets necessitate different inductive biases, and that appropriate frequency-based noise control induces increased generative performance compared to standard diffusion. Finally, we demonstrate the possibility of ignoring information at particular frequencies while learning. We show this in an image corruption and recovery task, where we train a DPM to recover the original target distribution after severe noise corruption.

2025-03-06

ICLR.cc/2025/Workshop/DeLTa (poster)

Laurence Perreault-Levasseur

Solving Bayesian inverse problems with diffusion priors and off-policy RL

Luca Scimeca

Siddarth Venkatraman

Moksh J. Jain

Minsu Kim

Marcin Sendera

Mohsin Hasan

Luke Rowe

Sarthak Mittal

Pablo Lemos

Emmanuel Bengio

Alexandre Adam

Jarrid Rector-Brooks

Yashar Hezaveh

Glen Berseth

Nikolay Malkin

This paper presents a practical application of Relative Trajectory Balance (RTB), a recently introduced off-policy reinforcement learning (R… (see more)L) objective that can asymptotically solve Bayesian inverse problems optimally. We extend the original work by using RTB to train conditional diffusion model posteriors from pretrained unconditional priors for challenging linear and non-linear inverse problems in vision, and science. We use the objective alongside techniques such as off-policy backtracking exploration to improve training. Importantly, our results show that existing training-free diffusion posterior methods struggle to perform effective posterior inference in latent space due to inherent biases.

2025-03-06

ICLR.cc/2025/Workshop/DeLTa (poster)

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

Ahmed Masry

Juan A. Rodriguez

Tianyu Zhang

Suyuchen Wang

Chao Wang

Aarash Feizi

Akshay Kalkunte Suresh

Abhay Puri

Xiangru Jian

Pierre-Andre Noel

Sathwik Tejaswi Madhusudhan

Enamul Hoque

Issam Hadj Laradji

David Vazquez

Perouz Taslakian … (see 2 more)

Spandana Gella

Sai Rajeswar

Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models hinges… (see more) on having a good connector that maps visual features generated by a vision encoder to a shared embedding space with the LLM while preserving semantic similarity. Existing connectors, such as multilayer perceptrons (MLPs), often produce out-of-distribution or noisy inputs, leading to misalignment between the modalities. In this work, we propose a novel vision-text alignment method, AlignVLM, that maps visual features to a weighted average of LLM text embeddings. Our approach leverages the linguistic priors encoded by the LLM to ensure that visual features are mapped to regions of the space that the LLM can effectively interpret. AlignVLM is particularly effective for document understanding tasks, where scanned document images must be accurately mapped to their textual content. Our extensive experiments show that AlignVLM achieves state-of-the-art performance compared to prior alignment methods. We provide further analysis demonstrating improved vision-text feature alignment and robustness to noise.

2025-03-05

ICLR.cc/2025/Workshop/Re-Align (poster)

Learning Decision Trees as Amortized Structure Inference

Mohammed Mahfoud

Ghait Boukachab

Michał Koziarski

Alex Hernandez-Garcia

Stefan Bauer

Nikolay Malkin

2025-03-05

ICLR.cc/2025/Workshop/FPI (poster)