Yoshua Bengio

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Cassidy MacNeil, Senior Assistant and Operation Lead at cassidy.macneil@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Jamal Abou Haibeh

Collaborating Alumni - McGill University

Mohammed Abukalam

Collaborating Alumni - Université de Montréal

Berkes Anaïs

Collaborating researcher - Cambridge University

Principal supervisor :

Rim Assouel

PhD - Université de Montréal

Stefan Bauer

Independent visiting researcher

Co-supervisor :

Guillaume Lajoie

Paul Bertin

PhD - Université de Montréal

Joyce Chai

Independent visiting researcher

Principal supervisor :

Siva Reddy

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor :

David Rolnick

Xiaoyin Chen

PhD - Université de Montréal

Sanghyeok Choi

Collaborating researcher - KAIST

Collaborating Alumni - Université de Montréal

PhD - Université de Montréal

Collaborating Alumni - Université de Montréal

Co-supervisor :

Loubna Benabbou

Desmond Elliott

Independent visiting researcher

Principal supervisor :

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Jean-Pierre Falet

PhD - Université de Montréal

Leo Feng

PhD - Université de Montréal

PhD

PhD - Université de Montréal

Edward Hu

PhD - Université de Montréal

Moksh Jain

PhD - Université de Montréal

PhD - Université de Montréal

Principal supervisor :

Collaborating Alumni - Université de Montréal

Hyeonah Kim

Postdoctorate - Université de Montréal

Principal supervisor :

Alex Hernandez-Garcia

Salem Lahlou

Collaborating Alumni - Université de Montréal

Tabitha Edith Lee

Postdoctorate - Université de Montréal

Principal supervisor :

Collaborating Alumni

Zhen Liu

Collaborating Alumni - Université de Montréal

Principal supervisor :

Liam Paull

Kanika Madan

PhD - Université de Montréal

Nikolay Malkin

Collaborating Alumni - Université de Montréal

Cristian Dragos Manta

PhD - Université de Montréal

Co-supervisor :

Dhanya Sridhar

Sarthak Mittal

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Postdoctorate - Université de Montréal

Principal supervisor :

Independent visiting researcher - Université de Montréal

Padideh Nouri

PhD - Université de Montréal

Principal supervisor :

Ali Parviz

Collaborating researcher - Ying Wu Coll of Computing

Lena Podina

Collaborating researcher - University of Waterloo

Principal supervisor :

David Rolnick

Nassim Rahaman

Collaborating Alumni - Max-Planck-Institute for Intelligent Systems

Amine RAZIG

Collaborating researcher - Université de Montréal

Co-supervisor :

Loubna Benabbou

Jarrid Rector-Brooks

PhD - Université de Montréal

Danyal REHMAN

Postdoctorate - Université de Montréal

James Requeima

Independent visiting researcher - Université de Montréal

Oli RICHARDSON

Postdoctorate - Université de Montréal

Camille Rochefort-Boulanger

PhD - Université de Montréal

Principal supervisor :

Julie Hussin

Abhik Roychoudhury Roychoudhury

Independent visiting researcher

Principal supervisor :

Siva Reddy

Luca Scimeca

Postdoctorate - Université de Montréal

Collaborating Alumni - Université de Montréal

Marcin Sendera

Collaborating Alumni - Université de Montréal

Divya Sharma

Postdoctorate

Co-supervisor :

Alex Hernandez-Garcia

Mélisande Astrid Crystal Teng

PhD - Université de Montréal

Co-supervisor :

Hugo Larochelle

Ivan Titov

Independent visiting researcher

Principal supervisor :

Siva Reddy

Alex Tong

Collaborating Alumni - Université de Montréal

Postdoctorate - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher

Collaborating researcher - Université de Montréal

Tianyu Zhang

PhD - Université de Montréal

PhD - McGill University

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Aaron Courville

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Harry Zhao

Collaborating Alumni - McGill University

Principal supervisor :

Blog Posts

Generic thumbnail for Mila Blog articles.

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the Service of Reasoning & Model-Based ML

April 4, 2023

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

March 15, 2022

Generative Flow Networks

Yoshua Bengio

Publications

VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text

Tianyu Zhang

Suyuchen Wang

Li Li

Ge Zhang

Sai Rajeswar

2025-01-22

ICLR.cc/2025/Conference (poster)

VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text

Ge Zhang

Sai Rajeswar

We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured … (see more)texts using pixel-level hints within images through complex reasoning. This task stems from the observation that text embedded in images intrinsically differs from common visual elements and text due to the need to align the modalities of vision, text, and text embedded in images. While many works incorporate text into images for visual question answering, they mostly rely on OCR or masked language modeling, reducing the task to text-based processing. However, text-based processing becomes ineffective in VCR as accurate text restoration depends on the combined information from provided images, context, and subtle cues from the tiny, exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control the task difficulty. With this pipeline, we construct VCR-WIKI for VCR using Wikipedia images with captions, including 2.11M English and 346K Chinese training entities, plus 5K validation and 5K test entities in both languages, each in easy and hard configurations. We also make a hidden test set, VCR-HIDDEN, to avoid potential overfitting on VCR-WIKI. Our results reveal that current vision-language models significantly lag behind human performance in the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-WIKI and the data construction code to facilitate future research.

2025-01-22

ICLR.cc/2025/Conference (poster)

Can Safety Fine-Tuning Be More Principled? Lessons Learned from Cybersecurity

David Williams-King

Linh Le

Adam Oberman

As LLMs develop increasingly advanced capabilities, there is an increased need to minimize the harm that could be caused to society by certa… (see more)in model outputs; hence, most LLMs have safety guardrails added, for example via fine-tuning. In this paper, we argue the position that current safety fine-tuning is very similar to a traditional cat-and-mouse game (or arms race) between attackers and defenders in cybersecurity. Model jailbreaks and attacks are patched with bandaids to target the specific attack mechanism, but many similar attack vectors might remain. When defenders are not proactively coming up with principled mechanisms, it becomes very easy for attackers to sidestep any new defenses. We show how current defenses are insufficient to prevent new adversarial jailbreak attacks, reward hacking, and loss of control problems. In order to learn from past mistakes in cybersecurity, we draw analogies with historical examples and develop lessons learned that can be applied to LLM safety. These arguments support the need for new and more principled approaches to designing safe models, which are architected for security from the beginning. We describe several such approaches from the AI literature.

2025-01-19

ArXiv (preprint)

EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision

Diego Velazquez

Pau Rodriguez

Sergio Alonso

Josep M. Gonfaus

Jordi Gonzàlez 0001

Gerardo Richarte

Javier Marin

Alexandre Lacoste

This paper presents EarthView, a comprehensive dataset specifically designed for self-supervision on remote sensing data, intended to enhanc… (see more)e deep learning applications on Earth monitoring tasks. The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic. Our dataset provides a wide spectrum of image data with varying resolutions, harnessed from different sensors and organized coherently into an accessible HuggingFace dataset in parquet format. This data spans five years, from 2017 to 2022. Accompanying the dataset, we introduce EarthMAE, a tailored Masked Autoencoder, developed to tackle the distinct challenges of remote sensing data. Trained in a self-supervised fashion, EarthMAE effectively processes different data modalities such as hyperspectral, multispectral, topographical data, segmentation maps, and temporal structure. This model helps us show that pre-training on Satellogic data improves performance on downstream tasks. While there is still a gap to fill in MAE for heterogeneous data, we regard this innovative combination of an expansive, diverse dataset and a versatile model adapted for self-supervised learning as a stride forward in deep learning for Earth monitoring.

2025-01-14

ArXiv (preprint)

ICLR 2025 Workshop on Tackling Climate Change with Machine Learning: Data-Centric Approaches in ML for Climate Action

Konstantin Klemmer

Melissa Chapman

Lily Xu

Poon Kin Ho

Mélisande Teng

Patrick Emami

Climate change is one of the greatest problems society has ever faced, with increasingly severe consequences for humanity as natural disaste… (see more)rs multiply, sea levels rise, and ecosystems falter. While no silver bullet, machine learning can be an invaluable tool in fighting climate change via a wide array of applications and techniques, from designing smart electric grids to tracking greenhouse gas emissions through satellite imagery. These applications require algorithmic innovations in machine learning and close collaboration with diverse fields and practitioners. This workshop is intended as a forum for those in the global machine learning community who wish to help tackle climate change, and is further aimed to help foster cross-pollination between researchers in machine learning and experts in complementary climate-relevant fields. Building on our past workshops on this topic, this workshop particularly aims to explore data-centric ML approaches for climate action. Data-centric ML is not only a timely topic within the ICLR community, as analyzing and engineering (pre)training datasets becomes increasingly important, but holds specific challenges and opportunities in climate-related areas. We also want to take the opportunity of ICLR being hosted in Singapore to engage with local communities and shine a light on work that deploys, analyzes or critiques ML methods and their use for climate change adaptation and mitigation on the Asian continent.

2025-01-01

ICLR.cc/2025/Workshop_Proposals (published)

Integrating Generative and Experimental Platforms for Biomolecular Design

Cheng-Hao Liu

Jarrid Rector-Brooks

Soojung Yang

Sidney L Lisanza

Francesca-Zhoufan Li

Hannes Stärk

Jacob Gershon

Lauren Hong

Pranam Chatterjee

Tommi Jaakkola

Regina Barzilay

David Baker

Frances H. Arnold

Biomolecular design, through artificial engineering of proteins, ligands, and nucleic acids, holds immense promise in addressing pressing me… (see more)dical, industrial, and environmental challenges. While generative machine learning has shown significant potential in this area, a palpable disconnect exists with experimental biology: many ML research efforts prioritize static benchmark performance, potentially sidelining impactful biological applications. This workshop seeks to bridge this gap by bringing computationalists and experimentalists together, catalyzing a deeper interdisciplinary discourse. Together, we will explore the strengths and challenges of generative ML in biology, experimental integration of generative ML, and biological problems ready for ML. To attract high-quality and diverse research, we partnered with Nature Biotechnology for a special collection, and we created dedicated tracks for in-silico ML research and hybrid ML-experimental biology research. Our lineup features emerging leaders as speakers and renowned scientists as panelists, encapsulating a spectrum from high-throughput experimentation and computational biology to generative ML. With a diverse organizing team and backed by industry sponsors, we dedicate the workshop to pushing the boundaries of ML's role in biology.

2025-01-01

ICLR.cc/2025/Workshop_Proposals (published)

Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction

Jarrid Rector-Brooks

Mohsin Hasan

Zhangzhi Peng

Zachary Quinn

Cheng-Hao Liu

Michael M. Bronstein

Pranam Chatterjee

2025-01-01

ICLR (published)

The Singapore Consensus on Global AI Safety Research Priorities

Tegan Maharaj

Luke Ong

Stuart Russell

Dawn Song

Max Tegmark

Lan Xue

Ya-Qin Zhang

Stephen Casper

Wan Sie Lee

Sören Mindermann

Vanessa Wilfred

Vidhisha Balachandran

Fazl Barez

Michael Belinsky

Imane Bello

Malo Bourgon

Mark Brakel

Sim'eon Campos

Duncan Cass-Beggs … (see 67 more)

Jiahao Chen

Rumman Chowdhury

Kuan Chua Seah

Jeff Clune

Juntao Dai

Agnès Delaborde

Nouha Dziri

Francisco Eiras

Joshua Engels

Jinyu Fan

Adam Gleave

Noah D. Goodman

Fynn Heide

Johannes Heidecke

Dan Hendrycks

Cyrus Hodes

Bryan Low Kian Hsiang

Minlie Huang

Sami Jawhar

Jingyu Wang

Adam Tauman Kalai

Meindert Kamphuis

Mohan S. Kankanhalli

Subhash Kantamneni

Mathias Bonde Kirk

Thomas Kwa

Jeffrey Ladish

Kwok-Yan Lam

Wan Lee Sie

Taewhi Lee

Xiaojian Li

Jiajun Liu

Chaochao Lu

Yifan Mai

Richard Mallah

Julian Michael

Nick Moës

Simon Möller

Kihyuk Nam

Kwan Yee Ng

Mark Nitzberg

Besmira Nushi

Sean O hEigeartaigh

Alejandro Ortega

Pierre Peigné

James Petrie

Benjamin Prud'homme

Reihaneh Rabbany

Nayat Sanchez-Pi

Sarah Schwettmann

Buck Shlegeris

Saad Siddiqui

Aradhana Sinha

Martín Soto

Cheston Tan

Dong Ting

William-Chandra Tjhi

Robert Trager

Brian Tse

H. AnthonyTungK.

John Willes

Denise Wong

W. Xu

Rongwu Xu

Yi Zeng

HongJiang Zhang

Djordje Zikelic

Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to en… (see more)sure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential -- it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash. The "2025 Singapore Conference on AI (SCAI): International Scientific Exchange on AI Safety" aimed to support research in this space by bringing together AI scientists across geographies to identify and synthesise research priorities in AI safety. This resulting report builds on the International AI Safety Report chaired by Yoshua Bengio and backed by 33 governments. By adopting a defence-in-depth model, this report organises AI safety research domains into three types: challenges with creating trustworthy AI systems (Development), challenges with evaluating their risks (Assessment), and challenges with monitoring and intervening after deployment (Control).

2025-01-01

arXiv.org (preprint)

The Singapore Consensus on Global AI Safety Research Priorities

Tegan Maharaj

Luke Ong

Stuart Russell

Dawn Song

Max Tegmark

Lan Xue

Ya-Qin Zhang

Stephen Casper

Wan Sie Lee

Sören Mindermann

Vanessa Wilfred

Vidhisha Balachandran

Fazl Barez

Michael Belinsky

Imane Bello

Malo Bourgon

Mark Brakel

Sim'eon Campos

Duncan Cass-Beggs … (see 67 more)

Jiahao Chen

Rumman Chowdhury

Kuan Chua Seah

Jeff Clune

Juntao Dai

Agnès Delaborde

Nouha Dziri

Francisco Eiras

Joshua Engels

Jinyu Fan

Adam Gleave

Noah D. Goodman

Fynn Heide

Johannes Heidecke

Dan Hendrycks

Cyrus Hodes

Bryan Low Kian Hsiang

Minlie Huang

Sami Jawhar

Jingyu Wang

Adam Tauman Kalai

Meindert Kamphuis

Mohan S. Kankanhalli

Subhash Kantamneni

Mathias Bonde Kirk

Thomas Kwa

Jeffrey Ladish

Kwok-Yan Lam

Wan Lee Sie

Taewhi Lee

Xiaojian Li

Jiajun Liu

Chaochao Lu

Yifan Mai

Richard Mallah

Julian Michael

Nick Moës

Simon Möller

Kihyuk Nam

Kwan Yee Ng

Mark Nitzberg

Besmira Nushi

Sean O hEigeartaigh

Alejandro Ortega

Pierre Peigné

James Petrie

Benjamin Prud'homme

Reihaneh Rabbany

Nayat Sanchez-Pi

Sarah Schwettmann

Buck Shlegeris

Saad Siddiqui

Aradhana Sinha

Martín Soto

Cheston Tan

Dong Ting

William Tjhi

Robert Trager

Brian Tse

H. AnthonyTungK.

John Willes

Denise Wong

Wei Xu

Rongwu Xu

Yi Zeng 0005

HongJiang Zhang

Djordje Zikelic

2025-01-01

arXiv.org (preprint)

Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training

Brian R. Bartoldson

Siddarth Venkatraman

James Diffenderfer

Moksh J. Jain

Tal Ben-Nun

Seanie Lee

Minsu Kim

Johan Samir Obando Ceron

Bhavya Kailkhura

2025-01-01

arXiv.org (preprint)

Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training

Brian R. Bartoldson

Siddarth Venkatraman

James Diffenderfer

Moksh J. Jain

Tal Ben-Nun

Seanie Lee

Minsu Kim

Johan Samir Obando Ceron

Bhavya Kailkhura

2025-01-01

arXiv.org (preprint)

A neuronal least-action principle for real-time learning in cortical circuits

Walter Senn

Dominik Dold

Akos F. Kungl

Benjamin Ellenberger

Jakob Jordan

João Sacramento

Mihai A. Petrovici

One of the most fundamental laws of physics is the principle of least action. Motivated by its predictive power, we introduce a neuronal lea… (see more)st-action principle for cortical processing of sensory streams to produce appropriate behavioural outputs in real time. The principle postulates that the voltage dynamics of cortical pyramidal neurons prospectively minimize the local somato-dendritic mismatch error within individual neurons. For motor output neurons, it implies minimizing an instantaneous behavioural error. For deep network neurons, it implies a prospective firing to overcome integration delays and correct for possible output errors right in time. The neuron-specific errors are extracted in the apical dendrites of pyramidal neurons through a cortical microcircuit that tries to explain away the feedback from the periphery, and correct the trajectory on the fly. Any motor output is in a moving equilibrium with the sensory inputs and the motor feedback during the whole sensory-motor trajectory. Ongoing synaptic plasticity reduces the somato-dendritic mismatch error within each cortical neuron and performs gradient descent on the output cost at any moment in time. The neuronal least-action principle offers an axiomatic framework to derive local neuronal and synaptic dynamics for global real-time computation and learning in the brain and in physical substrates in general.

2024-12-20

eLife (published)