Yoshua Bengio

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Cassidy MacNeil, Senior Assistant and Operation Lead at cassidy.macneil@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Jamal Abou Haibeh

Collaborating Alumni - McGill University

Berkes Anaïs

Collaborating researcher - Cambridge University

Principal supervisor :

Rim Assouel

PhD - Université de Montréal

Stefan Bauer

Independent visiting researcher

Co-supervisor :

Guillaume Lajoie

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor :

David Rolnick

Xiaoyin Chen

PhD - Université de Montréal

Sanghyeok Choi

Collaborating researcher - KAIST

PhD - Université de Montréal

Collaborating Alumni - Université de Montréal

Co-supervisor :

Loubna Benabbou

Desmond Elliott

Independent visiting researcher

Principal supervisor :

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Jean-Pierre Falet

PhD - Université de Montréal

PhD

PhD - Université de Montréal

Moksh Jain

PhD - Université de Montréal

PhD - Université de Montréal

Principal supervisor :

Collaborating Alumni - Université de Montréal

Hyeonah Kim

Postdoctorate - Université de Montréal

Principal supervisor :

Alex Hernandez-Garcia

Tabitha Edith Lee

Postdoctorate - Université de Montréal

Principal supervisor :

Collaborating Alumni

Collaborating Alumni - Université de Montréal

Cristian Dragos Manta

PhD - Université de Montréal

Co-supervisor :

Dhanya Sridhar

Sarthak Mittal

PhD - Université de Montréal

Principal supervisor :

Independent visiting researcher - Université de Montréal

Padideh Nouri

PhD - Université de Montréal

Principal supervisor :

Ali Parviz

Collaborating researcher - Ying Wu Coll of Computing

Lena Podina

Collaborating researcher - University of Waterloo

Principal supervisor :

David Rolnick

Nassim Rahaman

Collaborating Alumni - Max-Planck-Institute for Intelligent Systems

Amine RAZIG

Collaborating researcher - Université de Montréal

Co-supervisor :

Loubna Benabbou

Jarrid Rector-Brooks

PhD - Université de Montréal

Danyal REHMAN

Postdoctorate - Université de Montréal

Oli RICHARDSON

Postdoctorate - Université de Montréal

Camille Rochefort-Boulanger

PhD - Université de Montréal

Principal supervisor :

Julie Hussin

Dragos Secrieru

Collaborating Alumni - Université de Montréal

Divya Sharma

Postdoctorate

Co-supervisor :

Alex Hernandez-Garcia

Mélisande Astrid Crystal Teng

Vincent Taboga

Collaborating Alumni - Polytechnique Montréal

Co-supervisor :

PhD - Université de Montréal

Co-supervisor :

Hugo Larochelle

Ivan Titov

Collaborating researcher

Principal supervisor :

Siva Reddy

Alex Tong

Collaborating Alumni - Université de Montréal

Collaborating Alumni - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher

Collaborating researcher - Université de Montréal

Nicole Zhang

PhD - McGill University

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Aaron Courville

Tianyu Zhang

PhD - Université de Montréal

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Harry Zhao

Collaborating Alumni - McGill University

Principal supervisor :

Blog Posts

Generic thumbnail for Mila Blog articles.

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the Service of Reasoning & Model-Based ML

April 4, 2023

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

March 15, 2022

Generative Flow Networks

Yoshua Bengio

Publications

A neuronal least-action principle for real-time learning in cortical circuits

Walter Senn

Dominik Dold

Akos F. Kungl

Benjamin Ellenberger

Jakob Jordan

João Sacramento

Mihai A. Petrovici

One of the most fundamental laws of physics is the principle of least action. Motivated by its predictive power, we introduce a neuronal lea… (see more)st-action principle for cortical processing of sensory streams to produce appropriate behavioural outputs in real time. The principle postulates that the voltage dynamics of cortical pyramidal neurons prospectively minimize the local somato-dendritic mismatch error within individual neurons. For motor output neurons, it implies minimizing an instantaneous behavioural error. For deep network neurons, it implies a prospective firing to overcome integration delays and correct for possible output errors right in time. The neuron-specific errors are extracted in the apical dendrites of pyramidal neurons through a cortical microcircuit that tries to explain away the feedback from the periphery, and correct the trajectory on the fly. Any motor output is in a moving equilibrium with the sensory inputs and the motor feedback during the whole sensory-motor trajectory. Ongoing synaptic plasticity reduces the somato-dendritic mismatch error within each cortical neuron and performs gradient descent on the output cost at any moment in time. The neuronal least-action principle offers an axiomatic framework to derive local neuronal and synaptic dynamics for global real-time computation and learning in the brain and in physical substrates in general.

2024-12-19

eLife (published)

Imagining and building wise machines: The centrality of AI metacognition

Samuel G. B. Johnson

Amir-Hossein Karimi

Nick Chater

Tobias Gerstenberg

Kate Larson

Sydney Levine

Melanie Mitchell

Iyad Rahwan

Bernhard Schölkopf

Igor Grossmann

2024-10-31

arXiv (published)

arxiv.org

Trajectory Flow Matching with Applications to Clinical Time Series Modeling

Xi Zhang

Yuan Pu

Yuki Kawamura

Andrew Loza

Dennis L. Shung

Alexander Tong

2024-10-27

ArXiv (preprint)

Correction: Al content detection in the emerging information ecosystem: new obligations for media and tech companies

Alistair Knott

Dino Pedreschi

Toshiya Jitsuzumi

Susan Leavy

David Eyers

Tapabrata Chakraborti

Andrew Trotman

Sundar Sundareswaran

Ricardo Baeza-Yates

Przemyslaw Biecek

Adrian Weller

Paul D. Teal

Subhadip Basu

Mehmet Haklidir

Virginia Morini

Stuart Russell

2024-10-21

Ethics and Information Technology (published)

A Complexity-Based Theory of Compositionality

2024-10-17

ArXiv (preprint)

arxiv.org

Can Safety Fine-Tuning Be More Principled? Lessons Learned from Cybersecurity

David Williams-King

Linh Le

Adam Oberman

As LLMs develop increasingly advanced capabilities, there is an increased need to minimize the harm that could be caused to society by certa… (see more)in model outputs; hence, most LLMs have safety guardrails added, for example via fine-tuning. In this paper, we argue the position that current safety fine-tuning is very similar to a traditional cat-and-mouse game (or arms race) between attackers and defenders in cybersecurity. Model jailbreaks and attacks are patched with bandaids to target the specific attack mechanism, but many similar attack vectors might remain. When defenders are not proactively coming up with principled mechanisms, it becomes very easy for attackers to sidestep any new defenses. We show how current defenses are insufficient to prevent new adversarial jailbreak attacks, reward hacking, and loss of control problems. In order to learn from past mistakes in cybersecurity, we draw analogies with historical examples and develop lessons learned that can be applied to LLM safety. These arguments support the need for new and more principled approaches to designing safe models, which are architected for security from the beginning. We describe several such approaches from the AI literature.

2024-10-11

NeurIPS.cc/2024/Workshop/SafeGenAi (poster)

Rejecting Hallucinated State Targets during Planning

Mingde “Harry” Zhao

Tristan Sylvain

Romain Laroche

Doina Precup

In planning processes of computational decision-making agents, generative or predictive models are often used as "generators" to propose "ta… (see more)rgets" representing sets of expected or desirable states. Unfortunately, learned models inevitably hallucinate infeasible targets that can cause delusional behaviors and safety concerns. We first investigate the kinds of infeasible targets that generators can hallucinate. Then, we devise a strategy to identify and reject infeasible targets by learning a target feasibility evaluator. To ensure that the evaluator is robust and non-delusional, we adopted a design choice combining off-policy compatible learning rule, distributional architecture, and data augmentation based on hindsight relabeling. Attaching to a planning agent, the designed evaluator learns by observing the agent's interactions with the environment and the targets produced by its generator, without the need to change the agent or its generator. Our controlled experiments show significant reductions in delusional behaviors and performance improvements for various kinds of existing agents.

2024-10-11

NeurIPS.cc/2024/Workshop/SafeGenAi (poster)

proceedings.mlr.press

Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases

Cristian Meo

Akihiro Nakano

Mircea Tudor Lică

Aniket Rajiv Didolkar

Masahiro Suzuki

Anirudh Goyal

Mengmi Zhang

Justin Dauwels

Yutaka Matsuo

2024-10-09

NeurIPS.cc/2024/Workshop/Compositional_Learning (poster)

AI-Assisted Generation of Difficult Math Questions

Vedant Shah

Dingli Yu

Kaifeng Lyu

Simon Park

Nan Rosemary Ke

Jiatong Yu

Yinghui He

Michael Curtis Mozer

James Lloyd McClelland

Sanjeev Arora

Anirudh Goyal

Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet dem… (see more)and for diverse and challenging math questions. Relying solely on human experts is both time-consuming and costly, while LLM-generated questions often lack the requisite diversity and difficulty. We present a design framework that combines the strengths of LLMs with a human-in-the-loop approach to generate a diverse array of challenging math questions. We leverage LLM metacognition skills [Didolkar et al., 2024] of a strong LLM to extract core"skills"from existing math datasets. These skills serve as the basis for generating novel and difficult questions by prompting the LLM with random pairs of core skills. The use of two different skills within each question makes finding such questions an"out of distribution"task for both LLMs and humans. Our pipeline employs LLMs to iteratively generate and refine questions and solutions through multiturn prompting. Human annotators then verify and further refine the questions, with their efficiency enhanced via further LLM interactions. Applying this pipeline on skills extracted from the MATH dataset [Hendrycks et al., 2021] resulted in MATH

2024-10-08

NeurIPS.cc/2024/Workshop/MATH-AI (accepted)

VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text

Ge Zhang

Sai Rajeswar

We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured … (see more)texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images. While numerous works have integrated text embedded in images into visual question-answering tasks, approaches to these tasks generally rely on optical character recognition or masked language modeling, thus reducing the task to mainly text-based processing. However, text-based processing becomes ineffective in VCR as accurate text restoration depends on the combined information from provided images, context, and subtle cues from the tiny exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control the task difficulty. With this pipeline, we construct a dataset for VCR called VCR-Wiki using images with captions from Wikipedia, comprising 2.11M English and 346K Chinese entities in both easy and hard split variants. Our results reveal that current vision language models significantly lag behind human performance in the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-Wiki and the data construction code to facilitate future research.

2024-10-08

NeurIPS.cc/2024/Workshop/Sys2-Reasoning (poster)

Path-filtering in path-integral simulations of open quantum systems using GFlowNets

Jeremy Lackman-Mincoff

An important class of methods for modeling dynamics in open quantum systems is based on the well-known influence functional (IF) approach to… (see more) solving path-integral equations of motion. Within this paradigm, path-filtering schemes based on the removal of IF elements that fall below a certain threshold aim to reduce the effort needed to calculate and store the influence functional, making very challenging simulations possible. A filtering protocol of this type is considered acceptable as long as the simulation remains mathematically stable. This, however, does not guarantee that the approximated dynamics preserve the physics of the simulated process. In this paper, we explore the possibility of training Generative Flow Networks (GFlowNets) to produce filtering protocols while optimizing for mathematical stability and for physical accuracy. Trained using the trajectory balance objective, the model produces sets of paths to be added to a truncated initial set; it is rewarded if the combined set of paths gives rise to solutions in which the trace of the density matrix is conserved, the populations remain real, and the dynamics approach the exact reference. Using a simple two-level system coupled to a dissipative reservoir, we perform proof-of-concept simulations and demonstrate the elegant and surprising filtering solutions proposed by the GFlowNet.

2024-10-07

Journal of Chemical Physics (published)

Were RNNs All We Needed?

Leo Feng

Frederick Tung

Mohamed Osama Ahmed

Hossein Hajimirsadeghi

The introduction of Transformers in 2017 reshaped the landscape of deep learning. Originally proposed for sequence modelling, Transformers h… (see more)ave since achieved widespread success across various domains. However, the scalability limitations of Transformers - particularly with respect to sequence length - have sparked renewed interest in novel recurrent models that are parallelizable during training, offer comparable performance, and scale more effectively. In this work, we revisit sequence modelling from a historical perspective, focusing on Recurrent Neural Networks (RNNs), which dominated the field for two decades before the rise of Transformers. Specifically, we examine LSTMs (1997) and GRUs (2014). We demonstrate that by simplifying these models, we can derive minimal versions (minLSTMs and minGRUs) that (1) use fewer parameters than their traditional counterparts, (2) are fully parallelizable during training, and (3) achieve surprisingly competitive performance on a range of tasks, rivalling recent models including Transformers.

2024-10-01

ArXiv (preprint)