Yoshua Bengio

Biography

For media requests, please write to medias@mila.quebec.

For more information, please contact Julie Mongeau, executive assistant, at julie.mongeau@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” shared with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific director of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize, and in 2022 he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France, and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Jamal Abou Haibeh

Research Intern - McGill University

Mohammed Abukalam

Research Intern - Université de Montréal

Rim Assouel

PhD - Université de Montréal

Collaborating Alumni

Research Intern - Université du Québec à Rimouski

Stefan Bauer

Independent visiting researcher

Co-supervisor :

Guillaume Lajoie

Paul Bertin

PhD - Université de Montréal

Ghait Boukachab

Research Intern - UQAR

Oussama Boussif

PhD - Université de Montréal

Independent visiting researcher - MIT

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor :

Chen Chen

Postdoctorate - Université de Montréal

Co-supervisor :

Blake Richards

Xiaoyin Chen

PhD - Université de Montréal

Pierre-Paul De Breuck

Collaborating Alumni - Université de Montréal

PhD - Université de Montréal

PhD - Université de Montréal

Collaborating researcher - Université Paris-Saclay

Principal supervisor :

Eric Elmoznino

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Katie Everett

PhD - Massachusetts Institute of Technology

Léna Nehale Ezzine

PhD - Université de Montréal

Jean-Pierre Falet

PhD - Université de Montréal

Co-supervisor :

Leo Feng

PhD - Université de Montréal

Research Intern - Barcelona University

Piotr Gainski

Research Intern - Université de Montréal

Ivan Grega

Collaborating researcher - Université de Montréal

Pietro Greiner

Research Intern

Mohsin Hasan

PhD - Université de Montréal

mohsin.hasan@mila.quebec

Alex Hernandez-Garcia

Postdoctorate - Université de Montréal

Co-supervisor :

Leon Hetzel

Independent visiting researcher - Technical University Munich (TUM)

Edward Hu

PhD - Université de Montréal

Moksh Jain

PhD - Université de Montréal

moksh.jain@mila.quebec

Research Intern - Université de Montréal

Master's Research - Université de Montréal

Co-supervisor :

Research Intern - Université de Montréal

Minsu Kim

Collaborating researcher - Université de Montréal

PhD - Université de Montréal

Michał Koziarski

Postdoctorate - Université de Montréal

Salem Lahlou

PhD - Université de Montréal

Hae-Beom Lee

Collaborating Alumni

Seanie Lee

Collaborating Alumni - Université de Montréal

Collaborating Alumni

Zhen Liu

PhD - Université de Montréal

Principal supervisor :

Liam Paull

Matt MacDermott

Research Intern - Imperial College London

PhD - Université de Montréal

Mohammed Mahfoud

Research Intern - Université de Montréal

Nikolay Malkin

Collaborating Alumni - Université de Montréal

Cristian Dragos Manta

PhD - Université de Montréal

Co-supervisor :

Dhanya Sridhar

Stefano Massaroli

Postdoctorate - Université de Montréal

Collaborating Alumni

Collaborating researcher - Université de Montréal

Sarthak Mittal

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Postdoctorate - Université de Montréal

Principal supervisor :

Independent visiting researcher - Université de Montréal

Ling Pan

Independent visiting researcher - Hong Kong University of Science and Technology (HKUST)

Ali Parviz

Collaborating researcher - Ying Wu Coll of Computing

Lena Podina

PhD - University of Waterloo

Principal supervisor :

Nassim Rahaman

PhD - Max-Planck-Institute for Intelligent Systems

Jarrid Rector-Brooks

PhD - Université de Montréal

Co-supervisor :

Sarath Chandar

Danyal Rehman

Postdoctorate - Université de Montréal

James Requeima

Independent visiting researcher - Université de Montréal

Oli Richardson

Postdoctorate - Université de Montréal

Jessie Richter-Powell

Independent visiting researcher - Université de Montréal

Camille Rochefort-Boulanger

PhD - Université de Montréal

Principal supervisor :

Julie Hussin

agassoussisalwane2@gmail.com

Salwane Salwane

Research Intern - Université de Montréal

Theo Saulus

Collaborating researcher

Principal supervisor :

Victor Schmidt

PhD - Université de Montréal

Postdoctorate - Université de Montréal

Master's Research - Université de Montréal

Marcin Sendera

Research Intern - Université de Montréal

Dounia Shaaban Kabakibo

Research Intern - Université de Montréal

Vedant Shah

Master's Research - Université de Montréal

Collaborating Alumni

Marco Stock

Independent visiting researcher - Technical University of Munich

marco.stock@tum.de

Anja Surina

PhD - École Polytechnique Fédérale de Lausanne

Vincent Taboga

Postdoctorate - Polytechnique Montréal

Co-supervisor :

Pierre-Luc Bacon

Mélisande Astrid Crystal Teng

PhD - Université de Montréal

Co-supervisor :

Collaborating researcher

Principal supervisor :

alexander.tong@mila.quebec

Alex Tong

Postdoctorate - Université de Montréal

Collaborating researcher - Valence

Principal supervisor :

Dominique Beaini

Donna Vakalis

Postdoctorate - Université de Montréal

Co-supervisor :

Viktor Todosijevic

Collaborating researcher - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)

Principal supervisor :

Sasha Volokhova

PhD - Université de Montréal

Zichao Yan

Collaborating Alumni - Université de Montréal

Kyle Yun

Collaborating researcher - KAIST

Elmimouni Zakaria

Research Intern - Université de Montréal

Nicole Zhang

PhD - McGill University

Principal supervisor :

Mathieu Blanchette

Dinghuai Zhang

PhD - Université de Montréal

Principal supervisor :

Aaron Courville

Ruixiang Zhang

PhD - Université de Montréal

Principal supervisor :

Liam Paull

Tianyu Zhang

PhD - Université de Montréal

Harry Zhao

PhD - McGill University

Principal supervisor :

Blog Posts

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the Service of Reasoning & Model-Based ML

April 4, 2023

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

Generative Flow Networks

March 15, 2022

Yoshua Bengio

Publications

AI-Assisted Generation of Difficult Math Questions

Vedant Shah

Dingli Yu

Kaifeng Lyu

Simon Park

Nan Rosemary Ke

Michael Curtis Mozer

James Lloyd McClelland

Sanjeev Arora

Anirudh Goyal

Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet demand for diverse and challenging math questions. Relying solely on human experts is both time-consuming and costly, while LLM-generated questions often lack the requisite diversity and difficulty. We present a design framework that combines the strengths of LLMs with a human-in-the-loop approach to generate a diverse array of challenging math questions. We leverage LLM metacognition skills [Didolkar et al., 2024] of a strong LLM to extract core "skills" from existing math datasets. These skills serve as the basis for generating novel and difficult questions by prompting the LLM with random pairs of core skills. The use of two different skills within each question makes finding such questions an "out of distribution" task for both LLMs and humans. Our pipeline employs LLMs to iteratively generate and refine questions and solutions through multiturn prompting. Human annotators then verify and further refine the questions, with their efficiency enhanced via further LLM interactions. Applying this pipeline on skills extracted from the MATH dataset [Hendrycks et al., 2021] resulted in MATH

2024-10-09

NeurIPS.cc/2024/Workshop/MATH-AI (accepted)

openreview.net

VCR: Visual Caption Restoration

Tianyu Zhang

Suyuchen Wang

Lu Li

Ge Zhang

Perouz Taslakian

Sai Rajeswar

Jie Fu

Bang Liu

We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images. While numerous works have integrated text embedded in images into visual question-answering tasks, approaches to these tasks generally rely on optical character recognition or masked language modeling, thus reducing the task to mainly text-based processing. However, text-based processing becomes ineffective in VCR as accurate text restoration depends on the combined information from provided images, context, and subtle cues from the tiny exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control the task difficulty. With this pipeline, we construct a dataset for VCR called VCR-Wiki using images with captions from Wikipedia, comprising 2.11M English and 346K Chinese entities in both easy and hard split variants. Our results reveal that current vision language models significantly lag behind human performance in the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-Wiki and the data construction code to facilitate future research.

2024-10-09

NeurIPS.cc/2024/Workshop/Sys2-Reasoning (poster)

openreview.net

Adaptive teachers for amortized samplers

Minsu Kim

Sanghyeok Choi

Taeyoung Yun

Emmanuel Bengio

Leo Feng

Jarrid Rector-Brooks

Sungsoo Ahn

Jinkyoo Park

Nikolay Malkin

Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnormalized density where exact sampling is intractable. When sampling is implemented as a sequential decision-making process, reinforcement learning (RL) methods, such as generative flow networks, can be used to train the sampling policy. Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration. We propose to use an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student) by prioritizing high-loss regions. The Teacher, an auxiliary behavior model, is trained to sample high-error regions of the Student and can generalize across unexplored modes, thereby enhancing mode coverage by providing an efficient training curriculum. We validate the effectiveness of this approach in a synthetic environment designed to present an exploration challenge, two diffusion-based sampling tasks, and four biochemical discovery tasks, demonstrating its ability to improve sample efficiency and mode coverage.

2024-10-02

ArXiv (preprint)

Geometric Signatures of Compositionality Across a Language Model's Lifetime

Jin Hwa Lee

Thomas Jiralerspong

Lei Yu

Emily Cheng

Compositionality, the notion that the meaning of an expression is constructed from the meaning of its parts and syntactic rules, permits the infinite productivity of human language. For the first time, artificial language models (LMs) are able to match human performance in a number of compositional generalization tasks. However, much remains to be understood about the representational mechanisms underlying these abilities. We take a high-level geometric approach to this problem by relating the degree of compositionality in a dataset to the intrinsic dimensionality of its representations under an LM, a measure of feature complexity. We find not only that the degree of dataset compositionality is reflected in representations' intrinsic dimensionality, but that the relationship between compositionality and geometric complexity arises due to learned linguistic features over training. Finally, our analyses reveal a striking contrast between linear and nonlinear dimensionality, showing that they respectively encode formal and semantic aspects of linguistic composition.

2024-10-02

ArXiv (preprint)

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

Seanie Lee

Haebin Seong

Dong Bok Lee

Minki Kang

Xiaoyin Chen

Dominik Wagner

Juho Lee

Sung Ju Hwang

Safety guard models that detect malicious queries aimed at large language models (LLMs) are essential for ensuring the secure and responsible deployment of LLMs in real-world applications. However, deploying existing safety guard models with billions of parameters alongside LLMs on mobile devices is impractical due to substantial memory requirements and latency. To reduce this cost, we distill a large teacher safety guard model into a smaller one using a labeled dataset of instruction-response pairs with binary harmfulness labels. Due to the limited diversity of harmful instructions in the existing labeled dataset, naively distilled models tend to underperform compared to larger models. To bridge the gap between small and large models, we propose HarmAug, a simple yet effective data augmentation method that involves jailbreaking an LLM and prompting it to generate harmful instructions. Given a prompt such as, "Make a single harmful instruction prompt that would elicit offensive content", we add an affirmative prefix (e.g., "I have an idea for a prompt:") to the LLM's response. This encourages the LLM to continue generating the rest of the response, leading to sampling harmful instructions. Another LLM generates a response to the harmful instruction, and the teacher model labels the instruction-response pair. We empirically show that our HarmAug outperforms other relevant baselines. Moreover, a 435-million-parameter safety guard model trained with HarmAug achieves an F1 score comparable to larger models with over 7 billion parameters, and even outperforms them in AUPRC, while operating at less than 25% of their computational cost.

2024-10-02

ArXiv (preprint)

Were RNNs All We Needed?

Leo Feng

Frederick Tung

Mohamed Osama Ahmed

Hossein Hajimirsadegh

2024-10-02

ArXiv (preprint)

A Data-driven Discovery of the Causal Connection between Galaxy and Black Hole Evolution

Zehao Jin

Mario Pasquato

Benjamin L. Davis

Tristan Deleu

Yu Luo

Changhyun Cho

Pablo Lemos

Laurence Perreault-Levasseur

Xi Kang

A. Macciò

Yashar Hezaveh

2024-10-01

ArXiv (preprint)

A neuronal least-action principle for real-time learning in cortical circuits

Walter Senn

Dominik Dold

Akos F. Kungl

Benjamin Ellenberger

Jakob Jordan

João Sacramento

Mihai A. Petrovici

One of the most fundamental laws of physics is the principle of least action. Motivated by its predictive power, we introduce a neuronal least-action principle for cortical processing of sensory streams to produce appropriate behavioural outputs in real time. The principle postulates that the voltage dynamics of cortical pyramidal neurons prospectively minimize the local somato-dendritic mismatch error within individual neurons. For motor output neurons, it implies minimizing an instantaneous behavioural error. For deep network neurons, it implies a prospective firing to overcome integration delays and correct for possible output errors right in time. The neuron-specific errors are extracted in the apical dendrites of pyramidal neurons through a cortical microcircuit that tries to explain away the feedback from the periphery, and correct the trajectory on the fly. Any motor output is in a moving equilibrium with the sensory inputs and the motor feedback during the whole sensory-motor trajectory. Ongoing synaptic plasticity reduces the somato-dendritic mismatch error within each cortical neuron and performs gradient descent on the output cost at any moment in time. The neuronal least-action principle offers an axiomatic framework to derive local neuronal and synaptic dynamics for global real-time computation and learning in the brain and in physical substrates in general.

2024-09-23

bioRxiv (preprint)

AI content detection in the emerging information ecosystem: new obligations for media and tech companies

Alistair Knott

Dino Pedreschi

Toshiya Jitsuzumi

Susan Leavy

D. Eyers

Tapabrata Chakraborti

Andrew Trotman

Sundar Sundareswaran

Ricardo Baeza-Yates

Przemyslaw Biecek

Adrian Weller

Paul D. Teal

Subhadip Basu

Mehmet Haklidir

Virginia Morini

Stuart Russell

2024-09-21

Ethics and Information Technology (published)

A high-throughput phenotypic screen combined with an ultra-large-scale deep learning-based virtual screening reveals novel scaffolds of antibacterial compounds

Gabriele Scalia

Steven T. Rutherford

Ziqing Lu

Kerry R. Buchholz

Nicholas Skelton

Kangway Chuang

Nathaniel Diamant

Jan-Christian Hütter

Jerome-Maxim Luescher

Anh Miu

Jeff Blaney

Leo Gendelev

Elizabeth Skippington

Greg Zynda

Nia Dickson

Michał Koziarski

Aviv Regev

Man-Wah Tan

Tommaso Biancalani

2024-09-14

bioRxiv (preprint)

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Usman Anwar

Abulhair Saparov

Javier Rando

Daniel Paleka

Miles Turpin

Peter Hase

Ekdeep Singh Lubana

Erik Jenner

Stephen Casper

Oliver Sourbut

Benjamin L. Edelman

Zhaowei Zhang

Mario Günther

Anton Korinek

Jose Hernandez-Orallo

Lewis Hammond

Eric J Bigelow

Alexander Pan

Lauro Langosco

Tomasz Korbak

Heidi Chenyu Zhang

Ruiqi Zhong

Sean O hEigeartaigh

Gabriel Recchia

Giulio Corsi

Alan Chan

Markus Anderljung

Lilian Edwards

Aleksandar Petrov

Christian Schroeder de Witt

Danqi Chen

Sumeet Ramesh Motwani

Samuel Albanie

Tegan Maharaj

Jakob Nicolaus Foerster

Philip Torr

Florian Tramèr

He He

Atoosa Kasirzadeh

Yejin Choi

David Scott Krueger

2024-09-02

TMLR (accepted)

openreview.net

Zero-Shot Object-Centric Representation Learning

Aniket Rajiv Didolkar

Andrii Zadaianchuk

Anirudh Goyal

Michael Curtis Mozer

Georg Martius

Maximilian Seitzer

The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities. Recent successes have shown that object-centric representation learning can be scaled to real-world scenes by utilizing pre-trained self-supervised features. However, so far, object-centric methods have mostly been applied in-distribution, with models trained and evaluated on the same dataset. This is in contrast to the wider trend in machine learning towards general-purpose models directly applicable to unseen data and tasks. Thus, in this work, we study current object-centric methods through the lens of zero-shot generalization by introducing a benchmark comprising eight different synthetic and real-world datasets. We analyze the factors influencing zero-shot performance and find that training on diverse real-world images improves transferability to unseen scenarios. Furthermore, inspired by the success of task-specific fine-tuning in foundation models, we introduce a novel fine-tuning strategy to adapt pre-trained vision encoders for the task of object discovery. We find that the proposed approach results in state-of-the-art performance for unsupervised object discovery, exhibiting strong zero-shot transfer to unseen datasets.

2024-08-17

ArXiv (preprint)