Yoshua Bengio

Biography

*For media inquiries, please write to medias@mila.quebec.

For more information, contact Julie Mongeau, executive assistant, at julie.mongeau@mila.quebec.

Recognized worldwide as a leading expert in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning, which earned him the 2018 A. M. Turing Award, the "Nobel Prize of computing," with Geoffrey Hinton and Yann LeCun. He is a full professor at Université de Montréal, the founder and scientific director of Mila – Quebec Artificial Intelligence Institute, and co-directs, as a senior fellow, the Learning in Machines & Brains program of the Canadian Institute for Advanced Research (CIFAR). He also serves as scientific director of IVADO.

In 2018, he was the computer scientist who collected the largest number of new citations worldwide. In 2019, he was awarded the prestigious Killam Prize. Since 2022, he has had the highest h-index of any computer scientist in the world. He is a Fellow of the Royal Society of London and of the Royal Society of Canada, and an Officer of the Order of Canada.

Concerned about the social impact of AI and committed to the goal that AI benefit everyone, he actively contributed to the Montreal Declaration for the Responsible Development of Artificial Intelligence.

Current Students

Jamal Abou Haibeh

Research Intern - McGill

Mohammed Abukalam

Research Intern - UdeM

Rim Assouel

PhD - UdeM

Dan Assouline

Alumni Collaborator

Ayoub Atanane

Research Intern - Université du Québec à Rimouski

Stefan Bauer

Independent Visiting Researcher

Co-supervisor:

Guillaume Lajoie

Paul Bertin

PhD - UdeM

Ghait Boukachab

Research Intern - UQAR

PhD - UdeM

Independent Visiting Researcher - MIT

Shahana Chatterjee

Research Collaborator - N/A

Principal supervisor:

Chen Chen

Postdoctorate - UdeM

Co-supervisor:

Blake Richards

Xiaoyin Chen

PhD - UdeM

Pierre-Paul De Breuck

Alumni Collaborator - UdeM

PhD - UdeM

PhD - UdeM

Research Collaborator - Université Paris-Saclay

Principal supervisor:

Eric Elmoznino

PhD - UdeM

Co-supervisor:

PhD - UdeM

PhD - Massachusetts Institute of Technology

Léna Nehale Ezzine

PhD - UdeM

Jean-Pierre Falet

PhD - UdeM

Co-supervisor:

Leo Feng

PhD - UdeM

Research Intern - Barcelona University

Piotr Gainski

Research Intern - UdeM

Ivan Grega

Research Collaborator - UdeM

Pietro Greiner

Research Intern

Mohsin Hasan

PhD - UdeM

mohsin.hasan@mila.quebec

Alex Hernandez-Garcia

Postdoctorate - UdeM

Co-supervisor:

Leon Hetzel

Independent Visiting Researcher - Technical University Munich (TUM)

Edward Hu

PhD - UdeM

Moksh Jain

PhD - UdeM

moksh.jain@mila.quebec

Research Intern - UdeM

Research Master's - UdeM

Co-supervisor:

Research Intern - UdeM

Minsu Kim

Research Collaborator - UdeM

PhD - UdeM

Postdoctorate - UdeM

PhD - UdeM

Alumni Collaborator

Seanie Lee

Alumni Collaborator - UdeM

Zhen Liu

PhD - UdeM

Principal supervisor:

Liam Paull

Chenghao Liu

Alumni Collaborator

Research Intern - Imperial College London

PhD - UdeM

Research Intern - UdeM

Nikolay Malkin

Alumni Collaborator - UdeM

Cristian Dragos Manta

PhD - UdeM

Co-supervisor:

Postdoctorate - UdeM

Alumni Collaborator

Sören Mindermann

Research Collaborator - UdeM

Sarthak Mittal

PhD - UdeM

Principal supervisor:

PhD - UdeM

Principal supervisor:

Postdoctorate - UdeM

Principal supervisor:

Independent Visiting Researcher - UdeM

Ling Pan

Independent Visiting Researcher - Hong Kong University of Science and Technology (HKUST)

Ali Parviz

Research Collaborator - Ying Wu College of Computing

Lena Podina

PhD - University of Waterloo

Principal supervisor:

Nassim Rahaman

PhD - Max-Planck-Institute for Intelligent Systems

Jarrid Rector-Brooks

PhD - UdeM

Co-supervisor:

Sarath Chandar

Danyal REHMAN

Postdoctorate - UdeM

James Requeima

Independent Visiting Researcher - UdeM

Postdoctorate - UdeM

Jessie Richter-Powell

Independent Visiting Researcher - UdeM

Camille Rochefort-Boulanger

PhD - UdeM

Principal supervisor:

Julie Hussin

agassoussisalwane2@gmail.com

Salwane Salwane

Research Intern - UdeM

Theo Saulus

Research Collaborator

Principal supervisor:

PhD - UdeM

Postdoctorate - UdeM

Research Master's - UdeM

Marcin Sendera

Research Intern - UdeM

Dounia Shaaban Kabakibo

Research Intern - UdeM

Vedant Shah

Research Master's - UdeM

Alumni Collaborator

Marco Stock

Independent Visiting Researcher - Technical University of Munich

marco.stock@tum.de

Anja Surina

PhD - École Polytechnique Fédérale de Lausanne

Vincent Taboga

Postdoctorate - Polytechnique

Co-supervisor:

Pierre-Luc Bacon

Mélisande Astrid Crystal Teng

PhD - UdeM

Co-supervisor:

Research Collaborator

Principal supervisor:

alexander.tong@mila.quebec

Alex Tong

Postdoctorate - UdeM

Research Collaborator - Valence

Principal supervisor:

Dominique Beaini

Donna Vakalis

Postdoctorate - UdeM

Co-supervisor:

Viktor Viktor Todosijevic

Research Collaborator - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)

Principal supervisor:

Sasha Volokhova

PhD - UdeM

Zichao Yan

Alumni Collaborator - UdeM

Kyle YUN

Research Collaborator - KAIST

Elmimouni Zakaria

Research Intern - UdeM

Nicole Zhang

PhD - McGill

Principal supervisor:

Mathieu Blanchette

Dinghuai Zhang

PhD - UdeM

Principal supervisor:

Aaron Courville

Skipper: combining spatial and temporal abstraction to improve generalization

Ruixiang Zhang

PhD - UdeM

Principal supervisor:

PhD - UdeM

Harry Zhao

PhD - McGill

Principal supervisor:

Blog Posts

February 22, 2024

by

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the service of reasoning & model-based ML

April 4, 2023

by

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

by

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

Generative Flow Networks

March 15, 2022

by

Yoshua Bengio

Publications

AI-Assisted Generation of Difficult Math Questions

Vedant Shah

Dingli Yu

Kaifeng Lyu

Simon Park

Nan Rosemary Ke

Michael Curtis Mozer

James Lloyd McClelland

Sanjeev Arora

Anirudh Goyal

Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet demand for diverse and challenging math questions. Relying solely on human experts is both time-consuming and costly, while LLM-generated questions often lack the requisite diversity and difficulty. We present a design framework that combines the strengths of LLMs with a human-in-the-loop approach to generate a diverse array of challenging math questions. We leverage LLM metacognition skills [Didolkar et al., 2024] of a strong LLM to extract core "skills" from existing math datasets. These skills serve as the basis for generating novel and difficult questions by prompting the LLM with random pairs of core skills. The use of two different skills within each question makes finding such questions an "out of distribution" task for both LLMs and humans. Our pipeline employs LLMs to iteratively generate and refine questions and solutions through multiturn prompting. Human annotators then verify and further refine the questions, with their efficiency enhanced via further LLM interactions. Applying this pipeline on skills extracted from the MATH dataset [Hendrycks et al., 2021] resulted in MATH

2024-10-09

NeurIPS.cc/2024/Workshop/MATH-AI (accepted)

openreview.net

VCR: Visual Caption Restoration

Tianyu Zhang

Suyuchen Wang

Lu Li

Ge Zhang

Perouz Taslakian

Sai Rajeswar

Jie Fu

Bang Liu

We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images. While numerous works have integrated text embedded in images into visual question-answering tasks, approaches to these tasks generally rely on optical character recognition or masked language modeling, thus reducing the task to mainly text-based processing. However, text-based processing becomes ineffective in VCR as accurate text restoration depends on the combined information from provided images, context, and subtle cues from the tiny exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control the task difficulty. With this pipeline, we construct a dataset for VCR called VCR-Wiki using images with captions from Wikipedia, comprising 2.11M English and 346K Chinese entities in both easy and hard split variants. Our results reveal that current vision language models significantly lag behind human performance in the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-Wiki and the data construction code to facilitate future research.

2024-10-09

NeurIPS.cc/2024/Workshop/Sys2-Reasoning (poster)

openreview.net

Adaptive teachers for amortized samplers

Minsu Kim

Sanghyeok Choi

Taeyoung Yun

Emmanuel Bengio

Leo Feng

Jarrid Rector-Brooks

Sungsoo Ahn

Jinkyoo Park

Nikolay Malkin

Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnormalized density where exact sampling is intractable. When sampling is implemented as a sequential decision-making process, reinforcement learning (RL) methods, such as generative flow networks, can be used to train the sampling policy. Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration. We propose to use an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student) by prioritizing high-loss regions. The Teacher, an auxiliary behavior model, is trained to sample high-error regions of the Student and can generalize across unexplored modes, thereby enhancing mode coverage by providing an efficient training curriculum. We validate the effectiveness of this approach in a synthetic environment designed to present an exploration challenge, two diffusion-based sampling tasks, and four biochemical discovery tasks demonstrating its ability to improve sample efficiency and mode coverage.

2024-10-02

ArXiv (preprint)

Geometric Signatures of Compositionality Across a Language Model's Lifetime

Jin Hwa Lee

Thomas Jiralerspong

Lei Yu

Emily Cheng

Compositionality, the notion that the meaning of an expression is constructed from the meaning of its parts and syntactic rules, permits the infinite productivity of human language. For the first time, artificial language models (LMs) are able to match human performance in a number of compositional generalization tasks. However, much remains to be understood about the representational mechanisms underlying these abilities. We take a high-level geometric approach to this problem by relating the degree of compositionality in a dataset to the intrinsic dimensionality of its representations under an LM, a measure of feature complexity. We find not only that the degree of dataset compositionality is reflected in representations' intrinsic dimensionality, but that the relationship between compositionality and geometric complexity arises due to learned linguistic features over training. Finally, our analyses reveal a striking contrast between linear and nonlinear dimensionality, showing that they respectively encode formal and semantic aspects of linguistic composition.

2024-10-02

ArXiv (preprint)

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

Seanie Lee

Haebin Seong

Dong Bok Lee

Minki Kang

Xiaoyin Chen

Dominik Wagner

Juho Lee

Sung Ju Hwang

Safety guard models that detect malicious queries aimed at large language models (LLMs) are essential for ensuring the secure and responsible deployment of LLMs in real-world applications. However, deploying existing safety guard models with billions of parameters alongside LLMs on mobile devices is impractical due to substantial memory requirements and latency. To reduce this cost, we distill a large teacher safety guard model into a smaller one using a labeled dataset of instruction-response pairs with binary harmfulness labels. Due to the limited diversity of harmful instructions in the existing labeled dataset, naively distilled models tend to underperform compared to larger models. To bridge the gap between small and large models, we propose HarmAug, a simple yet effective data augmentation method that involves jailbreaking an LLM and prompting it to generate harmful instructions. Given a prompt such as, "Make a single harmful instruction prompt that would elicit offensive content", we add an affirmative prefix (e.g., "I have an idea for a prompt:") to the LLM's response. This encourages the LLM to continue generating the rest of the response, leading to sampling harmful instructions. Another LLM generates a response to the harmful instruction, and the teacher model labels the instruction-response pair. We empirically show that our HarmAug outperforms other relevant baselines. Moreover, a 435-million-parameter safety guard model trained with HarmAug achieves an F1 score comparable to larger models with over 7 billion parameters, and even outperforms them in AUPRC, while operating at less than 25% of their computational cost.

2024-10-02

ArXiv (preprint)

Were RNNs All We Needed?

Leo Feng

Frederick Tung

Mohamed Osama Ahmed

Hossein Hajimirsadegh

2024-10-02

ArXiv (preprint)

Laurence Perreault-Levasseur

A Data-driven Discovery of the Causal Connection between Galaxy and Black Hole Evolution

Zehao Jin

Mario Pasquato

Benjamin L. Davis

Tristan Deleu

Yu Luo

Changhyun Cho

Pablo Lemos

Xi Kang

A. Macciò

Yashar Hezaveh

2024-10-01

ArXiv (preprint)

A neuronal least-action principle for real-time learning in cortical circuits

Walter Senn

Dominik Dold

Akos F. Kungl

Benjamin Ellenberger

Jakob Jordan

João Sacramento

Mihai A. Petrovici

One of the most fundamental laws of physics is the principle of least action. Motivated by its predictive power, we introduce a neuronal least-action principle for cortical processing of sensory streams to produce appropriate behavioural outputs in real time. The principle postulates that the voltage dynamics of cortical pyramidal neurons prospectively minimize the local somato-dendritic mismatch error within individual neurons. For motor output neurons, it implies minimizing an instantaneous behavioural error. For deep network neurons, it implies a prospective firing to overcome integration delays and correct for possible output errors right in time. The neuron-specific errors are extracted in the apical dendrites of pyramidal neurons through a cortical microcircuit that tries to explain away the feedback from the periphery, and correct the trajectory on the fly. Any motor output is in a moving equilibrium with the sensory inputs and the motor feedback during the whole sensory-motor trajectory. Ongoing synaptic plasticity reduces the somato-dendritic mismatch error within each cortical neuron and performs gradient descent on the output cost at any moment in time. The neuronal least-action principle offers an axiomatic framework to derive local neuronal and synaptic dynamics for global real-time computation and learning in the brain and in physical substrates in general.

2024-09-23

bioRxiv (preprint)

AI content detection in the emerging information ecosystem: new obligations for media and tech companies

Alistair Knott

Dino Pedreschi

Toshiya Jitsuzumi

Susan Leavy

D. Eyers

Tapabrata Chakraborti

Andrew Trotman

Sundar Sundareswaran

Ricardo Baeza-Yates

Przemyslaw Biecek

Adrian Weller

Paul D. Teal

Subhadip Basu

Mehmet Haklidir

Virginia Morini

Stuart Russell

2024-09-21

Ethics and Information Technology (published)

A high-throughput phenotypic screen combined with an ultra-large-scale deep learning-based virtual screening reveals novel scaffolds of antibacterial compounds

Gabriele Scalia

Steven T. Rutherford

Ziqing Lu

Kerry R. Buchholz

Nicholas Skelton

Kangway Chuang

Nathaniel Diamant

Jan-Christian Hütter

Jerome-Maxim Luescher

Anh Miu

Jeff Blaney

Leo Gendelev

Elizabeth Skippington

Greg Zynda

Nia Dickson

Michał Koziarski

Aviv Regev

Man-Wah Tan

Tommaso Biancalani

2024-09-14

bioRxiv (preprint)

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Usman Anwar

Abulhair Saparov

Javier Rando

Daniel Paleka

Miles Turpin

Peter Hase

Ekdeep Singh Lubana

Erik Jenner

Stephen Casper

Oliver Sourbut

Benjamin L. Edelman

Zhaowei Zhang

Mario Günther

Anton Korinek

Jose Hernandez-Orallo

Lewis Hammond

Eric J Bigelow

Alexander Pan

Lauro Langosco

Tomasz Korbak

Heidi Chenyu Zhang

Ruiqi Zhong

Sean O hEigeartaigh

Gabriel Recchia

Giulio Corsi

Alan Chan

Markus Anderljung

Lilian Edwards

Aleksandar Petrov

Christian Schroeder de Witt

Danqi Chen

Sumeet Ramesh Motwani

Samuel Albanie

Tegan Maharaj

Jakob Nicolaus Foerster

Philip Torr

Florian Tramèr

He He

Atoosa Kasirzadeh

Yejin Choi

David Scott Krueger

2024-09-02

TMLR (accepted)

openreview.net

Zero-Shot Object-Centric Representation Learning

Aniket Rajiv Didolkar

Andrii Zadaianchuk

Anirudh Goyal

Michael Curtis Mozer

Georg Martius

Maximilian Seitzer

The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities. Recent successes have shown that object-centric representation learning can be scaled to real-world scenes by utilizing pre-trained self-supervised features. However, so far, object-centric methods have mostly been applied in-distribution, with models trained and evaluated on the same dataset. This is in contrast to the wider trend in machine learning towards general-purpose models directly applicable to unseen data and tasks. Thus, in this work, we study current object-centric methods through the lens of zero-shot generalization by introducing a benchmark comprising eight different synthetic and real-world datasets. We analyze the factors influencing zero-shot performance and find that training on diverse real-world images improves transferability to unseen scenarios. Furthermore, inspired by the success of task-specific fine-tuning in foundation models, we introduce a novel fine-tuning strategy to adapt pre-trained vision encoders for the task of object discovery. We find that the proposed approach results in state-of-the-art performance for unsupervised object discovery, exhibiting strong zero-shot transfer to unseen datasets.

2024-08-17

ArXiv (preprint)