
Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Scientific Director, Leadership Team
Observer, Board of Directors, Mila
Research Topics
Medical Machine Learning
Representation Learning
Reinforcement Learning
Deep Learning
Causality
Generative Models
Probabilistic Models
Molecular Modeling
Computational Neuroscience
Reasoning
Graph Neural Networks
Recurrent Neural Networks
Machine Learning Theory
Natural Language Processing

Biography

*For media requests, please write to medias@mila.quebec.

For more information, contact Julie Mongeau, executive assistant, at julie.mongeau@mila.quebec.

Recognized worldwide as a leading expert in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, the "Nobel Prize of computing," with Geoffrey Hinton and Yann LeCun. He is a full professor at Université de Montréal, founder and scientific director of Mila, the Quebec Artificial Intelligence Institute, and co-directs, as a Senior Fellow, the Learning in Machines & Brains program of the Canadian Institute for Advanced Research (CIFAR). He also serves as scientific director of IVADO.

In 2018, he was the computer scientist who received the largest number of new citations worldwide. In 2019, he was awarded the prestigious Killam Prize. Since 2022, he has had the highest h-index of any computer scientist in the world. He is a Fellow of the Royal Society of London and of the Royal Society of Canada, and an Officer of the Order of Canada.

Concerned about the social impact of AI and committed to the goal that AI benefit everyone, he contributed actively to the Montreal Declaration for the Responsible Development of Artificial Intelligence.

Current Students

Research Collaborator
Research Intern - McGill
Research Intern - UdeM
PhD - UdeM
Alumni Collaborator
Research Intern - Université du Québec à Rimouski
Independent Visiting Researcher
Co-supervisor:
Independent Visiting Researcher - UQAR
PhD - UdeM
Research Intern - UQAR
Independent Visiting Researcher - MIT
PhD - UdeM
Postdoctorate - UdeM
Co-supervisor:
Alumni Collaborator - UdeM
Research Collaborator - Université Paris-Saclay
Principal supervisor:
PhD - UdeM
Co-supervisor:
PhD - UdeM
PhD - Massachusetts Institute of Technology
PhD - UdeM
PhD - UdeM
Co-supervisor:
PhD - Barcelona University
Research Intern - UdeM
Research Collaborator - UdeM
Research Collaborator
Postdoctorate - UdeM
Co-supervisor:
Independent Visiting Researcher - Technical University Munich (TUM)
PhD - UdeM
Research Intern - UdeM
Research Master's - UdeM
Co-supervisor:
Research Intern - UdeM
Research Collaborator - UdeM
PhD - UdeM
Postdoctorate - UdeM
PhD - UdeM
Alumni Collaborator
Alumni Collaborator - UdeM
PhD - UdeM
Principal supervisor:
Alumni Collaborator
Research Intern - Imperial College London
PhD - UdeM
Research Intern - UdeM
Alumni Collaborator - UdeM
PhD - UdeM
Co-supervisor:
Postdoctorate - UdeM
Alumni Collaborator
Research Collaborator - UdeM
PhD - UdeM
Principal supervisor:
PhD - UdeM
Principal supervisor:
Independent Visiting Researcher - UdeM
Independent Visiting Researcher - Hong Kong University of Science and Technology (HKUST)
Research Collaborator - Ying Wu Coll of Computing
PhD - University of Waterloo
Principal supervisor:
PhD - Max-Planck-Institute for Intelligent Systems
PhD - UdeM
Co-supervisor:
Postdoctorate - UdeM
Independent Visiting Researcher - UdeM
Independent Visiting Researcher - UdeM
PhD - UdeM
Principal supervisor:
Research Intern - UdeM
Research Collaborator
Principal supervisor:
Research Master's - UdeM
Research Intern - UdeM
Research Intern - UdeM
Research Master's - UdeM
Alumni Collaborator
Independent Visiting Researcher - Technical University of Munich
PhD - École Polytechnique Fédérale de Lausanne
Postdoctorate - Polytechnique
Co-supervisor:
PhD - UdeM
Co-supervisor:
Research Collaborator
Principal supervisor:
Research Collaborator - Valence
Principal supervisor:
Postdoctorate - UdeM
Co-supervisor:
Research Collaborator - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)
Principal supervisor:
PhD - UdeM
Alumni Collaborator - UdeM
Research Intern - UdeM
PhD - UdeM
Principal supervisor:
PhD - McGill
Principal supervisor:
PhD - UdeM
Principal supervisor:
PhD - McGill
Principal supervisor:

Publications

Can a Bayesian Oracle Prevent Harm from an Agent?
Michael K. Cohen
Nikolay Malkin
Matt MacDermott
Damiano Fornasiere
Pietro Greiner
Younesse Kaddar
Is there a way to design powerful AI systems based on machine learning methods that would satisfy probabilistic safety guarantees? With the long-term goal of obtaining a probabilistic guarantee that would apply in every context, we consider estimating a context-dependent bound on the probability of violating a given safety specification. Such a risk evaluation would need to be performed at run-time to provide a guardrail against dangerous actions of an AI. Noting that different plausible hypotheses about the world could produce very different outcomes, and because we do not know which one is right, we derive bounds on the safety violation probability predicted under the true but unknown hypothesis. Such bounds could be used to reject potentially dangerous actions. Our main results involve searching for cautious but plausible hypotheses, obtained by a maximization that involves Bayesian posteriors over hypotheses. We consider two forms of this result, in the iid case and in the non-iid case, and conclude with open problems towards turning such theoretical results into practical AI guardrails.
Open Problems in Technical AI Governance
Anka Reuel
Benjamin Bucknall
Stephen Casper
Tim Fist
Lisa Soder
Onni Aarne
Lewis Hammond
Lujain Ibrahim
Alan Chan
Peter Wills
Markus Anderljung
Ben Garfinkel
Lennart Heim
Andrew Trask
Gabriel Mukobi
Rylan Schaeffer
Mauricio Baker
Sara Hooker
Irene Solaiman
Alexandra Luccioni
Nitarshan Rajkumar
Nicolas Moes
Jeffrey Ladish
Neel Guha
Jessica Newman
Tobin South
Alex Pentland
Sanmi Koyejo
Mykel J. Kochenderfer
Robert F. Trager
AI progress is creating a growing range of risks and opportunities, but it is often unclear how they should be navigated. In many cases, the barriers and uncertainties faced are at least partly technical. Technical AI governance, referring to technical analysis and tools for supporting the effective governance of AI, seeks to address such challenges. It can help to (a) identify areas where intervention is needed, (b) identify and assess the efficacy of potential governance actions, and (c) enhance governance options by designing mechanisms for enforcement, incentivization, or compliance. In this paper, we explain what technical AI governance is, why it is important, and present a taxonomy and incomplete catalog of its open problems. This paper is intended as a resource for technical researchers or research funders looking to contribute to AI governance.
Improving Gradient-Guided Nested Sampling for Posterior Inference
Pablo Lemos
Nikolay Malkin
Will Handley
We present a performant, general-purpose gradient-guided nested sampling (GGNS) algorithm, combining the state of the art in differentiable programming, Hamiltonian slice sampling, clustering, mode separation, dynamic nested sampling, and parallelization. This unique combination allows GGNS to scale well with dimensionality and perform competitively on a variety of synthetic and real-world problems. We also show the potential of combining nested sampling with generative flow networks to obtain large amounts of high-quality samples from the posterior distribution. This combination leads to faster mode discovery and more accurate estimates of the partition function.
Memory Efficient Neural Processes via Constant Memory Attention Block
Leo Feng
Frederick Tung
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
On Generalization for Generative Flow Networks
Anas Krichel
Nikolay Malkin
Salem Lahlou
Generative Flow Networks (GFlowNets) have emerged as an innovative learning paradigm designed to address the challenge of sampling from an unnormalized probability distribution, called the reward function. This framework learns a policy on a constructed graph, which enables sampling from an approximation of the target probability distribution through successive steps of sampling from the learned policy. To achieve this, GFlowNets can be trained with various objectives, each of which can lead to the model's ultimate goal. The aspirational strength of GFlowNets lies in their potential to discern intricate patterns within the reward function and their capacity to generalize effectively to novel, unseen parts of the reward function. This paper attempts to formalize generalization in the context of GFlowNets, to link generalization with stability, and also to design experiments that assess the capacity of these models to uncover unseen parts of the reward function. The experiments will focus on length generalization, meaning generalization to states that can be constructed only by longer trajectories than those seen in training.
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold
Lazar Atanackovic
Xi Zhang
Brandon Amos
Leo J Lee
Alexander Tong
Kirill Neklyudov
Numerous biological and physical processes can be modeled as systems of interacting samples evolving continuously over time, e.g. the dynamics of communicating cells or physical particles. Flow-based models allow for learning these dynamics at the population level: they model the evolution of the entire distribution of samples. However, current flow-based models are limited to a single initial population and a set of predefined conditions which describe different dynamics. We propose
RGFN: Synthesizable Molecular Generation Using GFlowNets
Michał Koziarski
Andrei Rekesh
Dmytro Shevchuk
Almer M. van der Sloot
Piotr Gaiński
Cheng-Hao Liu
Mike Tyers
Robert A. Batey
Cell Morphology-Guided Small Molecule Generation with GFlowNets
Stephen Zhewen Lu
Ziqing Lu
Ehsan Hajiramezanali
Tommaso Biancalani
Gabriele Scalia
Michał Koziarski
AI-Assisted Generation of Difficult Math Questions
Vedant Shah
Dingli Yu
Kaifeng Lyu
Simon Park
Nan Rosemary Ke
Michael Curtis Mozer
James Lloyd McClelland
Sanjeev Arora
Anirudh Goyal
Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet demand for diverse and challenging math questions. Relying solely on human experts is both time-consuming and costly, while LLM-generated questions often lack the requisite diversity and difficulty. We present a design framework that combines the strengths of LLMs with a human-in-the-loop approach to generate a diverse array of challenging math questions. We leverage LLM metacognition skills [Didolkar et al., 2024] of a strong LLM to extract core "skills" from existing math datasets. These skills serve as the basis for generating novel and difficult questions by prompting the LLM with random pairs of core skills. The use of two different skills within each question makes finding such questions an "out of distribution" task for both LLMs and humans. Our pipeline employs LLMs to iteratively generate and refine questions and solutions through multiturn prompting. Human annotators then verify and further refine the questions, with their efficiency enhanced via further LLM interactions. Applying this pipeline on skills extracted from the MATH dataset [Hendrycks et al., 2021] resulted in MATH
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving
Aniket Rajiv Didolkar
Anirudh Goyal
Nan Rosemary Ke
Siyuan Guo
Michal Valko
Timothy P Lillicrap
Danilo Jimenez Rezende
Michael Curtis Mozer
Sanjeev Arora
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation
Lu Li
Tianyu Zhang
Zhiqi Bu
Suyuchen Wang
Huan He
Jie Fu
Yonghui Wu
Jiang Bian
Yong Chen
Model merging has emerged as an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the objectives of different tasks can lead to trade-offs during model merging. In real-world applications, a set of solutions with various trade-offs can be more informative, helping practitioners make decisions based on diverse preferences. In this paper, we introduce a novel low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP identifies a Pareto set of scaling coefficients for merging multiple models to reflect the trade-offs. The core component of MAP is approximating the evaluation metrics of the various tasks using a quadratic approximation surrogate model derived from a pre-selected set of scaling coefficients, enabling amortized inference. Experimental results on vision and natural language processing tasks show that MAP can accurately identify the Pareto front. To further reduce the required computation of MAP, we propose (1) a Bayesian adaptive sampling algorithm and (2) a nested merging scheme with multiple stages.
VCR: Visual Caption Restoration
Tianyu Zhang
Suyuchen Wang
Lu Li
Ge Zhang
Perouz Taslakian
Sai Rajeswar
Jie Fu
We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images. While numerous works have integrated text embedded in images into visual question-answering tasks, approaches to these tasks generally rely on optical character recognition or masked language modeling, thus reducing the task to mainly text-based processing. However, text-based processing becomes ineffective in VCR as accurate text restoration depends on the combined information from provided images, context, and subtle cues from the tiny exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control the task difficulty. With this pipeline, we construct a dataset for VCR called VCR-Wiki using images with captions from Wikipedia, comprising 2.11M English and 346K Chinese entities in both easy and hard split variants. Our results reveal that current vision language models significantly lag behind human performance in the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-Wiki and the data construction code to facilitate future research.