Portrait of Yoshua Bengio

Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Scientific Director, Leadership Team
Observer, Board of Directors, Mila
Research Topics
Medical Machine Learning
Representation Learning
Reinforcement Learning
Deep Learning
Causality
Generative Models
Probabilistic Models
Molecular Modeling
Computational Neuroscience
Reasoning
Graph Neural Networks
Recurrent Neural Networks
Machine Learning Theory
Natural Language Processing

Biography

*For media inquiries, please write to medias@mila.quebec.

For more information, please contact Julie Mongeau, Executive Assistant, at julie.mongeau@mila.quebec.

Recognized worldwide as a leading authority in artificial intelligence, Yoshua Bengio is best known as a pioneer of deep learning, work for which he received the 2018 A.M. Turing Award, often called the "Nobel Prize of computing," together with Geoffrey Hinton and Yann LeCun. He is a Full Professor at Université de Montréal, founder and Scientific Director of Mila – Quebec Artificial Intelligence Institute, and, as a Senior Fellow, co-directs the Learning in Machines & Brains program of the Canadian Institute for Advanced Research (CIFAR). He also serves as Scientific Director of IVADO.

In 2018, he was the computer scientist who gathered the largest number of new citations worldwide. In 2019, he was awarded the prestigious Killam Prize. Since 2022, he has held the highest h-index in computer science worldwide. He is a Fellow of the Royal Society of London and of the Royal Society of Canada, and an Officer of the Order of Canada.

Concerned about the social impact of AI and committed to the goal of having AI benefit everyone, he has actively contributed to the Montreal Declaration for the Responsible Development of Artificial Intelligence.

Current Students

Research Intern - McGill
Research Intern - UdeM
PhD - UdeM
Collaborating Alumni
Research Intern - Université du Québec à Rimouski
Independent Visiting Researcher
Co-supervisor:
PhD - UdeM
Research Intern - UQAR
Independent Visiting Researcher - MIT
PhD - UdeM
Postdoctorate - UdeM
Co-supervisor:
Collaborating Alumni - UdeM
Research Collaborator - Université Paris-Saclay
Principal supervisor:
PhD - UdeM
Co-supervisor:
PhD - UdeM
PhD - Massachusetts Institute of Technology
PhD - UdeM
PhD - UdeM
Co-supervisor:
Research Intern - Barcelona University
Research Intern - UdeM
Research Collaborator - UdeM
Research Intern
Postdoctorate - UdeM
Co-supervisor:
Independent Visiting Researcher - Technical University of Munich (TUM)
PhD - UdeM
Research Intern - UdeM
Master's Research - UdeM
Co-supervisor:
Research Intern - UdeM
Research Collaborator - UdeM
PhD - UdeM
Postdoctorate - UdeM
PhD - UdeM
Collaborating Alumni
Collaborating Alumni - UdeM
Collaborating Alumni
PhD - UdeM
Principal supervisor:
Research Intern - Imperial College London
PhD - UdeM
Research Intern - UdeM
Collaborating Alumni - UdeM
PhD - UdeM
Co-supervisor:
Postdoctorate - UdeM
Collaborating Alumni
Research Collaborator - UdeM
PhD - UdeM
Principal supervisor:
PhD - UdeM
Principal supervisor:
Postdoctorate - UdeM
Principal supervisor:
Independent Visiting Researcher - UdeM
Independent Visiting Researcher - Hong Kong University of Science and Technology (HKUST)
Research Collaborator - Ying Wu College of Computing
PhD - University of Waterloo
Principal supervisor:
PhD - Max Planck Institute for Intelligent Systems
PhD - UdeM
Co-supervisor:
Postdoctorate - UdeM
Independent Visiting Researcher - UdeM
Independent Visiting Researcher - UdeM
PhD - UdeM
Principal supervisor:
Research Intern - UdeM
Research Collaborator
Principal supervisor:
Master's Research - UdeM
Research Intern - UdeM
Research Intern - UdeM
Master's Research - UdeM
Collaborating Alumni
Independent Visiting Researcher - Technical University of Munich
PhD - École Polytechnique Fédérale de Lausanne
Postdoctorate - Polytechnique
Co-supervisor:
PhD - UdeM
Co-supervisor:
Research Collaborator
Principal supervisor:
Research Collaborator - Valence
Principal supervisor:
Postdoctorate - UdeM
Co-supervisor:
Research Collaborator - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)
Principal supervisor:
PhD - UdeM
Collaborating Alumni - UdeM
Research Collaborator - KAIST
Research Intern - UdeM
PhD - UdeM
Principal supervisor:
PhD - UdeM
Principal supervisor:
PhD - McGill
Principal supervisor:
PhD - McGill
Principal supervisor:

Publications

Adaptive teachers for amortized samplers
Minsu Kim
Sanghyeok Choi
Taeyoung Yun
Emmanuel Bengio
Leo Feng
Jarrid Rector-Brooks
Sungsoo Ahn
Jinkyoo Park
Nikolay Malkin
Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnormalized density where exact sampling is intractable. When sampling is implemented as a sequential decision-making process, reinforcement learning (RL) methods, such as generative flow networks, can be used to train the sampling policy. Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration. We propose to use an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student) by prioritizing high-loss regions. The Teacher, an auxiliary behavior model, is trained to sample high-error regions of the Student and can generalize across unexplored modes, thereby enhancing mode coverage by providing an efficient training curriculum. We validate the effectiveness of this approach in a synthetic environment designed to present an exploration challenge, two diffusion-based sampling tasks, and four biochemical discovery tasks, demonstrating its ability to improve sample efficiency and mode coverage.
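
Read as pseudocode, the Teacher/Student idea amounts to an auxiliary behavior distribution trained on the Student's own errors. The toy NumPy sketch below illustrates the prioritization on a discrete space; every name in it (`student_logits`, `teacher`, `reward`) is invented for illustration, and a real GFlowNet sampler would replace the squared-error loss with a trajectory-level objective.

```python
# Illustrative Teacher/Student loop on a toy discrete space (not the paper's code).
# The Student fits a target distribution p*(x) proportional to reward(x); the Teacher
# proposes training states in proportion to the Student's current per-state error.
import numpy as np

rng = np.random.default_rng(0)
n_states = 32
reward = rng.gamma(shape=1.0, scale=1.0, size=n_states)   # unnormalized density
target = reward / reward.sum()                             # p*(x)

student_logits = np.zeros(n_states)                        # Student's distribution (softmax)
lr = 5.0

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for step in range(20_000):
    student = softmax(student_logits)
    per_state_error = (student - target) ** 2              # Student's "loss landscape"

    # Teacher: a behavior distribution that prioritizes high-error states.
    teacher = per_state_error / per_state_error.sum()
    x = rng.choice(n_states, p=teacher)                    # off-policy training state

    # Student update: gradient of 0.5*(student[x] - target[x])^2 w.r.t. the logits.
    grad_out = student[x] - target[x]
    grad_logits = grad_out * student[x] * ((np.arange(n_states) == x) - student)
    student_logits -= lr * grad_logits

print("max |student - target| =", np.abs(softmax(student_logits) - target).max())
```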
Geometric Signatures of Compositionality Across a Language Model's Lifetime
Jin Hwa Lee
Thomas Jiralerspong
Lei Yu
Emily Cheng
Compositionality, the notion that the meaning of an expression is constructed from the meaning of its parts and syntactic rules, permits the infinite productivity of human language. For the first time, artificial language models (LMs) are able to match human performance in a number of compositional generalization tasks. However, much remains to be understood about the representational mechanisms underlying these abilities. We take a high-level geometric approach to this problem by relating the degree of compositionality in a dataset to the intrinsic dimensionality of its representations under an LM, a measure of feature complexity. We find not only that the degree of dataset compositionality is reflected in representations' intrinsic dimensionality, but that the relationship between compositionality and geometric complexity arises due to learned linguistic features over training. Finally, our analyses reveal a striking contrast between linear and nonlinear dimensionality, showing that they respectively encode formal and semantic aspects of linguistic composition.
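
The paper's central quantities, linear and nonlinear intrinsic dimensionality of representations, can be estimated in a few lines. The sketch below contrasts a linear estimate (number of principal components covering most of the variance) with the nonlinear TwoNN estimator of Facco et al. (2017); it runs on synthetic data and is meant only to illustrate the two measures, not to reproduce the paper's analysis.

```python
# Linear vs. nonlinear intrinsic-dimensionality estimates (illustrative only).
import numpy as np

def linear_id(X, var_threshold=0.95):
    """Number of principal components needed to explain `var_threshold` of the variance."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s**2 / np.sum(s**2)
    return int(np.searchsorted(np.cumsum(var), var_threshold) + 1)

def twonn_id(X):
    """TwoNN estimator: MLE from the ratio of 2nd- to 1st-nearest-neighbor distances."""
    sq = np.sum(X**2, axis=1)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0, None)
    np.fill_diagonal(d2, np.inf)
    nearest = np.sqrt(np.sort(d2, axis=1)[:, :2])          # r1, r2 for every point
    r1, r2 = nearest[:, 0], nearest[:, 1]
    mu = r2[r1 > 0] / r1[r1 > 0]
    mu = mu[mu > 1.0]                                       # guard against duplicates
    return len(mu) / np.sum(np.log(mu))

# Toy "representations": points on a 5-dimensional manifold embedded in 64 dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
embed = rng.normal(size=(5, 64))
X = np.tanh(latent @ embed)                                 # element-wise nonlinearity curves the subspace

print("linear ID (PCA 95%):", linear_id(X))                 # typically well above 5 for this curved embedding
print("nonlinear ID (TwoNN):", twonn_id(X))                 # typically close to the manifold dimension, 5
```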
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee
Haebin Seong
Dong Bok Lee
Minki Kang
Xiaoyin Chen
Dominik Wagner
Juho Lee
Sung Ju Hwang
Safety guard models that detect malicious queries aimed at large language models (LLMs) are essential for ensuring the secure and responsible deployment of LLMs in real-world applications. However, deploying existing safety guard models with billions of parameters alongside LLMs on mobile devices is impractical due to substantial memory requirements and latency. To reduce this cost, we distill a large teacher safety guard model into a smaller one using a labeled dataset of instruction-response pairs with binary harmfulness labels. Due to the limited diversity of harmful instructions in the existing labeled dataset, naively distilled models tend to underperform compared to larger models. To bridge the gap between small and large models, we propose HarmAug, a simple yet effective data augmentation method that involves jailbreaking an LLM and prompting it to generate harmful instructions. Given a prompt such as "Make a single harmful instruction prompt that would elicit offensive content," we add an affirmative prefix (e.g., "I have an idea for a prompt:") to the LLM's response. This encourages the LLM to continue generating the rest of the response, leading to sampling harmful instructions. Another LLM generates a response to the harmful instruction, and the teacher model labels the instruction-response pair. We empirically show that our HarmAug outperforms other relevant baselines. Moreover, a 435-million-parameter safety guard model trained with HarmAug achieves an F1 score comparable to larger models with over 7 billion parameters, and even outperforms them in AUPRC, while operating at less than 25% of their computational cost.
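
The distillation step described here, compressing a large safety guard into a small binary classifier over instruction-response pairs, can be sketched in a few lines of PyTorch. Everything below (the stand-in encoders, the mixing weight `alpha`, the random toy data) is assumed for illustration; HarmAug's actual contribution, the augmentation of the training set with additional harmful instructions, is omitted and only a generic teacher-labelled training loop is shown.

```python
# Minimal sketch of distilling a large safety-guard classifier into a small one
# (illustrative; not the HarmAug training code).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a text encoder: fixed 128-d "embeddings" of instruction-response pairs.
def embed(batch_size=32, dim=128):
    return torch.randn(batch_size, dim)

teacher = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 1))  # large guard (frozen)
student = nn.Linear(128, 1)                                                 # small guard to deploy
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
alpha = 0.5  # weight between hard labels and teacher soft labels (assumed, not from the paper)

for step in range(200):
    x = embed()
    hard_labels = torch.randint(0, 2, (x.size(0), 1)).float()   # binary harmfulness labels
    with torch.no_grad():
        soft_labels = torch.sigmoid(teacher(x))                  # teacher's probability of harm

    logits = student(x)
    loss = alpha * bce(logits, hard_labels) + (1 - alpha) * bce(logits, soft_labels)

    opt.zero_grad()
    loss.backward()
    opt.step()
```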
Were RNNs All We Needed?
Leo Feng
Frederick Tung
Mohamed Osama Ahmed
Hossein Hajimirsadegh
A Data-driven Discovery of the Causal Connection between Galaxy and Black Hole Evolution
Zehao Jin
Mario Pasquato
Benjamin L. Davis
Tristan Deleu
Yu Luo
Changhyun Cho
Pablo Lemos
Xi Kang
A. Macciò
A neuronal least-action principle for real-time learning in cortical circuits
Walter Senn
Dominik Dold
Akos F. Kungl
Benjamin Ellenberger
Jakob Jordan
João Sacramento
Mihai A. Petrovici
One of the most fundamental laws of physics is the principle of least action. Motivated by its predictive power, we introduce a neuronal least-action principle for cortical processing of sensory streams to produce appropriate behavioural outputs in real time. The principle postulates that the voltage dynamics of cortical pyramidal neurons prospectively minimize the local somato-dendritic mismatch error within individual neurons. For motor output neurons, it implies minimizing an instantaneous behavioural error. For deep network neurons, it implies a prospective firing to overcome integration delays and correct for possible output errors right in time. The neuron-specific errors are extracted in the apical dendrites of pyramidal neurons through a cortical microcircuit that tries to explain away the feedback from the periphery, and correct the trajectory on the fly. Any motor output is in a moving equilibrium with the sensory inputs and the motor feedback during the whole sensory-motor trajectory. Ongoing synaptic plasticity reduces the somato-dendritic mismatch error within each cortical neuron and performs gradient descent on the output cost at any moment in time. The neuronal least-action principle offers an axiomatic framework to derive local neuronal and synaptic dynamics for global real-time computation and learning in the brain and in physical substrates in general.
AI content detection in the emerging information ecosystem: new obligations for media and tech companies
Alistair Knott
Dino Pedreschi
Toshiya Jitsuzumi
Susan Leavy
D. Eyers
Tapabrata Chakraborti
Andrew Trotman
Sundar Sundareswaran
Ricardo Baeza-Yates
Przemyslaw Biecek
Adrian Weller
Paul D. Teal
Subhadip Basu
Mehmet Haklidir
Virginia Morini
Stuart Russell
A high-throughput phenotypic screen combined with an ultra-large-scale deep learning-based virtual screening reveals novel scaffolds of antibacterial compounds
Gabriele Scalia
Steven T. Rutherford
Ziqing Lu
Kerry R. Buchholz
Nicholas Skelton
Kangway Chuang
Nathaniel Diamant
Jan-Christian Hütter
Jerome-Maxim Luescher
Anh Miu
Jeff Blaney
Leo Gendelev
Elizabeth Skippington
Greg Zynda
Nia Dickson
Michał Koziarski
Aviv Regev
Man-Wah Tan
Tommaso Biancalani
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Usman Anwar
Abulhair Saparov
Javier Rando
Daniel Paleka
Miles Turpin
Peter Hase
Ekdeep Singh Lubana
Erik Jenner
Stephen Casper
Oliver Sourbut
Benjamin L. Edelman
Zhaowei Zhang
Mario Günther
Anton Korinek
Jose Hernandez-Orallo
Lewis Hammond
Eric J Bigelow
Alexander Pan
Lauro Langosco
Tomasz Korbak … (see 22 more)
Heidi Chenyu Zhang
Ruiqi Zhong
Sean O hEigeartaigh
Gabriel Recchia
Giulio Corsi
Alan Chan
Markus Anderljung
Lilian Edwards
Aleksandar Petrov
Christian Schroeder de Witt
Danqi Chen
Sumeet Ramesh Motwani
Jakob Nicolaus Foerster
Philip Torr
Florian Tramèr
Samuel Albanie
Yejin Choi
He He
Atoosa Kasirzadeh
Zero-Shot Object-Centric Representation Learning
Aniket Rajiv Didolkar
Andrii Zadaianchuk
Anirudh Goyal
Michael Curtis Mozer
Georg Martius
Maximilian Seitzer
The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities. Recent successes have shown that object-centric representation learning can be scaled to real-world scenes by utilizing pre-trained self-supervised features. However, so far, object-centric methods have mostly been applied in-distribution, with models trained and evaluated on the same dataset. This is in contrast to the wider trend in machine learning towards general-purpose models directly applicable to unseen data and tasks. Thus, in this work, we study current object-centric methods through the lens of zero-shot generalization by introducing a benchmark comprising eight different synthetic and real-world datasets. We analyze the factors influencing zero-shot performance and find that training on diverse real-world images improves transferability to unseen scenarios. Furthermore, inspired by the success of task-specific fine-tuning in foundation models, we introduce a novel fine-tuning strategy to adapt pre-trained vision encoders for the task of object discovery. We find that the proposed approach results in state-of-the-art performance for unsupervised object discovery, exhibiting strong zero-shot transfer to unseen datasets.
Can a Bayesian Oracle Prevent Harm from an Agent?
Michael K. Cohen
Nikolay Malkin
Matt MacDermott
Damiano Fornasiere
Pietro Greiner
Younesse Kaddar
Is there a way to design powerful AI systems based on machine learning methods that would satisfy probabilistic safety guarantees? With the long-term goal of obtaining a probabilistic guarantee that would apply in every context, we consider estimating a context-dependent bound on the probability of violating a given safety specification. Such a risk evaluation would need to be performed at run-time to provide a guardrail against dangerous actions of an AI. Noting that different plausible hypotheses about the world could produce very different outcomes, and because we do not know which one is right, we derive bounds on the safety violation probability predicted under the true but unknown hypothesis. Such bounds could be used to reject potentially dangerous actions. Our main results involve searching for cautious but plausible hypotheses, obtained by a maximization that involves Bayesian posteriors over hypotheses. We consider two forms of this result, in the iid case and in the non-iid case, and conclude with open problems towards turning such theoretical results into practical AI guardrails.
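
One schematic way to picture the run-time guardrail described here, not the paper's exact statement, is a rejection rule that takes the most pessimistic view among every hypothesis the Bayesian posterior still considers plausible; the plausibility threshold alpha and risk tolerance epsilon below are assumed notation, not the paper's.

```latex
% Illustrative guardrail, not the theorem from the paper.
% Given data D, context x and candidate action a, with posterior p(h \mid D) over hypotheses:
\[
  \widehat{R}(a, x) \;=\; \max_{h \,:\, p(h \mid D) \,\ge\, \alpha \max_{h'} p(h' \mid D)}
      \Pr\bigl(\text{violation} \mid a, x, h\bigr),
  \qquad \text{reject } a \text{ whenever } \widehat{R}(a, x) > \epsilon .
\]
```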
Open Problems in Technical AI Governance
Anka Reuel
Benjamin Bucknall
Stephen Casper
Tim Fist
Lisa Soder
Onni Aarne
Lewis Hammond
Lujain Ibrahim
Alan Chan
Peter Wills
Markus Anderljung
Ben Garfinkel
Lennart Heim
Andrew Trask
Gabriel Mukobi
Rylan Schaeffer
Mauricio Baker
Sara Hooker
Irene Solaiman
Alexandra Luccioni … (see 11 more)
Nitarshan Rajkumar
Nicolas Moes
Jeffrey Ladish
Neel Guha
Jessica Newman
Tobin South
Alex Pentland
Sanmi Koyejo
Mykel Kochenderfer
Robert F. Trager
AI progress is creating a growing range of risks and opportunities, but it is often unclear how they should be navigated. In many cases, the barriers and uncertainties faced are at least partly technical. Technical AI governance, referring to technical analysis and tools for supporting the effective governance of AI, seeks to address such challenges. It can help to (a) identify areas where intervention is needed, (b) identify and assess the efficacy of potential governance actions, and (c) enhance governance options by designing mechanisms for enforcement, incentivization, or compliance. In this paper, we explain what technical AI governance is, why it is important, and present a taxonomy and incomplete catalog of its open problems. This paper is intended as a resource for technical researchers or research funders looking to contribute to AI governance.