
Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Scientific Director, Leadership Team
Observer, Board of Directors, Mila

Biography

*For media inquiries, please write to medias@mila.quebec.

For more information, contact Julie Mongeau, executive assistant, at julie.mongeau@mila.quebec.

Recognized worldwide as a leading authority in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning. This work earned him the 2018 A.M. Turing Award, often called the "Nobel Prize of computing," which he shared with Geoffrey Hinton and Yann LeCun. He is a Full Professor at Université de Montréal and the founder and scientific director of Mila – Quebec Artificial Intelligence Institute. As a Senior Fellow, he co-directs the Learning in Machines & Brains program of the Canadian Institute for Advanced Research (CIFAR). He also serves as scientific director of IVADO.

In 2018, he was the computer scientist who received the most new citations worldwide. In 2019, he was awarded the prestigious Killam Prize. Since 2022, he has held the highest h-index of any computer scientist in the world. He is a Fellow of the Royal Society of London and of the Royal Society of Canada, and an Officer of the Order of Canada.

Concerned about the social impact of AI and committed to ensuring that AI benefits everyone, he actively contributed to the Montréal Declaration for a Responsible Development of Artificial Intelligence.

Current Students

Professional Master's - Université de Montréal
Co-supervisor:
Professional Master's - Université de Montréal
PhD - Université de Montréal
Postdoctorate - Université de Montréal
Co-supervisor:
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Research Collaborator - Université Paris-Saclay
Principal supervisor:
Professional Master's - Université de Montréal
Independent Visiting Researcher - MIT
PhD - École Polytechnique Montréal Fédérale de Lausanne
Research Intern - Université du Québec à Rimouski
Research Collaborator
Principal supervisor:
PhD - Université de Montréal
Principal supervisor:
Postdoctorate - Université de Montréal
Co-supervisor:
Professional Master's - Université de Montréal
PhD - Université de Montréal
Co-supervisor:
PhD - Barcelona University
PhD - Université de Montréal
Principal supervisor:
Postdoctorate - Université de Montréal
Co-supervisor:
Research Master's - Université de Montréal
PhD - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
Co-supervisor:
Research Intern - UQAR
Alumni Collaborator
Independent Visiting Researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor:
Research Intern - McGill University
Independent Visiting Researcher - Université de Montréal
PhD - Université de Montréal
Co-supervisor:
PhD - Université de Montréal
Co-supervisor:
Professional Master's - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
PhD - Massachusetts Institute of Technology
PhD - Université de Montréal
PhD - Université de Montréal
Independent Visiting Researcher - Technical University Munich (TUM)
Independent Visiting Researcher - Hong Kong University of Science and Technology (HKUST)
Graduate Diploma (DESS) - Université de Montréal
Independent Visiting Researcher - UQAR
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Research Intern - Université de Montréal
Independent Visiting Researcher - Technical University of Munich
Research Intern - Imperial College London
PhD - Université de Montréal
Co-supervisor:
Postdoctorate - Université de Montréal
PhD - McGill University
Principal supervisor:
Professional Master's - Université de Montréal
Research Collaborator - Université de Montréal
Research Intern - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
PhD - Max-Planck-Institute for Intelligent Systems
PhD - McGill University
Principal supervisor:
Alumni Collaborator - Université de Montréal
Professional Master's - Université de Montréal
PhD - Université de Montréal
Independent Visiting Researcher - Université de Montréal
Alumni Collaborator - Université de Montréal
Research Collaborator
Professional Master's - Université de Montréal
Research Collaborator - Valence
Principal supervisor:
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor:
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor:
Research Intern - Université de Montréal
Research Collaborator - Université de Montréal
Independent Visiting Researcher
Co-supervisor:
Postdoctorate - Université de Montréal
Research Intern - McGill University
Professional Master's - Université de Montréal
Research Collaborator
Principal supervisor:
Research Master's - Université de Montréal
Co-supervisor:
PhD - Université de Montréal
Research Master's - Université de Montréal
PhD - Université de Montréal
Research Collaborator - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)
Principal supervisor:
Bachelor's - Université de Montréal
PhD - Université de Montréal
Professional Master's - Université de Montréal
Professional Master's - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
Principal supervisor:
Professional Master's - Université de Montréal
Postdoctorate - Université de Montréal

Publications

hBERT + BiasCorp - Fighting Racism on the Web
Olawale Moses Onabola
Zhuang Ma
Xie Yang
Benjamin Akera
Ibraheem Abdulrahman
Jia Xue
Dianbo Liu
Subtle and overt racism is still present in both physical and online communities today and has affected many lives across different segments of society. In this short piece of work, we present how we are tackling this societal issue with Natural Language Processing. We are releasing BiasCorp, a dataset containing 139,090 comments and news segments from three specific sources: Fox News, Breitbart News and YouTube. The first batch (45,000 manually annotated) is ready for publication. We are currently in the final phase of manually labeling the remaining dataset using Amazon Mechanical Turk. BERT has been used widely in several downstream tasks. In this work, we present hBERT, where we modify certain layers of the pretrained BERT model with the new Hopfield Layer. hBERT generalizes well across different distributions, with the added advantage of reduced model complexity. We are also releasing a JavaScript library and a Chrome extension. The library helps developers use our trained model in web applications (e.g., chat applications), and the extension helps users identify and report racially biased content on the web.
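The abstract does not spell out what the Hopfield Layer computes. One assumption is that it refers to the continuous ("modern") Hopfield network of Ramsauer et al. (2020). In that network, one update step retrieves a stored pattern as ξ_new = Xᵀ softmax(β X ξ), where the rows of X are the stored patterns. The plain-Python sketch below shows only that retrieval step. How hBERT wires it into BERT is not described here, and `hopfield_retrieve` and `beta` are illustrative names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_retrieve(patterns, query, beta=1.0):
    """One continuous Hopfield update: xi_new = X^T softmax(beta * X xi).

    `patterns` are the stored patterns (rows of X); `query` is the state xi.
    A larger beta makes retrieval sharper (closer to picking one pattern).
    """
    scores = [beta * sum(p_i * q_i for p_i, q_i in zip(p, query)) for p in patterns]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * p[d] for w, p in zip(weights, patterns)) for d in range(dim)]


stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
noisy = [0.9, 0.2, 0.1]  # a corrupted version of the first stored pattern
out = hopfield_retrieve(stored, noisy, beta=8.0)
print([round(x, 4) for x in out])
```

With one-hot stored patterns, the output is a convex combination of them. A high `beta` pulls the noisy query almost entirely onto the first pattern. This is the same computation as a softmax attention read-out, which is why such a layer can replace attention-like components.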
Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers
Alex Lamb
Anirudh Goyal
A. Slowik
Michael Curtis Mozer
Philippe Beaudoin
Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previous layer. A downside to this approach is that each layer (or module, as multiple modules can operate in parallel) is tasked with processing the entire hidden state, rather than the particular part of the state that is most relevant for that module. Functions that operate on only a small number of input variables are an essential part of most programming languages, and they allow for improved modularity and code reusability. Our proposed method, Neural Function Modules (NFM), aims to introduce the same structural capability into deep learning. Most of the work in the context of feed-forward networks combining top-down and bottom-up feedback is limited to classification problems. The key contribution of our work is to combine attention, sparsity, and top-down and bottom-up feedback in a flexible algorithm. As we show, this algorithm improves results in standard classification, out-of-domain generalization, generative modeling, and representation learning in the context of reinforcement learning.
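The abstract describes modules that read only a few relevant inputs, drawn from any layer, by combining attention with sparsity. One generic way to obtain such "sparse arguments" is top-k attention:

1. Score every candidate input against a module's query.
2. Keep the k best-scoring inputs.
3. Renormalize the weights over those k inputs and combine their values.

The sketch below assumes that mechanism with scaled dot-product scores. It is not claimed to be NFM's actual implementation, and `sparse_attend` and its parameters are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sparse_attend(query, keys, values, k):
    """Top-k sparse attention over candidate inputs (e.g. outputs of earlier layers).

    Returns the softmax-weighted sum of the k best-matching values and the
    sorted indices of the inputs that were selected as "arguments".
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, key) / scale for key in keys]
    # Keep only the k highest-scoring candidates; the rest get zero weight.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    total = sum(exps.values())
    dim = len(values[0])
    out = [sum(exps[i] / total * values[i][d] for i in top) for d in range(dim)]
    return out, sorted(top)


keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]  # e.g. four earlier layers
values = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
out, chosen = sparse_attend([1.0, 0.0], keys, values, k=2)
print(chosen, out)
```

Only two of the four candidate inputs contribute to the module's output, which mirrors the "function with a few arguments" analogy in the abstract.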
Predicting Infectiousness for Proactive Contact Tracing
Prateek Gupta
Nasim Rahaman
Martin Weiss
Tristan Deleu
Meng Qu
Victor Schmidt
Pierre-Luc St-Charles
Hannah Alsdurf
Olexa Bilaniuk
Gaetan Caron
Pierre Luc Carrier
Joumana Ghosn
Satya Ortiz Gagne
Bernhard Schölkopf …
Abhinav Sharma
Andrew Williams
The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing the spread of the virus. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restrictions, and public health. The most common approach, binary contact tracing (BCT), models infection as a binary event, informed only by an individual's test results. Its recommendations are correspondingly binary: either all or none of the individual's contacts quarantine. BCT ignores the inherent uncertainty in contacts and in the infection process, which could be used to tailor messaging to high-risk individuals and to prompt proactive testing or earlier warnings. It also does not use observations such as symptoms or pre-existing medical conditions, which could support more accurate infectiousness predictions. In this paper, we use a recently proposed COVID-19 epidemiological simulator to develop and test methods that can be deployed to a smartphone. These methods locally and proactively predict an individual's infectiousness (risk of infecting others) from their contact history and other information, while respecting strong privacy constraints. Predictions are used to provide personalized recommendations to the individual via an app. They are also used to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness. We call this approach proactive contact tracing (PCT). We find a deep-learning-based PCT method that improves over BCT for equivalent average mobility, suggesting PCT could help in safe re-opening and second-wave prevention.
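To make the BCT-versus-graded-risk contrast concrete, the sketch below puts a binary rule next to a simple graded estimate. The binary rule triggers quarantine only on a positive test among contacts. The graded estimate combines each contact's estimated infectiousness with how strong the encounter was. It assumes independent encounters, each transmitting with probability `risk * exposure`. This is a hand-written stand-in, not the paper's deep-learning PCT model, and `Contact`, `binary_recommendation` and `graded_risk` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    # Hypothetical contact record (not the paper's message format):
    # risk     - the contact's own estimated infectiousness, in [0, 1]
    # exposure - strength of the encounter (e.g. duration/proximity), in [0, 1]
    risk: float
    exposure: float
    tested_positive: bool = False


def binary_recommendation(contacts):
    """BCT-style rule: quarantine if and only if some contact has tested positive."""
    return any(c.tested_positive for c in contacts)


def graded_risk(contacts):
    """Probability of at least one transmission, assuming encounters are
    independent and each transmits with probability risk * exposure."""
    p_no_infection = 1.0
    for c in contacts:
        p_no_infection *= 1.0 - c.risk * c.exposure
    return 1.0 - p_no_infection


contacts = [Contact(risk=0.3, exposure=0.5), Contact(risk=0.6, exposure=0.8)]
print(binary_recommendation(contacts), round(graded_risk(contacts), 3))
```

In this example nobody has tested positive, so the binary rule recommends nothing. The graded estimate still flags a substantial risk that could drive an early warning or a test recommendation.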
Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization
Kartik Ahuja
Ethan Caballero
Dinghuai Zhang
Jean-Christophe Gagnon-Audet
The invariance principle from causality is at the heart of notable approaches such as invariant risk minimization (IRM) that seek to address out-of-distribution (OOD) generalization failures. Despite the promising theory, invariance principle-based approaches fail in common classification tasks, where invariant (causal) features capture all the information about the label. Are these failures due to the methods failing to capture the invariance? Or is the invariance principle itself insufficient? To answer these questions, we revisit the fundamental assumptions in linear regression tasks, where invariance-based approaches were shown to provably generalize OOD. In contrast to the linear regression tasks, we show that for linear classification tasks we need much stronger restrictions on the distribution shifts, or otherwise OOD generalization is impossible. Furthermore, even with appropriate restrictions on distribution shifts in place, we show that the invariance principle alone is insufficient. We prove that a form of the information bottleneck constraint, combined with invariance, helps address key failures when invariant features capture all the information about the label. It also retains the existing successes when they do not. We propose an approach that incorporates both of these principles and demonstrate its effectiveness in several experiments.
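As a rough illustration of combining the two principles, the sketch below builds a training objective from three terms per environment:

1. The empirical risk.
2. An invariance penalty in the style of IRMv1: the squared gradient of the environment risk with respect to a dummy scalar classifier, evaluated at 1.0.
3. A bottleneck term.

It assumes squared loss and a scalar representation. It also assumes that the information bottleneck term is approximated by the variance of the representation in each environment, a common tractable proxy. The paper's exact formulation may differ, and `ib_irm_objective`, `lam` and `gamma` are hypothetical names.

```python
def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    mu = mean(xs)
    return mean([(x - mu) ** 2 for x in xs])


def irm_penalty(preds, labels):
    """IRMv1-style penalty for squared loss: the squared derivative of the
    environment risk R(w) = mean((w * f - y)^2) w.r.t. a dummy scalar w at w = 1.
    dR/dw at w = 1 equals mean(2 * (f - y) * f)."""
    grad = mean([2.0 * (f - y) * f for f, y in zip(preds, labels)])
    return grad ** 2


def ib_irm_objective(envs, lam=1.0, gamma=0.1):
    """envs: list of (phi, y) pairs, one per training environment, where phi holds
    scalar representation values used directly as predictions (the classifier is
    fixed to the dummy scalar 1.0). Per environment: risk + lam * invariance
    penalty + gamma * variance of phi (a stand-in for the bottleneck term)."""
    total = 0.0
    for phi, y in envs:
        risk = mean([(f - t) ** 2 for f, t in zip(phi, y)])
        total += risk + lam * irm_penalty(phi, y) + gamma * variance(phi)
    return total


envs = [([0.0, 1.0], [0.0, 1.0]), ([1.0, 1.0], [1.0, 1.0])]
print(ib_irm_objective(envs, lam=1.0, gamma=0.1))
```

In this toy case the predictions are perfect in both environments, so risk and invariance penalty vanish. Only the bottleneck term remains, and it penalizes the spread of the representation in the first environment.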
Inductive biases for deep learning of higher-level cognition
Anirudh Goyal
A fascinating hypothesis is that human and animal intelligence could be explained by a few principles (rather than an encyclopaedic list of heuristics). If that hypothesis were correct, we could more easily both understand our own intelligence and build intelligent machines. Just as in physics, the principles themselves would not be sufficient to predict the behaviour of complex systems like brains, and substantial computation might be needed to simulate human-like intelligence. This hypothesis suggests that studying the kinds of inductive biases that humans and animals exploit could both clarify these principles and inspire AI research and neuroscience theories. Deep learning already exploits several key inductive biases, and this work considers a larger list, focusing mostly on those that concern higher-level and sequential conscious processing. Clarifying these principles could help us build AI systems that share humans' abilities in flexible out-of-distribution and systematic generalization. This is currently an area where a large gap exists between state-of-the-art machine learning and human intelligence.
Revisiting Fundamentals of Experience Replay
William Fedus
Prajit Ramachandran
Rishabh Agarwal
Mark Rowland
Will Dabney
Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but significant gaps remain in our understanding of it. We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (replay ratio). Our additive and ablative studies upend conventional wisdom around experience replay: greater capacity is found to substantially increase the performance of certain algorithms, while leaving others unaffected. Counterintuitively, we show that theoretically ungrounded, uncorrected n-step returns are uniquely beneficial, while other techniques confer limited benefit for sifting through larger memory. Separately, by directly controlling the replay ratio, we contextualize previous observations in the literature and empirically measure its importance across a variety of deep RL algorithms. Finally, we conclude by testing a set of hypotheses on the nature of these performance benefits.
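The two properties studied in the abstract, and the uncorrected n-step return, can be made concrete in a few lines:

- Replay capacity is how many transitions the buffer retains.
- Replay ratio is how many gradient updates are made per environment step.
- The uncorrected n-step return sums n discounted rewards and then bootstraps, with no importance-sampling correction for off-policy data.

The sketch below assumes a FIFO buffer with uniform sampling, the usual default. The learning step itself is left as a comment. `ReplayBuffer`, `collect_and_train` and `n_step_return` are hypothetical names, not code from the paper.

```python
import random
from collections import deque


class ReplayBuffer:
    """FIFO replay memory: once `capacity` transitions are stored, adding a new
    one evicts the oldest (deque with maxlen)."""

    def __init__(self, capacity, seed=0):
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement from the current contents.
        return self.rng.sample(list(self.memory), min(batch_size, len(self.memory)))


def n_step_return(rewards, bootstrap_value, gamma):
    """Uncorrected n-step return: r_0 + g*r_1 + ... + g^(n-1)*r_(n-1) + g^n * bootstrap,
    with no importance-sampling correction even when the data is off-policy."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def collect_and_train(env_steps, capacity, replay_ratio, batch_size=4):
    """Interleave data collection and learning so that, on average,
    `replay_ratio` gradient updates happen per environment step."""
    buffer = ReplayBuffer(capacity)
    updates, credit = 0, 0.0
    for t in range(env_steps):
        buffer.add(t)  # placeholder for a (s, a, r, s') transition
        credit += replay_ratio
        while credit >= 1.0:
            batch = buffer.sample(batch_size)
            # A real agent would build n-step targets from `batch` and take a gradient step here.
            updates += 1
            credit -= 1.0
    return buffer, updates


buffer, updates = collect_and_train(env_steps=100, capacity=10, replay_ratio=0.25)
print(updates, list(buffer.memory)[0])
print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
```

Capacity and replay ratio are independent knobs in this setup. Raising `capacity` keeps older data around without changing how often the agent learns. Raising `replay_ratio` changes how many times each stored transition is likely to be revisited.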
COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact Tracing
Prateek Gupta
Martin Weiss
Nasim Rahaman
Hannah Alsdurf
Abhinav Sharma
Nanor Minoyan
Soren Harnois-Leblanc
Victor Schmidt
Pierre-Luc St-Charles
Tristan Deleu
Andrew Williams
Akshay Patel
Meng Qu
Olexa Bilaniuk
Gaetan Caron
Pierre Luc Carrier
Satya Ortiz Gagne
Marc-Andre Rousseau
Joumana Ghosn
Yang Zhang
Bernhard Schölkopf
Joanna Merckx
A Large-Scale, Open-Domain, Mixed-Interface Dialogue-Based ITS for STEM
Iulian V. Serban
Varun Gupta
Ekaterina Kochmar
Dung D. Vu
Robert Belfer
An Analysis of the Adaptation Speed of Causal Models
Rémi Le Priol
Reza Babanezhad Harikandeh
We consider the problem of discovering the causal process that generated a collection of datasets. We assume that all these datasets were generated by unknown sparse interventions on a structural causal model (SCM)
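A small example helps show why sparse interventions matter for how quickly a model adapts. Take a hypothetical two-variable discrete SCM A → B and intervene only on the cause's marginal P(A). Under the causal factorization P(A) P(B | A), just one module changes. Under the anticausal factorization P(B) P(A | B), both factors change. The probabilities below are invented for illustration. The sketch does not reproduce the paper's analysis of adaptation speed, and `joint` and `anticausal_factors` are hypothetical helper names.

```python
from itertools import product


def joint(p_a, p_b_given_a):
    """Joint P(A, B) for the causal graph A -> B over binary variables."""
    return {(a, b): p_a[a] * p_b_given_a[a][b] for a, b in product((0, 1), repeat=2)}


def anticausal_factors(p_ab):
    """Re-factorize the same joint in the anticausal direction, P(B) P(A | B)."""
    p_b = {b: p_ab[(0, b)] + p_ab[(1, b)] for b in (0, 1)}
    p_a_given_b = {b: {a: p_ab[(a, b)] / p_b[b] for a in (0, 1)} for b in (0, 1)}
    return p_b, p_a_given_b


# Hypothetical mechanism P(B | A); the intervention below leaves it untouched.
mechanism = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
before = joint({0: 0.5, 1: 0.5}, mechanism)
# Sparse intervention: only the cause's marginal P(A) is replaced.
after = joint({0: 0.1, 1: 0.9}, mechanism)

pb0, pab0 = anticausal_factors(before)
pb1, pab1 = anticausal_factors(after)
# Causal model: 1 of 2 modules changed. Anticausal model: both P(B) and P(A | B) changed.
print(round(pb0[1], 3), round(pb1[1], 3))
print(round(pab0[1][1], 3), round(pab1[1][1], 3))
```

A learner using the causal factorization only has to re-estimate P(A) after the intervention. A learner using the anticausal factorization must re-estimate both of its factors. This is one intuition for why the direction of the model could affect adaptation speed.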
COVI White Paper
Hannah Alsdurf
Tristan Deleu
Prateek Gupta
Daphne Ippolito
Richard Janda
Max Jarvie
Tyler J. Kolody
Sekoul Krastev
Robert Obryk
Dan Pilat
Valerie Pisano
Benjamin Prud'homme
Meng Qu
Nasim Rahaman
Jean-François Rousseau
Abhinav Sharma
Brooke Struck …
Martin Weiss
Yun William Yu