Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Founder and Scientific Advisor, Leadership Team
Research Topics
Medical Machine Learning
Representation Learning
Reinforcement Learning
Deep Learning
Causality
Generative Models
Probabilistic Models
Molecular Modeling
Computational Neuroscience
Reasoning
Graph Neural Networks
Recurrent Neural Networks
Machine Learning Theory
Natural Language Processing

Biography

*For media requests, please write to medias@mila.quebec.

For more information, please contact Marie-Josée Beauchamp, Administrative Assistant, at marie-josee.beauchamp@mila.quebec.

Recognized worldwide as a leading expert in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, the "Nobel Prize of computing," with Geoffrey Hinton and Yann LeCun. He is a full professor at Université de Montréal, the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute, and, as a senior fellow, co-director of the Learning in Machines & Brains program of the Canadian Institute for Advanced Research (CIFAR). He also serves as special advisor and founding scientific director of IVADO.

In 2018, he was the computer scientist who collected the largest number of new citations worldwide. In 2019, he was awarded the prestigious Killam Prize. Since 2022, he has had the highest h-index in computer science worldwide. He is a Fellow of the Royal Society of London and of the Royal Society of Canada, and an Officer of the Order of Canada.

Concerned about the social impact of AI and committed to the goal that AI benefit everyone, he actively contributed to the Montreal Declaration for the Responsible Development of Artificial Intelligence.

Current Students

Collaborating alumni - McGill
Collaborating alumni - UdeM
Collaborating researcher - Cambridge University
Principal supervisor:
PhD - UdeM
Independent visiting researcher
Co-supervisor:
PhD - UdeM
Independent visiting researcher
Principal supervisor:
Collaborating researcher - N/A
Principal supervisor:
PhD - UdeM
Collaborating researcher - KAIST
Collaborating alumni - UdeM
Collaborating alumni - UdeM
Co-supervisor:
Independent visiting researcher
Principal supervisor:
PhD - UdeM
Co-supervisor:
PhD - UdeM
PhD - UdeM
PhD - UdeM
PhD - UdeM
Principal supervisor:
Collaborating alumni - UdeM
Postdoctorate - UdeM
Principal supervisor:
Collaborating alumni - UdeM
Postdoctorate - UdeM
Principal supervisor:
Collaborating alumni - UdeM
Principal supervisor:
Collaborating alumni
PhD - UdeM
Collaborating alumni - UdeM
PhD - UdeM
Co-supervisor:
PhD - UdeM
Principal supervisor:
PhD - UdeM
Principal supervisor:
Postdoctorate - UdeM
Principal supervisor:
Independent visiting researcher - UdeM
PhD - UdeM
Principal supervisor:
Collaborating researcher - Ying Wu Coll of Computing
Collaborating researcher - University of Waterloo
Principal supervisor:
Collaborating alumni - Max-Planck-Institute for Intelligent Systems
Collaborating researcher - UdeM
Co-supervisor:
PhD - UdeM
Postdoctorate - UdeM
Independent visiting researcher - UdeM
Postdoctorate - UdeM
PhD - UdeM
Principal supervisor:
Independent visiting researcher
Principal supervisor:
Collaborating alumni - UdeM
Collaborating alumni - UdeM
Postdoctorate
Co-supervisor:
Independent visiting researcher - Technical University of Munich
PhD - UdeM
Co-supervisor:
Independent visiting researcher
Principal supervisor:
Collaborating alumni - UdeM
Postdoctorate - UdeM
Co-supervisor:
PhD - UdeM
Principal supervisor:
Collaborating researcher
Collaborating researcher - UdeM
PhD - UdeM
Principal supervisor:
PhD - McGill
Principal supervisor:
Collaborating alumni - McGill
Principal supervisor:

Publications

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning
Despite recent successes of reinforcement learning (RL), it remains a challenge for agents to transfer learned skills to related environments. To facilitate research addressing this problem, we propose CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer. Tasks consist of constructing 3D shapes from a set of blocks - inspired by how children learn to build complex structures. The key strength of CausalWorld is that it provides a combinatorial family of such tasks with common causal structure and underlying factors (including, e.g., robot and object masses, colors, sizes). The user (or the agent) may intervene on all causal variables, which allows for fine-grained control over how similar different tasks (or task distributions) are. One can thus easily define training and evaluation distributions of a desired difficulty level, targeting a specific form of generalization (e.g., only changes in appearance or object mass). Further, this common parametrization facilitates defining curricula by interpolating between an initial and a target task. While users may define their own task distributions, we present eight meaningful distributions as concrete benchmarks, ranging from simple to very challenging, all of which require long-horizon planning as well as precise low-level motor control. Finally, we provide baseline results for a subset of these tasks on distinct training curricula and corresponding evaluation protocols, verifying the feasibility of the tasks in this benchmark.
Predicting Infectiousness for Proactive Contact Tracing
Prateek Gupta
Nasim Rahaman
Hannah Alsdurf
Gaetan Caron
Satya Ortiz Gagne
Bernhard Schölkopf
Abhinav Sharma
Andrew Robert Williams
The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing spread of the virus. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restrictions, and public health. The most common approach, binary contact tracing (BCT), models infection as a binary event, informed only by an individual's test results, with corresponding binary recommendations that either all or none of the individual's contacts quarantine. BCT ignores the inherent uncertainty in contacts and the infection process, which could be used to tailor messaging to high-risk individuals, and prompt proactive testing or earlier warnings. It also does not make use of observations such as symptoms or pre-existing medical conditions, which could be used to make more accurate infectiousness predictions. In this paper, we use a recently-proposed COVID-19 epidemiological simulator to develop and test methods that can be deployed to a smartphone to locally and proactively predict an individual's infectiousness (risk of infecting others) based on their contact history and other information, while respecting strong privacy constraints. Predictions are used to provide personalized recommendations to the individual via an app, as well as to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness, an approach we call proactive contact tracing (PCT). We find a deep-learning based PCT method which improves over BCT for equivalent average mobility, suggesting PCT could help in safe re-opening and second-wave prevention.
RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs
Louis-Pascal Xhonneux
This paper studies learning logic rules for reasoning on knowledge graphs. Logic rules provide interpretable explanations when used for prediction as well as being able to generalize to other tasks, and hence are critical to learn. Existing methods either suffer from the problem of searching in a large search space (e.g., neural logic programming) or ineffective optimization due to sparse rewards (e.g., techniques based on reinforcement learning). To address these limitations, this paper proposes a probabilistic model called RNNLogic. RNNLogic treats logic rules as a latent variable, and simultaneously trains a rule generator as well as a reasoning predictor with logic rules. We develop an EM-based algorithm for optimization. In each iteration, the reasoning predictor is updated to explore some generated logic rules for reasoning. Then in the E-step, we select a set of high-quality rules from all generated rules with both the rule generator and reasoning predictor via posterior inference; and in the M-step, the rule generator is updated with the rules selected in the E-step. Experiments on four datasets prove the effectiveness of RNNLogic.
Spatially Structured Recurrent Modules
Nasim Rahaman
Muhammad Waleed Gondal
Manuel Wüthrich
Yash Sharma
Bernhard Schölkopf
Capturing the structure of a data-generating process by means of appropriate inductive biases can help in learning models that generalise well and are robust to changes in the input distribution. While methods that harness spatial and temporal structures find broad application, recent work has demonstrated the potential of models that leverage sparse and modular structure using an ensemble of sparingly interacting modules. In this work, we take a step towards dynamic models that are capable of simultaneously exploiting both modular and spatiotemporal structures. To this end, we model the dynamical system as a collection of autonomous but sparsely interacting sub-systems that interact according to a learned topology which is informed by the spatial structure of the underlying system. This gives rise to a class of models that are well suited for capturing the dynamics of systems that only offer local views into their state, along with corresponding spatial locations of those views. On the tasks of video prediction from cropped frames and multi-agent world modelling from partial observations in the challenging Starcraft2 domain, we find our models to be more robust to the number of available views and better capable of generalisation to novel tasks without additional training than strong baselines that perform equally well or better on the training distribution.
Attention Based Pruning for Shift Networks
In many application domains such as computer vision, Convolutional Layers (CLs) are key to the accuracy of deep learning methods. However, it is often required to assemble a large number of CLs, each containing thousands of parameters, in order to reach state-of-the-art accuracy, thus resulting in complex and demanding systems that are poorly fitted to resource-limited devices. Recently, methods have been proposed to replace the generic convolution operator by the combination of a shift operation and a simpler
An Analysis of the Adaptation Speed of Causal Models
CausalR: Causal Reasoning over Natural Language Rulebases
Jason Weston
Sumit Chopra
Thomas Wolf
Lysandre Debut
Victor Sanh
Julien Chaumond
Clement Delangue
Anthony Moi
Pierric Cistac
Tim Rault
Rémi Louf
Morgan Funtowicz
Joe Davison
Sam Shleifer
Patrick von Platen
Clara Ma
Yacine Jernite
Julien Plu
Canwen Xu
Zhilin Yang
Peng Qi
William W Cohen
Russ Salakhutdinov
Transformers have been shown to be able to perform deductive reasoning on a logical rulebase containing rules and statements written in natural language. Recent works show that such models can also produce the reasoning steps (i.e., the proof graph) that emulate the model's logical reasoning process. But these models behave as a black-box unit that emulates the reasoning process without any causal constraints in the reasoning steps, thus questioning the faithfulness. In this work, we frame the deductive logical reasoning task as a causal process by defining three modular components: rule selection, fact selection, and knowledge composition. The rule and fact selection steps select the candidate rule and facts to be used and then the knowledge composition combines them to generate new inferences. This ensures model faithfulness by assured causal relation from the proof step to the inference reasoning. To test our causal reasoning framework, we propose CausalR where the above three components are independently modeled by transformers. We observe that CausalR is robust to novel language perturbations, and is competitive with previous works on existing reasoning datasets. Furthermore, the errors made by CausalR are more interpretable due to the multi-modular approach compared to black-box generative models.
BabyAI 1.1
David Y. T. Hui
Maxime Chevalier-Boisvert
The BabyAI platform is designed to measure the sample efficiency of training an agent to follow grounded-language instructions. BabyAI 1.0 presents baseline results of an agent trained by deep imitation or reinforcement learning. BabyAI 1.1 improves the agent's architecture in three minor ways. This increases reinforcement learning sample efficiency by up to 3× and improves imitation learning performance on the hardest level from 77% to 90.4%. We hope that these improvements increase the computational efficiency of BabyAI experiments and help users design better agents.
CAMAP: Artificial neural networks unveil the role of codon arrangement in modulating MHC-I peptides presentation
Tariq Daouda
Maude Dumont-Lagacé
Albert Feghaly
Yahya Benslimane
Rébecca Panes
Mathieu Courcelles
Mohamed Benhammadi
Lea Harrington
Pierre Thibault
François Major
Étienne Gagnon
Claude Perreault
MHC-I associated peptides (MAPs) play a central role in the elimination of virus-infected and neoplastic cells by CD8 T cells. However, accurately predicting the MAP repertoire remains difficult, because only a fraction of the transcriptome generates MAPs. In this study, we investigated whether codon arrangement (usage and placement) regulates MAP biogenesis. We developed an artificial neural network called Codon Arrangement MAP Predictor (CAMAP), predicting MAP presentation solely from mRNA sequences flanking the MAP-coding codons (MCCs), while excluding the MCC per se. CAMAP predictions were significantly more accurate when using original codon sequences than shuffled codon sequences which reflect amino acid usage. Furthermore, predictions were independent of mRNA expression and MAP binding affinity to MHC-I molecules and applied to several cell types and species. Combining MAP ligand scores, transcript expression level and CAMAP scores was particularly useful to increase MAP prediction accuracy. Using an in vitro assay, we showed that varying the synonymous codons in the regions flanking the MCCs (without changing the amino acid sequence) resulted in significant modulation of MAP presentation at the cell surface. Taken together, our results demonstrate the role of codon arrangement in the regulation of MAP presentation and support integration of both translational and post-translational events in predictive algorithms to ameliorate modeling of the immunopeptidome. […] they modulated the levels of SIINFEKL presentation in both constructs, but enhanced translation efficiency could only be detected for OVA-RP. These data show that codon arrangement can modulate MAP presentation strength without any changes in the amino
A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning
We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state during planning. The agent uses a bottleneck mechanism over a set-based representation to force the number of entities to which the agent attends at each planning step to be small. In experiments, we investigate the bottleneck mechanism with several sets of customized environments featuring different challenges. We consistently observe that the design allows the planning agents to generalize their learned task-solving abilities in compatible unseen environments by attending to the relevant objects, leading to better out-of-distribution generalization performance.
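
The bottleneck idea in the abstract just above (attending to only a small number of entities from a set-based state) can be illustrated with a toy sketch. This is not the authors' implementation: the function name `bottleneck_attention`, the hard top-k selection rule, the scaled dot-product scoring, and the example vectors are all assumptions chosen for illustration.

```python
import math


def bottleneck_attention(query, entities, k):
    """Keep the k entities with the highest attention score for `query`.

    Hypothetical helper: scores are scaled dot products, and the softmax
    weights are computed only over the k entities that pass the bottleneck.
    Returns (kept_indices, weights).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    d = len(query)
    # Scaled dot-product score between the query and every entity vector.
    scores = [sum(q * e for q, e in zip(query, ent)) / math.sqrt(d)
              for ent in entities]
    # Hard bottleneck: only the k best-scoring entities are kept.
    kept = sorted(range(len(entities)), key=lambda i: scores[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the kept entities.
    top = max(scores[i] for i in kept)
    exps = [math.exp(scores[i] - top) for i in kept]
    total = sum(exps)
    return kept, [x / total for x in exps]


# Toy state: four entity vectors; the query is aligned with entities 0 and 2.
entities = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
query = [1.0, 0.0]
kept, weights = bottleneck_attention(query, entities, k=2)
print(kept, weights)
```

With a small `k`, any downstream planning step would only see the selected entities, which is the intuition behind restricting attention to the relevant objects; the paper's actual mechanism may differ in how the selection is made and trained.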