Yoshua Bengio

Biographie

*Pour toute demande média, veuillez écrire à medias@mila.quebec.

Pour plus d’information, contactez Marie-Josée Beauchamp, adjointe administrative à marie-josee.beauchamp@mila.quebec.

Reconnu comme une sommité mondiale en intelligence artificielle, Yoshua Bengio s’est surtout distingué par son rôle de pionnier en apprentissage profond, ce qui lui a valu le prix A. M. Turing 2018, le « prix Nobel de l’informatique », avec Geoffrey Hinton et Yann LeCun. Il est professeur titulaire à l’Université de Montréal, fondateur et conseiller scientifique de Mila – Institut québécois d’intelligence artificielle, et codirige en tant que senior fellow le programme Apprentissage automatique, apprentissage biologique de l'Institut canadien de recherches avancées (CIFAR). Il occupe également la fonction de conseiller spécial et directeur scientifique fondateur d’IVADO.

En 2018, il a été l’informaticien qui a recueilli le plus grand nombre de nouvelles citations au monde. En 2019, il s’est vu décerner le prestigieux prix Killam. Depuis 2022, il détient le plus grand facteur d’impact (h-index) en informatique à l’échelle mondiale. Il est fellow de la Royal Society de Londres et de la Société royale du Canada, et officier de l’Ordre du Canada.

Soucieux des répercussions sociales de l’IA et de l’objectif que l’IA bénéficie à tous, il a contribué activement à la Déclaration de Montréal pour un développement responsable de l’intelligence artificielle.

Étudiants actuels

Jamal Abou Haibeh

Collaborateur·rice alumni - McGill

Mohammed Abukalam

Collaborateur·rice alumni - UdeM

Berkes Anaïs

Collaborateur·rice de recherche - Cambridge University

Superviseur⋅e principal⋅e :

Rim Assouel

Doctorat - UdeM

Junyeob BAEK

Visiteur de recherche indépendant - KAIST

Visiteur de recherche indépendant

Co-superviseur⋅e :

Guillaume Lajoie

Paul Bertin

Doctorat - UdeM

Shahana Chatterjee

Collaborateur·rice de recherche - N/A

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Collaborateur·rice de recherche - KAIST

Doctorat - UdeM

Doctorat - UdeM

Stagiaire de recherche - UdeM

Co-superviseur⋅e :

Loubna Benabbou

Eric Elmoznino

Doctorat - UdeM

Co-superviseur⋅e :

Doctorat - UdeM

Doctorat - UdeM

Co-superviseur⋅e :

Leo Feng

Doctorat - UdeM

leo.feng@mila.quebec

Ivan Grega

Stagiaire de recherche - UdeM

Doctorat

Doctorat - UdeM

mohsin.hasan@mila.quebec

Edward Hu

Doctorat - UdeM

Moksh Jain

Doctorat - UdeM

moksh.jain@mila.quebec

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Collaborateur·rice alumni - UdeM

Hyeonah Kim

Postdoctorat - UdeM

Superviseur⋅e principal⋅e :

Alex Hernandez

Yaroslav KIVVA

Collaborateur·rice de recherche - UdeM

Salem Lahlou

Collaborateur·rice alumni - UdeM

Tabitha Edith Lee

Postdoctorat - UdeM

Superviseur⋅e principal⋅e :

Seanie Lee

Collaborateur·rice alumni - UdeM

Collaborateur·rice alumni

Zhen Liu

Collaborateur·rice alumni - UdeM

Superviseur⋅e principal⋅e :

Liam Paull

Kanika Madan

Doctorat - UdeM

Nikolay Malkin

Collaborateur·rice alumni - UdeM

Cristian Dragos Manta

Doctorat - UdeM

Co-superviseur⋅e :

Dhanya Sridhar

Sören Mindermann

Collaborateur·rice de recherche - UdeM

Sarthak Mittal

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Postdoctorat - UdeM

Superviseur⋅e principal⋅e :

Visiteur de recherche indépendant - UdeM

Padideh Nouri

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Ali Parviz

Collaborateur·rice de recherche - Ying Wu Coll of Computing

Camille Rochefort-Boulanger

Lena Podina

Doctorat - University of Waterloo

Superviseur⋅e principal⋅e :

Collaborateur·rice alumni - Max-Planck-Institute for Intelligent Systems

Amine RAZIG

Stagiaire de recherche - UdeM

Co-superviseur⋅e :

Doctorat - UdeM

Postdoctorat - UdeM

Visiteur de recherche indépendant - UdeM

Postdoctorat - UdeM

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Julie Hussin

Victor Schmidt

Collaborateur·rice alumni - UdeM

Postdoctorat - UdeM

Maîtrise recherche - UdeM

Marcin Sendera

Collaborateur·rice alumni - UdeM

Vedant Shah

Maîtrise recherche - UdeM

Postdoctorat

Marco Stock

Visiteur de recherche indépendant - Technical University of Munich

marco.stock@tum.de

Mélisande Astrid Crystal Teng

Doctorat - UdeM

Co-superviseur⋅e :

Hugo Larochelle

alexander.tong@mila.quebec

Alex Tong

Postdoctorat - UdeM

Postdoctorat - UdeM

Co-superviseur⋅e :

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Collaborateur·rice de recherche - UdeM

Omar G. Younis

Collaborateur·rice de recherche

Collaborateur·rice de recherche - KAIST

Doctorat - UdeM

Doctorat - McGill

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Aaron Courville

Skipper : combiner l’abstraction spatiale et temporelle afin d’améliorer la généralisation

Harry Zhao

Doctorat - McGill

Superviseur⋅e principal⋅e :

Billets de blogue

Generic thumbnail for Mila Blog articles.

22 février 2024

par

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Mise à l’échelle au service du raisonnement et de l’apprentissage automatique basé sur un modèle

Scaling in the service of reasoning & model-based ML

4 avril 2023

par

Yoshua Bengio

Edward J. Hu

Une collaboration entre Mila et Relation Therapeutics pour découvrir in vitro de nouvelles associations médicamenteuses synergiques

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

23 mars 2022

par

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

Les réseaux de flot génératifs

15 mars 2022

par

Yoshua Bengio

Publications

Gradient Starvation: A Learning Proclivity in Neural Networks

Mohammad Pezeshki

Sékou-Oumar Kaba

We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks… (voir plus). Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task, despite the presence of other predictive features that fail to be discovered. This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks. Using tools from Dynamical Systems theory, we identify simple properties of learning dynamics during gradient descent that lead to this imbalance, and prove that such a situation can be expected given certain statistical structure in training data. Based on our proposed formalism, we develop guarantees for a novel regularization method aimed at decoupling feature learning dynamics, improving accuracy and robustness in cases hindered by gradient starvation. We illustrate our findings with simple and real-world out-of-distribution (OOD) generalization experiments.

Neural Production Systems

Anirudh Goyal

Aniket Rajiv Didolkar

Nan Rosemary Ke

Charles Blundell

Philippe Beaudoin

Nicolas Heess

Michael Curtis Mozer

Visual environments are structured, consisting of distinct objects or entities. These entities have properties---visible or latent---that d… (voir plus)etermine the manner in which they interact with one another. To partition images into entities, deep-learning researchers have proposed structural inductive biases such as slot-based architectures. To model interactions among entities, equivariant graph neural nets (GNNs) are used, but these are not particularly well suited to the task for two reasons. First, GNNs do not predispose interactions to be sparse, as relationships among independent entities are likely to be. Second, GNNs do not factorize knowledge about interactions in an entity-conditional manner. As an alternative, we take inspiration from cognitive science and resurrect a classic approach, production systems, which consist of a set of rule templates that are applied by binding placeholder variables in the rules to specific entities. Rules are scored on their match to entities, and the best fitting rules are applied to update entity properties. In a series of experiments, we demonstrate that this architecture achieves a flexible, dynamic flow of control and serves to factorize entity-specific and rule-based information. This disentangling of knowledge achieves robust future-state prediction in rich visual environments, outperforming state-of-the-art methods using GNNs, and allows for the extrapolation from simple (few object) environments to more complex environments.

The Causal-Neural Connection: Expressiveness, Learnability, and Inference

Kevin Muyuan Xia

Kai-Zhan Lee

Elias Bareinboim

One of the central elements of any causal inference is an object called structural causal model (SCM), which represents a collection of mech… (voir plus)anisms and exogenous sources of random variation of the system under investigation (Pearl, 2000). An important property of many kinds of neural networks is universal approximability: the ability to approximate any function to arbitrary precision. Given this property, one may be tempted to surmise that a collection of neural nets is capable of learning any SCM by training on data generated by that SCM. In this paper, we show this is not the case by disentangling the notions of expressivity and learnability. Specifically, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020), which describes the limits of what can be learned from data, still holds for neural models. For instance, an arbitrarily complex and expressive neural net is unable to predict the effects of interventions given observational data alone. Given this result, we introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences. Building on this new class of models, we focus on solving two canonical tasks found in the literature known as causal identification and estimation. Leveraging the neural toolbox, we develop an algorithm that is both sufficient and necessary to determine whether a causal effect can be learned from data (i.e., causal identifiability); it then estimates the effect whenever identifiability holds (causal estimation). Simulations corroborate the proposed approach.

Problèmes associés au déploiement des modèles fondés sur l’apprentissage machine en santé

Joseph Paul Cohen

Tianshi Cao

Joseph D Viviano

Chin-Wei Huang

Michael Fralick

Marzyeh Ghassemi

Muhammad Mamdani

Russell Greiner

2021-11-07

Canadian Medical Association Journal (publié)

From Machine Learning to Robotics: Challenges and Opportunities for Embodied Intelligence

Nicholas Roy

Ingmar Posner

T. Barfoot

Philippe Beaudoin

Jeannette Bohg

Oliver Brock

Isabelle Depatie

Dieter Fox

D. Koditschek

Tom'as Lozano-p'erez

Vikash K. Mansinghka

Chris Pal

Blake Richards

Dorsa Sadigh

Stefan Schaal

G. Sukhatme

Denis Therien

Marc Emile Toussaint

Michiel van de Panne

2021-10-28

ArXiv (prépublication)

arxiv.org

CAMAP: Artificial neural networks unveil the role of codon arrangement in modulating MHC-I peptides presentation

Tariq Daouda

Maude Dumont-Lagacé

Albert Feghaly

Yahya Benslimane

Rébecca Panes

Mathieu Courcelles

Mohamed Benhammadi

Lea Harrington

Pierre Thibault

François Major

Étienne Gagnon

Sébastien Lemieux

Claude Perreault

MHC-I associated peptides (MAPs) play a central role in the elimination of virus-infected and neoplastic cells by CD8 T cells. However, accu… (voir plus)rately predicting the MAP repertoire remains difficult, because only a fraction of the transcriptome generates MAPs. In this study, we investigated whether codon arrangement (usage and placement) regulates MAP biogenesis. We developed an artificial neural network called Codon Arrangement MAP Predictor (CAMAP), predicting MAP presentation solely from mRNA sequences flanking the MAP-coding codons (MCCs), while excluding the MCC per se. CAMAP predictions were significantly more accurate when using original codon sequences than shuffled codon sequences which reflect amino acid usage. Furthermore, predictions were independent of mRNA expression and MAP binding affinity to MHC-I molecules and applied to several cell types and species. Combining MAP ligand scores, transcript expression level and CAMAP scores was particularly useful to increaser MAP prediction accuracy. Using an in vitro assay, we showed that varying the synonymous codons in the regions flanking the MCCs (without changing the amino acid sequence) resulted in significant modulation of MAP presentation at the cell surface. Taken together, our results demonstrate the role of codon arrangement in the regulation of MAP presentation and support integration of both translational and post-translational events in predictive algorithms to ameliorate modeling of the immunopeptidome. Author summary MHC-I associated peptides (MAPs) are small fragments of intracellular proteins presented at the surface of cells and used by the immune system to detect and eliminate cancerous or virus-infected cells. While it is theoretically possible to predict which portions of the intracellular proteins will be naturally processed by the cells to ultimately reach the surface, current methodologies have prohibitively high false discovery rates. Here we introduce an artificial neural network called Codon Arrangement MAP Predictor (CAMAP) which integrates information from mRNA-to-protein translation to other factors regulating MAP biogenesis (e.g. MAP ligand score and transcript expression levels) to improve MAP prediction accuracy. While most MAP predictive approaches focus on MAP sequences per se, CAMAP’s novelty is to analyze the MAP-flanking mRNA sequences, thereby providing completely independent information for MAP prediction. We show on several datasets that the integration of CAMAP scores with other known factors involved in MAP presentation (i.e. MAP ligand score and mRNA expression) significantly improves MAP prediction accuracy, and further validate CAMAP learned features using an in-vitro assay. These findings may have major implications for the design of vaccines against cancers and viruses, and in times of pandemics could accelerate the identification of relevant MAPs of viral origins.

2021-10-22

PLoS Computational Biology (publié)

Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning

Nan Rosemary Ke

Aniket Rajiv Didolkar

Sarthak Mittal

Anirudh Goyal

Guillaume Lajoie

Stefan Bauer

Danilo Jimenez Rezende

Michael Curtis Mozer

Chris Pal

Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise tha… (voir plus)t the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables, particularly those which are causal or are affected by causal variables. A central goal for AI and causality is thus the joint discovery of abstract representations and causal structure. However, we note that existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs which are impossible to manipulate parametrically (e.g., number of nodes, sparsity, causal chain length, etc.). In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them. In order to systematically probe the ability of methods to identify these variables and structures, we design a suite of benchmarking RL environments. We evaluate various representation learning algorithms from the literature and find that explicitly incorporating structure and modularity in models can help causal induction in model-based reinforcement learning.

2021-10-11

NeurIPS.cc/2021/Track/Datasets_and_Benchmarks/Round2 (publié)

FloW: A Dataset and Benchmark for Floating Waste Detection in Inland Waters

Yuwei Cheng

Jiannan Zhu

Mengxin Jiang

Jie Fu

Changsong Pang

Peidong Wang

Kris Sankaran

Olawale Moses Onabola

Yimin Liu

Dianbo Liu

Marine debris is severely threatening the marine lives and causing sustained pollution to the whole ecosystem. To prevent the wastes from ge… (voir plus)tting into the ocean, it is helpful to clean up the floating wastes in inland waters using the autonomous cleaning devices like unmanned surface vehicles. The cleaning efficiency relies on a high-accurate and robust object detection system. However, the small size of the target, the strong light reflection over water surface, and the reflection of other objects on bank-side all bring challenges to the vision-based object detection system. To promote the practical application for autonomous floating wastes cleaning, we present FloW†, the first dataset for floating waste detection in inland water areas. The dataset consists of an image sub-dataset FloW-Img and a multimodal sub-dataset FloW-RI which contains synchronized millimeter wave radar data and images. Accurate annotations for images and radar data are provided, supporting floating waste detection strategies based on image, radar data, and the fusion of two sensors. We perform several baseline experiments on our dataset, including vision-based and radar-based detection methods. The results show that, the detection accuracy is relatively low and floating waste detection still remains a challenging task.

2021-10-01

IEEE International Conference on Computer Vision (publié)

Learning Neural Causal Models with Active Interventions

Nino Scherrer

Olexa Bilaniuk

Yashas Annadani

Anirudh Goyal

Patrick Schwab

Bernhard Schölkopf

Michael Curtis Mozer

Stefan Bauer

Nan Rosemary Ke

Discovering causal structures from data is a challenging inference problem of fundamental importance in all areas of science. The appealing … (voir plus)scaling properties of neural networks have recently led to a surge of interest in differentiable neural network-based methods for learning causal structures from data. So far, differentiable causal discovery has focused on static datasets of observational or interventional origin. In this work, we introduce an active intervention-targeting mechanism which enables quick identification of the underlying causal structure of the data-generating process. Our method significantly reduces the required number of interactions compared with random intervention targeting and is applicable for both discrete and continuous optimization formulations of learning the underlying directed acyclic graph (DAG) from data. We examine the proposed method across multiple frameworks in a wide range of settings and demonstrate superior performance on multiple benchmarks from simulated to real-world data.

2021-09-06

ArXiv (prépublication)

Problems in the deployment of machine-learned models in health care

Joseph Paul Cohen

Tianshi Cao

Joseph D Viviano

Chinwei Huang

M. Fralick

Marzyeh Ghassemi

M. Mamdani

R. Greiner

2021-08-30

Canadian Medical Association Journal (publié)

Exploration-Driven Representation Learning in Reinforcement Learning

Akram Erraqabi

Harry Zhao

Mingde Zhao

Marlos C. Machado

Sainbayar Sukhbaatar

Ludovic Denoyer

Alessandro Lazaric

Learning reward-agnostic representations is an emerging paradigm in reinforcement learning. These representations can be leveraged for sever… (voir plus)al purposes ranging from reward shaping to skill discovery. Nevertheless, in order to learn such representations, existing methods often rely on assuming uniform access to the state space. With such a privilege, the agent’s coverage of the environment can be limited which hurts the quality of the learned representations. In this work, we introduce a method that explicitly couples representation learning with exploration when the agent is not provided with a uniform prior over the state space. Our method learns representations that constantly drive exploration while the data generated by the agent’s exploratory behavior drives the learning of better representations. We empirically validate our approach in goal-achieving tasks, demonstrating that the learned representation captures the dynamics of the environment, leads to more accurate value estimation, and to faster credit assignment, both when used for control and for reward shaping. Finally, the exploratory policy that emerges from our approach proves to be successful at continuous navigation tasks with sparse rewards.

2021-07-22

ICML.cc/2021/Workshop/URL (poster)

Combating False Negatives in Adversarial Imitation Learning

Konrad Żołna

Chitwan Saharia

Léonard Boussioux

David Y. T. Hui

Maxime Chevalier-Boisvert

Dzmitry Bahdanau

In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the de… (voir plus)sired behavior. However, as the trained policy learns to be more successful, the negative examples (the ones produced by the agent) become increasingly similar to expert ones. Despite the fact that the task is successfully accomplished in some of the agent's trajectories, the discriminator is trained to output low values for them. We hypothesize that this inconsistent training signal for the discriminator can impede its learning, and consequently leads to worse overall performance of the agent. We show experimental evidence for this hypothesis and that the ‘False Negatives’ (i.e. successful agent episodes) significantly hinder adversarial imitation learning, which is the first contribution of this paper. Then, we propose a method to alleviate the impact of false negatives and test it on the BabyAI environment. This method consistently improves sample efficiency over the baselines by at least an order of magnitude.

2021-07-18

2021 International Joint Conference on Neural Networks (IJCNN) (publié)