Yoshua Bengio

Biography

*For media inquiries, please write to medias@mila.quebec.

For more information, contact Julie Mongeau, executive assistant, at julie.mongeau@mila.quebec.

Recognized worldwide as a leading authority in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, often called the "Nobel Prize of computing", together with Geoffrey Hinton and Yann LeCun. He is a full professor at Université de Montréal and the founder and scientific director of Mila – Quebec Artificial Intelligence Institute. As a Senior Fellow, he co-directs the Learning in Machines & Brains program of the Canadian Institute for Advanced Research (CIFAR). He also serves as scientific director of IVADO.

In 2018, he was the computer scientist who received the largest number of new citations worldwide. In 2019, he was awarded the prestigious Killam Prize. Since 2022, he has had the highest h-index of any computer scientist in the world. He is a Fellow of the Royal Society of London and of the Royal Society of Canada, and an Officer of the Order of Canada.

Concerned about the social impact of AI and committed to the goal of AI benefiting everyone, he contributed actively to the Montréal Declaration for a Responsible Development of Artificial Intelligence.

Current students

Jamal Abou Haibeh

Research Intern - McGill

Mohammed Abukalam

Research Intern - UdeM

Rim Assouel

PhD - UdeM

Dan Assouline

Alumni Collaborator

Ayoub Atanane

Research Intern - Université du Québec à Rimouski

Stefan Bauer

Independent Visiting Researcher

Co-supervisor:

Guillaume Lajoie

Paul Bertin

PhD - UdeM

Ghait Boukachab

Research Intern - UQAR

PhD - UdeM

Independent Visiting Researcher - MIT

Shahana Chatterjee

Research Collaborator - N/A

Principal supervisor:

Chen Chen

Postdoc - UdeM

Co-supervisor:

Blake Richards

Xiaoyin Chen

PhD - UdeM

Pierre-Paul De Breuck

Alumni Collaborator - UdeM

PhD - UdeM

PhD - UdeM

Research Collaborator - Université Paris-Saclay

Principal supervisor:

Eric Elmoznino

PhD - UdeM

Co-supervisor:

PhD - UdeM

PhD - Massachusetts Institute of Technology

Léna Nehale Ezzine

PhD - UdeM

Jean-Pierre Falet

PhD - UdeM

Co-supervisor:

Leo Feng

PhD - UdeM

Research Intern - Barcelona University

Piotr Gainski

Research Intern - UdeM

Ivan Grega

Research Collaborator - UdeM

Pietro Greiner

Research Intern

Mohsin Hasan

PhD - UdeM

mohsin.hasan@mila.quebec

Alex Hernandez-Garcia

Postdoc - UdeM

Co-supervisor:

Leon Hetzel

Independent Visiting Researcher - Technical University of Munich (TUM)

Edward Hu

PhD - UdeM

Moksh Jain

PhD - UdeM

moksh.jain@mila.quebec

Research Intern - UdeM

Research Master's - UdeM

Co-supervisor:

Research Intern - UdeM

Minsu Kim

Research Collaborator - UdeM

PhD - UdeM

Postdoc - UdeM

PhD - UdeM

Alumni Collaborator

Seanie Lee

Alumni Collaborator - UdeM

Zhen Liu

PhD - UdeM

Principal supervisor:

Liam Paull

Chenghao Liu

Alumni Collaborator

Research Intern - Imperial College London

PhD - UdeM

Research Intern - UdeM

Nikolay Malkin

Alumni Collaborator - UdeM

Cristian Dragos Manta

PhD - UdeM

Co-supervisor:

Postdoc - UdeM

Alumni Collaborator

Sören Mindermann

Research Collaborator - UdeM

Sarthak Mittal

PhD - UdeM

Principal supervisor:

PhD - UdeM

Principal supervisor:

Postdoc - UdeM

Principal supervisor:

Independent Visiting Researcher - UdeM

Ling Pan

Independent Visiting Researcher - Hong Kong University of Science and Technology (HKUST)

Ali Parviz

Research Collaborator - Ying Wu College of Computing

Lena Podina

PhD - University of Waterloo

Principal supervisor:

Nasim Rahaman

PhD - Max Planck Institute for Intelligent Systems

Jarrid Rector-Brooks

PhD - UdeM

Co-supervisor:

Sarath Chandar

Danyal Rehman

Postdoc - UdeM

James Requeima

Independent Visiting Researcher - UdeM

Postdoc - UdeM

Jessie Richter-Powell

Independent Visiting Researcher - UdeM

Camille Rochefort-Boulanger

PhD - UdeM

Principal supervisor:

Julie Hussin

agassoussisalwane2@gmail.com

Salwane Salwane

Research Intern - UdeM

Theo Saulus

Research Collaborator

Principal supervisor:

PhD - UdeM

Postdoc - UdeM

Research Master's - UdeM

Marcin Sendera

Research Intern - UdeM

Dounia Shaaban Kabakibo

Research Intern - UdeM

Vedant Shah

Research Master's - UdeM

Alumni Collaborator

Marco Stock

Independent Visiting Researcher - Technical University of Munich

marco.stock@tum.de

Anja Surina

PhD - École Polytechnique Fédérale de Lausanne

Vincent Taboga

Postdoc - Polytechnique

Co-supervisor:

Pierre-Luc Bacon

Mélisande Astrid Crystal Teng

PhD - UdeM

Co-supervisor:

Research Collaborator

Principal supervisor:

alexander.tong@mila.quebec

Alex Tong

Postdoc - UdeM

Research Collaborator - Valence

Principal supervisor:

Dominique Beaini

Donna Vakalis

Postdoc - UdeM

Co-supervisor:

Viktor Todosijevic

Research Collaborator - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)

Principal supervisor:

Sasha Volokhova

PhD - UdeM

Zichao Yan

Alumni Collaborator - UdeM

Kyle Yun

Research Collaborator - KAIST

Elmimouni Zakaria

Research Intern - UdeM

Nicole Zhang

PhD - McGill

Principal supervisor:

Mathieu Blanchette

Dinghuai Zhang

PhD - UdeM

Principal supervisor:

Aaron Courville

Ruixiang Zhang

PhD - UdeM

Principal supervisor:

PhD - UdeM

Harry Zhao

PhD - McGill

Principal supervisor:

Blog posts

Skipper: combining spatial and temporal abstraction to improve generalization

February 22, 2024

by

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the service of reasoning & model-based ML

April 4, 2023

by

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

by

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

Generative flow networks

March 15, 2022

by

Yoshua Bengio

Publications

Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization

Kartik Ahuja

Ethan Caballero

Dinghuai Zhang

Jean-Christophe Gagnon-Audet

Ioannis Mitliagkas

The invariance principle from causality is at the heart of notable approaches such as invariant risk minimization (IRM) that seek to address out-of-distribution (OOD) generalization failures. Despite the promising theory, invariance principle-based approaches fail in common classification tasks, where invariant (causal) features capture all the information about the label. Are these failures due to the methods failing to capture the invariance? Or is the invariance principle itself insufficient? To answer these questions, we revisit the fundamental assumptions in linear regression tasks, where invariance-based approaches were shown to provably generalize OOD. In contrast to the linear regression tasks, we show that for linear classification tasks we need much stronger restrictions on the distribution shifts, or otherwise OOD generalization is impossible. Furthermore, even with appropriate restrictions on distribution shifts in place, we show that the invariance principle alone is insufficient. We prove that a form of the information bottleneck constraint along with invariance helps address key failures when invariant features capture all the information about the label and also retains the existing success when they do not. We propose an approach that incorporates both of these principles and demonstrate its effectiveness in several experiments.

openreview.net
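To make the combination of the two principles more concrete, here is a minimal Python/NumPy sketch of an objective that adds an invariance penalty and an information-bottleneck penalty to the per-environment risk. It is an illustration under our own assumptions, not the authors' implementation: a 1-D linear representation `z = X @ phi`, logistic loss, an IRMv1-style penalty (the squared gradient of each environment's risk with respect to a dummy classifier scale fixed at 1.0), and the variance of `z` used as a simple stand-in for the information-bottleneck term. The names `ib_irm_objective`, `lam_irm` and `lam_ib` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ib_irm_objective(envs, phi, lam_irm=1.0, lam_ib=0.1):
    """Mean over environments of risk + lam_irm * invariance penalty + lam_ib * IB proxy.

    envs: list of (X, y) with X of shape (n, d) and binary labels y of shape (n,).
    phi:  weights of shape (d,) defining the 1-D representation z = X @ phi.
    """
    total = 0.0
    for X, y in envs:
        z = X @ phi
        # Logistic loss of the prediction s * z at the dummy scale s = 1:
        # log(1 + exp(z)) - y * z, computed stably with logaddexp.
        risk = np.mean(np.logaddexp(0.0, z) - y * z)
        # IRMv1-style penalty: d(risk)/ds at s = 1 equals mean((sigmoid(z) - y) * z).
        grad_s = np.mean((sigmoid(z) - y) * z)
        # Assumed information-bottleneck proxy: variance of the representation.
        ib = np.var(z)
        total += risk + lam_irm * grad_s ** 2 + lam_ib * ib
    return total / len(envs)


# Usage on two synthetic environments (random data, for illustration only).
rng = np.random.default_rng(0)
envs = [(rng.normal(size=(50, 3)), rng.integers(0, 2, size=50)) for _ in range(2)]
# With phi = 0 every prediction is 0, so each risk is log(2) and both penalties vanish.
print(ib_irm_objective(envs, np.zeros(3)))
```

Minimizing this objective over `phi` trades predictive risk against two pressures: a representation whose optimal classifier scale is the same in every environment, and a representation that carries little variance beyond what is needed.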

Inductive biases for deep learning of higher-level cognition

Anirudh Goyal

A fascinating hypothesis is that human and animal intelligence could be explained by a few principles (rather than an encyclopaedic list of heuristics). If that hypothesis were correct, we could more easily both understand our own intelligence and build intelligent machines. Just like in physics, the principles themselves would not be sufficient to predict the behaviour of complex systems like brains, and substantial computation might be needed to simulate human-like intelligence. This hypothesis would suggest that studying the kind of inductive biases that humans and animals exploit could help both clarify these principles and provide inspiration for AI research and neuroscience theories. Deep learning already exploits several key inductive biases, and this work considers a larger list, focusing on those which concern mostly higher-level and sequential conscious processing. The objective of clarifying these particular principles is that they could potentially help us build AI systems benefiting from humans' abilities in terms of flexible out-of-distribution and systematic generalization, which is currently an area where a large gap exists between state-of-the-art machine learning and human intelligence.

2020-11-30

arXiv (preprint)

doi.org

Revisiting Fundamentals of Experience Replay

William Fedus

Prajit Ramachandran

Rishabh Agarwal

Hugo Larochelle

Mark Rowland

Will Dabney

Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding. We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (replay ratio). Our additive and ablative studies upend conventional wisdom around experience replay: greater capacity is found to substantially increase the performance of certain algorithms, while leaving others unaffected. Counterintuitively, we show that theoretically ungrounded, uncorrected n-step returns are uniquely beneficial while other techniques confer limited benefit for sifting through larger memory. Separately, by directly controlling the replay ratio we contextualize previous observations in the literature and empirically measure its importance across a variety of deep RL algorithms. Finally, we conclude by testing a set of hypotheses on the nature of these performance benefits.

2020-11-21

Proceedings of the 37th International Conference on Machine Learning (published)

proceedings.mlr.press
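The two properties the study varies, replay capacity and replay ratio, can be illustrated with a small Python sketch. It is a toy loop, not the paper's experimental code: the class `ReplayBuffer`, the function `run` and the dummy transitions are hypothetical, and the learner's gradient step is left as a comment. Capacity bounds how many transitions the FIFO buffer keeps; the replay ratio sets how many learning updates are performed per transition collected.

```python
import random
from collections import deque


class ReplayBuffer:
    """FIFO replay memory: once `capacity` is reached, the oldest transition is evicted."""

    def __init__(self, capacity, seed=0):
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement from the transitions currently stored.
        return self.rng.sample(list(self.memory), min(batch_size, len(self.memory)))


def run(num_env_steps, capacity, replay_ratio, batch_size=32):
    """Collect one transition per step and perform learning updates so that,
    over the run, updates / transitions collected is approximately replay_ratio."""
    buffer = ReplayBuffer(capacity)
    updates = 0
    credit = 0.0  # fractional updates owed so far
    for t in range(num_env_steps):
        buffer.add((t, "dummy transition"))
        credit += replay_ratio
        while credit >= 1.0:
            batch = buffer.sample(batch_size)
            # A real agent would take one gradient step (e.g. a Q-learning update) on `batch`.
            updates += 1
            credit -= 1.0
    return buffer, updates


buffer, updates = run(num_env_steps=1000, capacity=100, replay_ratio=0.25)
print(updates, len(buffer.memory))  # → 250 100
```

With a capacity of 100 and a replay ratio of 0.25, 1,000 environment steps yield 250 updates, and only the 100 most recent transitions remain available for sampling; raising the capacity lets updates draw on older experience without changing the number of updates.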

Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers

Alex Lamb

Anirudh Goyal

A. Slowik

Michael Curtis Mozer

Philippe Beaudoin

Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previous layer. A downside to this approach is that each layer (or module, as multiple modules can operate in parallel) is tasked with processing the entire hidden state, rather than a particular part of the state which is most relevant for that module. Methods which only operate on a small number of input variables are an essential part of most programming languages, and they allow for improved modularity and code re-usability. Our proposed method, Neural Function Modules (NFM), aims to introduce the same structural capability into deep learning. Most of the work in the context of feed-forward networks combining top-down and bottom-up feedback is limited to classification problems. The key contribution of our work is to combine attention, sparsity, top-down and bottom-up feedback, in a flexible algorithm which, as we show, improves the results in standard classification, out-of-domain generalization, generative modeling, and learning representations in the context of reinforcement learning.

2020-10-15

arXiv (preprint)
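One ingredient named in the abstract is letting a module read only a small number of its inputs instead of the entire hidden state. A minimal sketch of that idea, assuming hard top-k scaled dot-product attention over a set of candidate input vectors (our simplification, not the NFM architecture itself), is shown below; `topk_attention` and its arguments are hypothetical names.

```python
import numpy as np


def topk_attention(query, keys, values, k):
    """Hard top-k attention: the query attends only to its k best-matching keys.

    query: (d,), keys: (m, d), values: (m, dv).
    Returns (output of shape (dv,), sorted indices of the selected inputs).
    """
    scores = keys @ query / np.sqrt(query.shape[0])  # scaled dot-product scores
    idx = np.argsort(scores)[-k:]                     # indices of the k largest scores
    s = scores[idx]
    weights = np.exp(s - s.max())                     # softmax over the selected scores only
    weights /= weights.sum()
    return weights @ values[idx], np.sort(idx)


keys = np.eye(4)
values = np.arange(8, dtype=float).reshape(4, 2)
query = np.array([0.0, 0.0, 5.0, 0.0])
out, chosen = topk_attention(query, keys, values, k=1)
print(out, chosen)  # → [4. 5.] [2]
```

With `k` equal to the number of keys this reduces to ordinary softmax attention; a smaller `k` forces the module to commit to a sparse set of arguments, loosely analogous to a function that takes only a few parameters.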

COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact Tracing

Prateek Gupta

Tegan Maharaj

Martin Weiss

Nasim Rahaman

Hannah Alsdurf

Abhinav Sharma

Nanor Minoyan

Soren Harnois-Leblanc

Victor Schmidt

Pierre-Luc St-Charles

Tristan Deleu

Andrew Williams

Akshay Patel

Meng Qu

Olexa Bilaniuk

Gaetan Caron

Pierre-Luc Carrier

Satya Ortiz-Gagné

Marc-Andre Rousseau

David Buckeridge … (9 more)

Joumana Ghosn

Yang Zhang

Bernhard Schölkopf

Jian Tang

Chris Pal

Joanna Merckx

Eilif Benjamin Muller

2020-10-02

OpenReview.net/Anonymous_Preprint (unknown)

openreview.net

A Large-Scale, Open-Domain, Mixed-Interface Dialogue-Based ITS for STEM

Iulian V. Serban

Varun Gupta

Ekaterina Kochmar

Dung D. Vu

Robert Belfer

2020-06-10

Artificial Intelligence in Education (published)

doi.org

An Analysis of the Adaptation Speed of Causal Models

Rémi Le Priol

Reza Babanezhad Harikandeh

Simon Lacoste-Julien

We consider the problem of discovering the causal process that generated a collection of datasets. We assume that all these datasets were generated by unknown sparse interventions on a structural causal model (SCM)

2020-05-18

arXiv (preprint)

COVI White Paper

Hannah Alsdurf

Tristan Deleu

Prateek Gupta

Daphne Ippolito

Richard Janda

Max Jarvie

Tyler J. Kolody

Sekoul Krastev

Tegan Maharaj

Robert Obryk

Dan Pilat

Valerie Pisano

Benjamin Prud'homme

Meng Qu

Nasim Rahaman

Jean-François Rousseau

Abhinav Sharma

Brooke Struck … (3 more)

Jian Tang

Martin Weiss

Yun William Yu

2020-05-18

arXiv (preprint)

Meta-learning framework with applications to zero-shot time-series forecasting

Boris Oreshkin

Dmitri Carpov

Nicolas Chapados

Can meta-learning discover generic ways of processing time series (TS) from a diverse dataset so as to greatly improve generalization on new TS coming from different datasets? This work provides positive evidence to this using a broad meta-learning framework which we show subsumes many existing meta-learning algorithms. Our theoretical analysis suggests that residual connections act as a meta-learning adaptation mechanism, generating a subset of task-specific parameters based on a given TS input, thus gradually expanding the expressive power of the architecture on-the-fly. The same mechanism is shown via linearization analysis to have the interpretation of a sequential update of the final linear layer. Our empirical results on a wide range of data emphasize the importance of the identified meta-learning mechanisms for successful zero-shot univariate forecasting, suggesting that it is viable to train a neural network on a source TS dataset and deploy it on a different target TS dataset without retraining, resulting in performance that is at least as good as that of state-of-practice univariate forecasting models.

2020-02-07

arXiv (preprint)

doi.org

COVI White Paper-Version 1.1

Hannah Alsdurf

Tristan Deleu

Prateek Gupta

Daphne Ippolito

Richard Janda

Max Jarvie

Tyler J. Kolody

Sekoul Krastev

Tegan Maharaj

Robert Obryk

Dan Pilat

Valerie Pisano

Benjamin Prud'homme

Meng Qu

Nasim Rahaman

Jean-François Rousseau

Abhinav Sharma

Brooke Struck … (3 more)

Jian Tang

Martin Weiss

Yun William Yu

The SARS-CoV-2 (Covid-19) pandemic has resulted in significant strain on health care and public health institutions around the world. Contact tracing is an essential tool for public health officials and local communities to change the course of the Covid-19 pandemic. Standard manual contact tracing of people infected with Covid-19, while the current gold standard, has significant challenges that limit the ability of public health authorities to minimize community infections. Personalized peer-to-peer contact tracing through the use of mobile applications has the potential to shift the paradigm of Covid-19 community spread. Although some countries have deployed centralized tracking systems through either GPS or Bluetooth, more privacy-protecting decentralized systems offer much of the same benefit without concentrating data in the hands of a state authority or in for-profit corporations. Additionally, machine learning methods can be used to circumvent some of the limitations of standard digital tracing by incorporating many clues (including medical conditions, self-reported symptoms, and numerous encounters with people at different risk levels, for different durations and distances) and their uncertainty into a more graded and precise estimation of infection and contagion risk. The estimated risk can be used to provide early risk awareness, personalized recommendations and relevant information to the user and connect them to health services. Finally, the non-identifying data about these risks can inform detailed epidemiological models trained jointly with the machine learning predictor, and these models can provide statistical evidence for the interaction and importance of different factors involved in the transmission of the disease. They can also be used to monitor, evaluate and optimize different health policy and confinement/deconfinement scenarios according to medical and economic productivity indicators.
However, such a strategy based on mobile apps and machine learning should proactively mitigate potential ethical and privacy risks, which could have substantial impacts on society (not only impacts on health but also impacts such as stigmatization and abuse of personal data). Here, we present an overview of the rationale, design, ethical considerations and privacy strategy of ‘COVI,’ a Covid-19 public peer-to-peer contact tracing and risk awareness mobile application developed in Canada. Addendum 2020-07-14: The government of Canada has declined to endorse COVI and will be promoting a different app for decentralized contact tracing. In the interest of preventing fragmentation of the app landscape, COVI will therefore not be deployed to end users. We are currently still in the process of finalizing the project, and plan to release our code and models for academic consumption and to make them accessible to other States should they wish to deploy an app based on or inspired by said code and models.

University of Ottawa, Mila, Université de Montréal, The Alan Turing Institute, University of Oxford, University of Pennsylvania, McGill University, Borden Ladner Gervais LLP, The Decision Lab, HEC Montréal, Max Planck Institute, Libéo, University of Toronto.

Corresponding author general: richard.janda@mcgill.ca
Corresponding author for public health: abhinav.sharma@mcgill.ca
Corresponding author for privacy: ywyu@math.toronto.edu
Corresponding author for machine learning: yoshua.bengio@mila.quebec
Corresponding author for user perspective: brooke@thedecisionlab.com
Corresponding author for technical implementation: jean-francois.rousseau@libeo.com

arXiv:2005.08502v2 [cs.CR], 27 Jul 2020

2020-01-01

(published)

www.semanticscholar.org

N-BEATS: Neural basis expansion analysis for interpretable time series forecasting

Boris Oreshkin

Dmitri Carpov

Nicolas Chapados

We focus on solving the univariate time series point forecasting problem using deep learning. We propose a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the proposed architecture on several well-known datasets, including M3, M4 and TOURISM competition datasets containing time series from diverse domains. We demonstrate state-of-the-art performance for two configurations of N-BEATS for all the datasets, improving forecast accuracy by 11% over a statistical benchmark and by 3% over last year's winner of the M4 competition, a domain-adjusted hand-crafted hybrid between neural network and statistical time series models. The first configuration of our model does not employ any time-series-specific components and its performance on heterogeneous datasets strongly suggests that, contrary to received wisdom, deep learning primitives such as residual blocks are by themselves sufficient to solve a wide range of forecasting problems. Finally, we demonstrate how the proposed architecture can be augmented to provide outputs that are interpretable without considerable loss in accuracy.

2020-01-01

ICLR (published)

openreview.net
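The backward and forward residual links described in the abstract can be sketched as follows: each block reads the current residual input, emits a backcast that is subtracted from that input before the next block, and a partial forecast that is added to a running total. This is a minimal NumPy sketch under simplifying assumptions, not the published architecture: each hypothetical block here is a single random ReLU layer with two linear heads (the abstract describes a very deep stack of fully-connected layers), nothing is trained, and `make_block` and `doubly_residual_forward` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_block(lookback, horizon, hidden=16):
    """Hypothetical toy block: one ReLU layer feeding a backcast head and a forecast head."""
    W = rng.normal(scale=0.1, size=(hidden, lookback))
    W_back = rng.normal(scale=0.1, size=(lookback, hidden))
    W_fore = rng.normal(scale=0.1, size=(horizon, hidden))

    def block(x):
        h = np.maximum(0.0, W @ x)
        return W_back @ h, W_fore @ h  # (backcast, partial forecast)

    return block


def doubly_residual_forward(x, blocks, horizon):
    """Run blocks in sequence with backward (subtract backcast) and forward (sum forecasts) links."""
    residual = x.copy()
    forecast = np.zeros(horizon)
    for block in blocks:
        backcast, partial = block(residual)
        residual = residual - backcast  # the next block only sees what is still unexplained
        forecast = forecast + partial   # the final forecast is the sum of partial forecasts
    return forecast, residual


lookback, horizon = 8, 3
blocks = [make_block(lookback, horizon) for _ in range(4)]
x = np.sin(np.arange(lookback, dtype=float))
forecast, residual = doubly_residual_forward(x, blocks, horizon)
print(forecast.shape, residual.shape)  # → (3,) (8,)
```

Because partial forecasts are summed, each block's contribution can be inspected separately, which is one way to read the abstract's point about interpretable outputs.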

On the interplay between noise and curvature and its effect on optimization and generalization

Valentin Thomas

Fabian Pedregosa

Bart van Merriënboer

Pierre-Antoine Manzagol

Nicolas Le Roux

The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the variance of the gradients. While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed. Further, as the ultimate goal is good generalization performance, we clarify how both curvature and noise are relevant to properly estimate the generalization gap. Realizing that the limitations of some existing works stem from a confusion between these matrices, we also clarify the distinction between the Fisher matrix, the Hessian, and the covariance matrix of the gradients.

2020-01-01

AISTATS (published)

proceedings.mlr.press
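To illustrate the distinction the abstract draws, the sketch below computes four matrices for a mean logistic-regression loss: the Hessian, the Fisher matrix (labels drawn from the model), the empirical Fisher (uncentered second moment of per-example gradients at the observed labels) and the covariance of the per-example gradients. The model choice and the function name `curvature_and_noise` are our assumptions for illustration, not taken from the paper. For logistic regression with its canonical link, the Hessian coincides with the Fisher, while the empirical Fisher and the gradient covariance depend on the observed labels and generally differ from both.

```python
import numpy as np


def curvature_and_noise(X, y, w):
    """For the mean logistic loss at w, return (Hessian, Fisher, empirical Fisher, gradient covariance)."""
    n = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-(X @ w)))          # predicted P(y = 1 | x)
    G = (p - y)[:, None] * X                     # per-example gradients, shape (n, d)

    H = (X * (p * (1 - p))[:, None]).T @ X / n   # Hessian of the mean loss

    # Fisher: E_{y ~ model}[g g^T]; for y ~ Bernoulli(p), E[(p - y)^2] = p(1-p)^2 + (1-p)p^2.
    expected_sq = p * (1 - p) ** 2 + (1 - p) * p ** 2
    F = (X * expected_sq[:, None]).T @ X / n

    F_emp = G.T @ G / n                          # empirical Fisher: uses the observed labels
    g_mean = G.mean(axis=0)
    C = F_emp - np.outer(g_mean, g_mean)         # covariance of per-example gradients
    return H, F, F_emp, C


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (rng.random(200) < 0.5).astype(float)
H, F, F_emp, C = curvature_and_noise(X, y, w=np.array([0.3, -0.2, 0.1]))
print(np.allclose(H, F))  # → True
```

The empirical Fisher and the covariance differ only by the outer product of the mean gradient, so they coincide at a stationary point of the training loss; away from it, treating one as the other is exactly the kind of confusion the abstract warns about.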