Yoshua Bengio

Core Academic Member

Canada CIFAR AI Chair

Full Professor, Université de Montréal, Department of Computer Science and Operations Research

Founder and Scientific Advisor, Leadership Team

Research Topics

Medical Machine Learning

Representation Learning

Reinforcement Learning

Deep Learning

Causality

Generative Models

Probabilistic Models

Molecular Modeling

Computational Neuroscience

Reasoning

Graph Neural Networks

Recurrent Neural Networks

Machine Learning Theory

Natural Language Processing

Biography

For media requests, please write to medias@mila.quebec.

For more information, contact Cassidy MacNeil, Senior Assistant and Operations Lead, at cassidy.macneil@mila.quebec.

Recognized worldwide as a leading expert in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, "the Nobel Prize of computing," with Geoffrey Hinton and Yann LeCun. He is a Full Professor at Université de Montréal, Founder and Scientific Advisor of Mila – Quebec Artificial Intelligence Institute, and co-directs, as a Senior Fellow, the Learning in Machines & Brains program of the Canadian Institute for Advanced Research (CIFAR). He also serves as Special Advisor and Founding Scientific Director of IVADO.

In 2018, he was the computer scientist who collected the largest number of new citations worldwide. In 2019, he was awarded the prestigious Killam Prize. Since 2022, he has had the highest h-index of any computer scientist in the world. He is a Fellow of the Royal Society of London and of the Royal Society of Canada, and an Officer of the Order of Canada.

Concerned about the social impact of AI and committed to the goal that AI benefit everyone, he contributed actively to the Montreal Declaration for the Responsible Development of Artificial Intelligence.

Current Students

Jamal Abou Haibeh

Alumni collaborator - McGill

Research collaborator - Cambridge University

Principal supervisor:

PhD - UdeM

Independent visiting researcher

Co-supervisor:

Guillaume Lajoie

Shahana Chatterjee

Research collaborator - N/A

Principal supervisor:

PhD - UdeM

Research collaborator - KAIST

Aniket Didolkar

PhD - UdeM

Abdessamad EL KABID

Alumni collaborator - UdeM

Co-supervisor:

Loubna Benabbou

Desmond Elliott

Independent visiting researcher

Principal supervisor:

PhD - UdeM

Co-supervisor:

Guillaume Lajoie

PhD - UdeM

Jean-Pierre Falet

PhD - UdeM

PhD

PhD - UdeM

PhD - UdeM

Thomas Jiralerspong

PhD - UdeM

Principal supervisor:

Guillaume Lajoie

Younesse Kaddar

Alumni collaborator - UdeM

Postdoctorate - UdeM

Principal supervisor:

Alex Hernández-García

Tabitha Edith Lee

Postdoctorate - UdeM

Principal supervisor:

Alumni collaborator

Alumni collaborator - UdeM

Cristian Dragos Manta

PhD - UdeM

Co-supervisor:

PhD - UdeM

Principal supervisor:

Guillaume Lajoie

Independent visiting researcher - UdeM

PhD - UdeM

Principal supervisor:

Research collaborator - Ying Wu Coll of Computing

Research collaborator - University of Waterloo

Principal supervisor:

Alumni collaborator - Max-Planck-Institute for Intelligent Systems

Research collaborator - UdeM

Co-supervisor:

Loubna Benabbou

Jarrid Rector-Brooks

PhD - UdeM

Postdoctorate - UdeM

Postdoctorate - UdeM

Camille Rochefort-Boulanger

PhD - UdeM

Principal supervisor:

Dragos Secrieru

Alumni collaborator - UdeM

Postdoctorate

Co-supervisor:

Alex Hernández-García

Alumni collaborator - Polytechnique

Co-supervisor:

Pierre-Luc Bacon

Mélisande Astrid Crystal Teng

PhD - UdeM

Co-supervisor:

Hugo Larochelle

Research collaborator

Principal supervisor:

Alumni collaborator - UdeM

Alumni collaborator - UdeM

Co-supervisor:

Siddarth Venkatraman

PhD - UdeM

Principal supervisor:

Research collaborator

Research collaborator - UdeM

PhD - UdeM

PhD - McGill

Principal supervisor:

Mathieu Blanchette

PhD - UdeM

Principal supervisor:

Aaron Courville

Alumni collaborator - McGill

Principal supervisor:

Blog Posts

February 22, 2024

Skipper: Combining Spatial and Temporal Abstraction to Improve Generalization

by

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Scaling in the service of reasoning & model-based ML

April 4, 2023

by

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

by

Jake P. Taylor-King

Generative Flow Networks

March 15, 2022

by

Publications

Bayesian Structure Learning with Generative Flow Networks

Simon Lacoste-Julien

In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) structure of Bayesian networks, from data. Defining such a distribution is very challenging, due to the combinatorially large sample space, and approximations based on MCMC are often required. Recently, a novel class of probabilistic models, called Generative Flow Networks (GFlowNets), has been introduced as a general framework for generative modeling of discrete and composite objects, such as graphs. In this work, we propose to use a GFlowNet as an alternative to MCMC for approximating the posterior distribution over the structure of Bayesian networks, given a dataset of observations. Generating a sample DAG from this approximate distribution is viewed as a sequential decision problem, where the graph is constructed one edge at a time, based on learned transition probabilities. Through evaluation on both simulated and real data, we show that our approach, called DAG-GFlowNet, provides an accurate approximation of the posterior over DAGs, and it compares favorably against other methods based on MCMC or variational inference.

2022-05-19

auai.org/UAI/2022/Conference (poster)

proceedings.mlr.press

Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL

Marlos C. Machado

Mingde Zhao

Sainbayar Sukhbaatar

Alessandro Lazaric

Ludovic Denoyer

In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting, with applications ranging from skill discovery to reward shaping. Recently, learning the Laplacian representation has been framed as the optimization of a temporally-contrastive objective to overcome its computational limitations in large (or continuous) state spaces. However, this approach requires uniform access to all states in the state space, overlooking the exploration problem that emerges during the representation learning process. In this work, we propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation. We do so by combining the representation learning with a skill-based covering policy, which provides a better training distribution to extend and refine the representation. We also show that a simple augmentation of the representation objective with the learned temporal abstractions improves dynamics-awareness and helps exploration. We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments. Finally, even if our method is not optimized for skill discovery, the learned skills can successfully solve difficult continuous navigation tasks with sparse rewards, where standard skill discovery approaches are not so effective.

2022-05-19

auai.org/UAI/2022/Conference (poster)

proceedings.mlr.press

FedILC: Weighted Geometric Mean and Invariant Gradient Covariance for Federated Learning on Non-IID Data

Mike He Zhu

Lena Nehale Ezzine

Dianbo Liu

2022-05-18

ArXiv (preprint)

Chunked Autoregressive GAN for Conditional Waveform Synthesis

Max Morrison

Prem Seetharaman

Aaron Courville

Conditional waveform synthesis models learn a distribution of audio waveforms given conditioning such as text, mel-spectrograms, or MIDI. These systems employ deep generative models that model the waveform via either sequential (autoregressive) or parallel (non-autoregressive) sampling. Generative adversarial networks (GANs) have become a common choice for non-autoregressive waveform synthesis. However, state-of-the-art GAN-based models produce artifacts when performing mel-spectrogram inversion. In this paper, we demonstrate that these artifacts correspond with an inability of the generator to learn accurate pitch and periodicity. We show that simple pitch and periodicity conditioning is insufficient for reducing this error relative to using autoregression. We discuss the inductive bias that autoregression provides for learning the relationship between instantaneous frequency and phase, and show that this inductive bias holds even when autoregressively sampling large chunks of the waveform during each forward pass. Relative to prior state-of-the-art GAN-based models, our proposed model, Chunked Autoregressive GAN (CARGAN), reduces pitch error by 40-60%, reduces training time by 58%, maintains a fast generation speed suitable for real-time or interactive applications, and maintains or improves subjective quality.

2022-04-24

International Conference on Learning Representations (Accept (Poster))

Compositional Attention: Disentangling Search and Retrieval

Sharath Chandra Raparthy

Guillaume Lajoie

Multi-head, key-value attention is the backbone of the widely successful Transformer model and its variants. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search - selection of a relevant entity from a set via query-key interactions, and (2) retrieval - extraction of relevant features from the selected entity via a value matrix. Importantly, standard attention heads learn a rigid mapping between search and retrieval. In this work, we first highlight how this static nature of the pairing can potentially: (a) lead to learning of redundant parameters in certain tasks, and (b) hinder generalization. To alleviate this problem, we propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure. The proposed mechanism disentangles search and retrieval and composes them in a dynamic, flexible and context-dependent manner through an additional soft competition stage between the query-key combination and value pairing. Through a series of numerical experiments, we show that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings. Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed. Our proposed mechanism generalizes multi-head attention, allows independent scaling of search and retrieval, and can easily be implemented in lieu of standard attention heads in any network architecture.

2022-04-24

International Conference on Learning Representations (Accept (Spotlight))

RetroGNN: Fast Estimation of Synthesizability for Virtual Screening and De Novo Design by Learning from Slow Retrosynthesis Software

Cheng-Hao Liu

Maksym Korablyov

Stanisław Jastrzębski

Paweł Włodarczyk-Pruszyński

Marwin Segler

2022-04-21

Journal of Chemical Information and Modeling (published)

Evaluating Generalization in GFlowNets for Molecule Design

Andrei Cristian Nica

Moksh J. Jain

Emmanuel Bengio

Cheng-Hao Liu

Maksym Korablyov

Michael M. Bronstein

Deep learning bears promise for drug discovery problems such as de novo molecular design. Generating data to train such models is a costly and time-consuming process, given the need for wet-lab experiments or expensive simulations. This problem is compounded by the notorious data-hungriness of machine learning algorithms. In small molecule generation the recently proposed GFlowNet method has shown good performance in generating diverse high-scoring candidates, and has the interesting advantage of being an off-policy offline method. Finding an appropriate generalization evaluation metric for such models, one predictive of the desired search performance (i.e. finding high-scoring diverse candidates), will help guide online data collection for such an algorithm. In this work, we develop techniques for evaluating GFlowNet performance on a test set, and identify the most promising metric for predicting generalization. We present empirical results on several small-molecule design tasks in drug discovery, for several GFlowNet training setups, and we find a metric strongly correlated with diverse high-scoring batch generation. This metric should be used to identify the best generative model from which to sample batches of molecules to be evaluated.

2022-04-04

ICLR.cc/2022/Workshop/MLDD (poster)

Inductive Biases for Relational Tasks

Blake Aaron Richards

Guillaume Lajoie

Current deep learning approaches have shown good in-distribution performance but struggle in out-of-distribution settings. This is especially true in the case of tasks involving abstract relations like recognizing rules in sequences, as required in many intelligence tests. In contrast, our brains are remarkably flexible at such tasks, an attribute that is likely linked to anatomical constraints on computations. Inspired by this, recent work has explored how enforcing that relational representations remain distinct from sensory representations can help artificial systems. Building on this work, we further explore and formalize the advantages afforded by "partitioned" representations of relations and sensory details. We investigate inductive biases that ensure abstract relations are learned and represented distinctly from sensory data across several neural network architectures and show that they outperform existing architectures on out-of-distribution generalization for various relational tasks. These results show that partitioning relational representations from other information streams may be a simple way to augment existing network architectures' robustness when performing relational computations.

2022-03-24

ICLR.cc/2022/Workshop/OSC (poster)

Object-centric Compositional Imagination for Visual Abstract Reasoning

Pau Rodríguez

Perouz Taslakian

Like humans devoid of imagination, current machine learning systems lack the ability to adapt to new, unexpected situations by foreseeing them, which makes them unable to solve new tasks by analogical reasoning. In this work, we introduce a new compositional imagination framework that improves a model's ability to generalize. One of the key components of our framework is a set of object-centric inductive biases that enable models to perceive the environment as a series of objects, properties, and transformations. By composing these key ingredients, it is possible to generate new unseen tasks that, when used to train the model, improve generalization. Experiments on a simplified version of the Abstraction and Reasoning Corpus (ARC) demonstrate the effectiveness of our framework.

2022-03-24

ICLR.cc/2022/Workshop/OSC (poster)

A New Era: Intelligent Tutoring Systems Will Transform Online Learning for Millions

Francois St-Hilaire

Dung D. Vu

Antoine Frau

Nathan J. Burns

Farid Faraji

Joseph Potochny

Stephane Robert

Arnaud Roussel

Selene Zheng

Taylor Glazier

Junfel Vincent Romano

Robert Belfer

Muhammad Shayan

Ariella Smofsky

Tommy Delarosbil

Seulmin Ahn

Simon Eden-Walker

Kritika Sony

Ansona Onyi Ching

Sabina Elkins

A. Stepanyan

Adela Matajova

Victor Chen

Hossein Sahraei

Robert Larson

N. Markova

Andrew Barkett

Laurent Charlin

Iulian V. Serban

Ekaterina Kochmar

2022-03-02

ArXiv (preprint)

RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds in vitro

Jarrid Rector-Brooks

Thomas Gaudelet

Andrew Anighoro

Torsten Gross

Francisco Martínez-Peña

Eileen L. Tang

S. SurajM

Cristian Regep

Jeremy B.R. Hayter

Maksym Korablyov

N. Valiante

Almer M. van der Sloot

Mike Tyers

Charles E.S. Roberts

Michael M. Bronstein

Luke Lee Lairson

Jake P. Taylor-King

2022-02-06

ArXiv (preprint)

Tackling Climate Change with Machine Learning

Priya L. Donti

Lynn H. Kaack

Kelly Kochanski

Alexandre Lacoste

Andrew Slavin Ross

Nikola Milojevic-Dupont

Natasha Jaques

Anna Waldman-Brown

Alexandra Luccioni

Evan D. Sherwin

S. Karthik Mukkavilli

Konrad P. Kording

Carla Gomes

Andrew Y. Ng

Demis Hassabis

John C. Platt

Felix Creutzig

Jennifer Chayes

Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help. Here we describe how machine learning can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by machine learning, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the machine learning community to join the global effort against climate change.

2022-02-06

ACM Computing Surveys (published)