Yoshua Bengio

Core Academic Member

Canada CIFAR AI Chair

Full Professor, Université de Montréal, Department of Computer Science and Operations Research

Founder and Scientific Advisor, Leadership Team

Research Topics

Medical Machine Learning

Representation Learning

Reinforcement Learning

Deep Learning

Causality

Generative Models

Probabilistic Models

Molecular Modeling

Computational Neuroscience

Reasoning

Graph Neural Networks

Recurrent Neural Networks

Machine Learning Theory

Natural Language Processing

Biography

*For media requests, please write to medias@mila.quebec.

For more information, please contact Cassidy MacNeil, Senior Assistant and Operations Lead, at cassidy.macneil@mila.quebec.

Recognized worldwide as a leading expert in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, "the Nobel Prize of computing," with Geoffrey Hinton and Yann LeCun. He is a Full Professor at Université de Montréal, Founder and Scientific Advisor of Mila – Quebec Artificial Intelligence Institute, and, as a Senior Fellow, co-directs the Learning in Machines & Brains program of the Canadian Institute for Advanced Research (CIFAR). He also serves as Special Advisor and Founding Scientific Director of IVADO.

In 2018, he was the computer scientist who received the most new citations worldwide. In 2019, he was awarded the prestigious Killam Prize. Since 2022, he has held the highest h-index of any computer scientist in the world. He is a Fellow of the Royal Society of London and of the Royal Society of Canada, and an Officer of the Order of Canada.

Concerned about the social impact of AI and committed to the goal that AI benefit everyone, he contributed actively to the Montréal Declaration for the Responsible Development of Artificial Intelligence.

Current Students

Jamal Abou Haibeh

Alumni collaborator - McGill

Research collaborator - Cambridge University

Principal supervisor:

PhD - UdeM

Independent visiting researcher

Co-supervisor:

Guillaume Lajoie

Shahana Chatterjee

Research collaborator - N/A

Principal supervisor:

PhD - UdeM

Research collaborator - KAIST

Aniket Didolkar

PhD - UdeM

Abdessamad EL KABID

Alumni collaborator - UdeM

Co-supervisor:

Loubna Benabbou

Desmond Elliott

Independent visiting researcher

Principal supervisor:

PhD - UdeM

Co-supervisor:

Guillaume Lajoie

PhD - UdeM

Jean-Pierre Falet

PhD - UdeM

PhD

PhD - UdeM

PhD - UdeM

Thomas Jiralerspong

PhD - UdeM

Principal supervisor:

Guillaume Lajoie

Younesse Kaddar

Alumni collaborator - UdeM

Postdoctorate - UdeM

Principal supervisor:

Alex Hernández-García

Tabitha Edith Lee

Postdoctorate - UdeM

Principal supervisor:

Alumni collaborator

Alumni collaborator - UdeM

Cristian Dragos Manta

PhD - UdeM

Co-supervisor:

PhD - UdeM

Principal supervisor:

Guillaume Lajoie

Independent visiting researcher - UdeM

PhD - UdeM

Principal supervisor:

Research collaborator - Ying Wu Coll of Computing

Research collaborator - University of Waterloo

Principal supervisor:

Alumni collaborator - Max-Planck-Institute for Intelligent Systems

Research collaborator - UdeM

Co-supervisor:

Loubna Benabbou

Jarrid Rector-Brooks

PhD - UdeM

Postdoctorate - UdeM

Postdoctorate - UdeM

Camille Rochefort-Boulanger

PhD - UdeM

Principal supervisor:

Dragos Secrieru

Alumni collaborator - UdeM

Postdoctorate

Co-supervisor:

Alex Hernández-García

Alumni collaborator - Polytechnique

Co-supervisor:

Pierre-Luc Bacon

Mélisande Astrid Crystal Teng

PhD - UdeM

Co-supervisor:

Hugo Larochelle

Research collaborator

Principal supervisor:

Alumni collaborator - UdeM

Alumni collaborator - UdeM

Co-supervisor:

Siddarth Venkatraman

PhD - UdeM

Principal supervisor:

Research collaborator

Research collaborator - UdeM

PhD - UdeM

PhD - McGill

Principal supervisor:

Mathieu Blanchette

PhD - UdeM

Principal supervisor:

Aaron Courville

Alumni collaborator - McGill

Principal supervisor:

Blog Posts

February 22, 2024

Skipper: combining spatial and temporal abstraction to improve generalization

by

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

April 4, 2023

Scaling in the service of reasoning & model-based ML

by

March 23, 2022

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

by

Jake P. Taylor-King

March 15, 2022

Generative Flow Networks

by

Publications

Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives

Xue Bin Peng

Sergey Levine

Reinforcement learning agents that operate in diverse and complex environments can benefit from the structured decomposition of their behavior. Often, this is addressed in the context of hierarchical reinforcement learning, where the aim is to decompose a policy into lower-level primitives or options, and a higher-level meta-policy that triggers the appropriate behaviors for a given situation. However, the meta-policy must still produce appropriate decisions in all states. In this work, we propose a policy design that decomposes into primitives, similarly to hierarchical reinforcement learning, but without a high-level meta-policy. Instead, each primitive can decide for themselves whether they wish to act in the current state. We use an information-theoretic mechanism for enabling this decentralized decision: each primitive chooses how much information it needs about the current state to make a decision and the primitive that requests the most information about the current state acts in the world. The primitives are regularized to use as little information as possible, which leads to natural competition and specialization. We experimentally demonstrate that this policy architecture improves over both flat and hierarchical policies in terms of generalization.

2019-12-31

ICLR.cc/2020/Conference (poster)

Small-GAN: Speeding Up GAN Training Using Core-Sets

Samarth Sinha

Han Zhang

Hugo Larochelle

Augustus Odena

Recent work by Brock et al. (2018) suggests that Generative Adversarial Networks (GANs) benefit disproportionately from large mini-batch sizes. Unfortunately, using large batches is slow and expensive on conventional hardware. Thus, it would be nice if we could generate batches that were effectively large though actually small. In this work, we propose a method to do this, inspired by the use of Coreset-selection in active learning. When training a GAN, we draw a large batch of samples from the prior and then compress that batch using Coreset-selection. To create effectively large batches of 'real' images, we create a cached dataset of Inception activations of each training image, randomly project them down to a smaller dimension, and then use Coreset-selection on those projected activations at training time. We conduct experiments showing that this technique substantially reduces training time and memory usage for modern GAN variants, that it reduces the fraction of dropped modes in a synthetic dataset, and that it allows GANs to reach a new state of the art in anomaly detection.

2019-12-31

ICML (published)

proceedings.mlr.press
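
The Small-GAN abstract above describes compressing a large batch into a small one with Coreset-selection, but it does not say which selection algorithm is used. The following is a minimal sketch assuming greedy k-center selection, a common choice in the active-learning literature. The function name `greedy_k_center`, the batch sizes, and the 8-dimensional Gaussian samples are illustrative assumptions, and the Inception-activation caching and random projection mentioned in the abstract are not modeled here.

```python
import math
import random


def greedy_k_center(points, k, first=0):
    """Greedy k-center (assumed algorithm, not confirmed by the source).

    Starts from index `first`, then repeatedly adds the point that is
    farthest from the centers already selected. Returns the chosen indices.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [first]
    # Distance from every point to its nearest selected center so far.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(selected) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        selected.append(idx)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[idx]))
    return selected


# Hypothetical usage: draw a large batch from a Gaussian prior,
# then keep a small subset that still covers the batch well.
rng = random.Random(0)
large_batch = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(256)]
small_batch = [large_batch[i] for i in greedy_k_center(large_batch, 32)]
```

Because each new pick is the point farthest from the current selection, the small batch tends to spread across the large batch rather than cluster in its densest region, which is the sense in which a small batch can behave like a larger one.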

Systematicity in a Recurrent Neural Network by Factorizing Syntax and Semantics

Jacob Russin

R. O’Reilly

Standard methods in deep learning fail to capture compositional or systematic structure in their training data, as shown by their inability to generalize outside of the training distribution. However, human learners readily generalize in this way, e.g. by applying known grammatical rules to novel words. The inductive biases that might underlie this powerful cognitive capacity remain unclear. Inspired by work in cognitive science suggesting a functional distinction between systems for syntactic and semantic processing, we implement a modification to an existing deep learning architecture, imposing an analogous separation. The resulting architecture substantially out-performs standard recurrent networks on the SCAN dataset, a compositional generalization task, without any additional supervision. Our work suggests that separating syntactic from semantic learning may be a useful heuristic for capturing compositional structure, and highlights the potential of using cognitive principles to inform inductive biases in deep learning.

2019-12-31

CogSci (published)

dblp.uni-trier.de

On the interplay between noise and curvature and its effect on optimization and generalization

Valentin Thomas

Fabian Pedregosa

Bart Van Merriënboer

Pierre-Antoine Manzagol

Nicolas Le Roux

The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the variance of the gradients. While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed. Further, as the ultimate goal is good generalization performance, we clarify how both curvature and noise are relevant to properly estimate the generalization gap. Realizing that the limitations of some existing works stem from a confusion between these matrices, we also clarify the distinction between the Fisher matrix, the Hessian, and the covariance matrix of the gradients.

2019-12-31

AISTATS (published)

proceedings.mlr.press

On the Morality of Artificial Intelligence

Alexandra Luccioni

Much of the existing research on the social and ethical impact of Artificial Intelligence has been focused on defining ethical principles and guidelines surrounding Machine Learning (ML) and other Artificial Intelligence (AI) algorithms [IEEE, 2017, Jobin et al., 2019]. While this is extremely useful for helping define the appropriate social norms of AI, we believe that it is equally important to discuss both the potential and risks of ML and to inspire the community to use ML for beneficial objectives. In the present article, which is specifically aimed at ML practitioners, we thus focus more on the latter, carrying out an overview of existing high-level ethical frameworks and guidelines, but above all proposing both conceptual and practical principles and guidelines for ML research and deployment, insisting on concrete actions that can be taken by practitioners to pursue a more ethical and moral practice of ML aimed at using AI for social good.

2019-12-31

IEEE Technology and Society Magazine (published)

The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget

Matthew Botvinick

Sergey Levine

In many applications, it is desirable to extract only the relevant information from complex input data, which involves making a decision about which input features are relevant. The information bottleneck method formalizes this as an information-theoretic optimization problem by maintaining an optimal tradeoff between compression (throwing away irrelevant input information), and predicting the target. In many problem settings, including the reinforcement learning problems we consider in this work, we might prefer to compress only part of the input. This is typically the case when we have a standard conditioning input, such as a state observation, and a "privileged" input, which might correspond to the goal of a task, the output of a costly planning algorithm, or communication with another agent. In such cases, we might prefer to compress the privileged input, either to achieve better generalization (e.g., with respect to goals) or to minimize access to costly information (e.g., in the case of communication). Practical implementations of the information bottleneck based on variational inference require access to the privileged input in order to compute the bottleneck variable, so although they perform compression, this compression operation itself needs unrestricted, lossless access. In this work, we propose the variational bandwidth bottleneck, which decides for each example on the estimated value of the privileged information before seeing it, i.e., only based on the standard input, and then accordingly chooses stochastically, whether to access the privileged input or not. We formulate a tractable approximation to this framework and demonstrate in a series of reinforcement learning experiments that it can improve generalization and reduce access to computationally costly information.

2019-12-31

ICLR (published)

Toward Training Recurrent Neural Networks for Lifelong Learning.

Catastrophic forgetting and capacity saturation are the central challenges of any parametric lifelong learning system. In this work, we study these challenges in the context of sequential supervised learning with an emphasis on recurrent neural networks. To evaluate the models in the lifelong learning setting, we propose a curriculum-based, simple, and intuitive benchmark where the models are trained on tasks with increasing levels of difficulty. To measure the impact of catastrophic forgetting, the model is tested on all the previous tasks as it completes any task. As a step toward developing true lifelong learning systems, we unify gradient episodic memory (a catastrophic forgetting alleviation approach) and Net2Net (a capacity expansion approach). Both models are proposed in the context of feedforward networks, and we evaluate the feasibility of using them for recurrent networks. Evaluation on the proposed benchmark shows that the unified model is more suitable than the constituent models for lifelong learning setting.

2019-12-31

Neural Computation (published)

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims

Miles Brundage

Shahar Avin

Haydn Belfield

Gretchen Krueger

Gillian Hadfield

Heidy Khlaaf

Jingying Yang

Helen Toner

Ruth Fong

Pang Wei Koh

Sara Hooker

Jade Leung

Andrew Trask

Emma Bluemke

Jonathan Lebensold

Cullen O'Keefe

Mark Koren

Théo Ryffel

JB Rubinovitz

Tamay Besiroglu

Federica Carugati

Jack Clark

Peter Eckersley

Sarah de Haas

Maritza Johnson

Ben Laurie

Alex Ingerman

Igor Krawczuk

Amanda Askell

Rosario Cammarota

Andrew Lohn

David Krueger

Charlotte Stix

Peter Henderson

Logan Graham

Carina Prunkl

Bianca Martin

Elizabeth Seger

Noa Zilberman

Seán Ó hÉigeartaigh

Frens Kroeger

Girish Sastry

Rebecca Kagan

Adrian Weller

Brian Tse

Elizabeth Barnes

Allan Dafoe

Paul Scharre

Ariel Herbert-Voss

Martijn Rasser

Carrick Flynn

Thomas Krendl Gilbert

Lisa Dyer

Saif Khan

Markus Anderljung

With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they are building AI responsibly, they will need to make verifiable claims to which they can be held accountable. Those outside of a given organization also need effective means of scrutinizing such claims. This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems and their associated development processes, with a focus on providing evidence about the safety, security, fairness, and privacy protection of AI systems. We analyze ten mechanisms for this purpose--spanning institutions, software, and hardware--and make recommendations aimed at implementing, exploring, or improving those mechanisms.

2019-12-31

arXiv (preprint)

Balancing Signals for Semi-Supervised Sequence Learning

Ya Xu

Christopher Pal

Aaron Courville

Training recurrent neural networks (RNNs) on long sequences using backpropagation through time (BPTT) remains a fundamental challenge. It has been shown that adding a local unsupervised loss term into the optimization objective makes the training of RNNs on long sequences more effective. While the importance of an unsupervised task can in principle be controlled by a coefficient in the objective function, the gradients with respect to the unsupervised loss term still influence all the hidden state dimensions, which might cause important information about the supervised task to be degraded or erased. Compared to existing semi-supervised sequence learning methods, this thesis focuses upon a traditionally overlooked mechanism – an architecture with explicitly designed private and shared hidden units designed to mitigate the detrimental influence of the auxiliary unsupervised loss over the main supervised task. We achieve this by dividing the RNN hidden space into a private space for the supervised task or a shared space for both the supervised and unsupervised tasks. We present extensive experiments with the proposed framework on several long sequence modeling benchmark datasets. Results indicate that the proposed framework can yield performance gains in RNN models where long term dependencies are notoriously challenging to deal with.

2019-12-31

(published)

www.semanticscholar.org

Supplementary Material - Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning

Sai Krishna Gottipati

B. Sattarov

Sufeng Niu

Yashaswi Pathak

Haoran Wei

Karam M. J. Thomas

Simon R. Blackburn

Connor Wilson. Coley

A. Chandar

While updating the critic network, we multiply the normal random noise vector with policy noise of 0.2 and then clip it in the range -0.2 to 0.2. This clipped policy noise is added to the action at the next time step a′ computed by the target actor networks f and π. The actor networks (f and π networks), target critic and target actor networks are updated once every two updates to the critic network.

2019-12-31

(published)

www.semanticscholar.org
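
The supplementary-material excerpt above describes two concrete steps: scaling and clipping noise that is added to the target actor's next action, and updating the actor and target networks once per two critic updates. The following is a minimal sketch of just those two steps. The constants 0.2, 0.2, and 2 come from the text; the function names are hypothetical, the f and π networks are not modeled, and the resemblance to the target policy smoothing and delayed updates of TD3 is an assumption that the excerpt does not state.

```python
import random

POLICY_NOISE = 0.2  # scale applied to standard normal noise (from the text)
NOISE_CLIP = 0.2    # noise is clipped to [-NOISE_CLIP, NOISE_CLIP] (from the text)
POLICY_DELAY = 2    # actor/target updates happen once per two critic updates


def clipped_policy_noise(dim, rng):
    """Sample standard normal noise, scale it by POLICY_NOISE, then clip each component."""
    noise = [rng.gauss(0.0, 1.0) * POLICY_NOISE for _ in range(dim)]
    return [max(-NOISE_CLIP, min(NOISE_CLIP, n)) for n in noise]


def smoothed_target_action(target_action, rng):
    """Add clipped noise to the next action proposed by the target actor."""
    noise = clipped_policy_noise(len(target_action), rng)
    return [a + n for a, n in zip(target_action, noise)]


def should_update_actor(critic_step):
    """True on every POLICY_DELAY-th critic update (critic_step counts from 1)."""
    return critic_step % POLICY_DELAY == 0
```

In a training loop under these assumptions, `smoothed_target_action` would be called when forming the critic's target, and `should_update_actor` would gate the actor, target-critic, and target-actor updates.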

Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling

Tong Che

Jascha Sohl-Dickstein

Hugo Larochelle

Yuan Cao

We show that the sum of the implicit generator log-density …

2019-12-31

Advances in Neural Information Processing Systems 33 (NeurIPS 2020) (published)

Learning from Learning Machines: Optimisation, Rules, and Social Norms

There is an analogy between machine learning systems and economic entities in that they are both adaptive, and their behaviour is specified in a more-or-less explicit way. It appears that the area of AI that is most analogous to the behaviour of economic entities is that of morally good decision-making, but it is an open question as to how precisely moral behaviour can be achieved in an AI system. This paper explores the analogy between these two complex systems, and we suggest that a clearer understanding of this apparent analogy may help us forward in both the socio-economic domain and the AI domain: known results in economics may help inform feasible solutions in AI safety, but also known results in AI may inform economic policy. If this claim is correct, then the recent successes of deep learning for AI suggest that more implicit specifications work better than explicit ones for solving such problems.

2019-12-28

arXiv (preprint)