Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Scientific Director, Leadership Team
Observer, Board of Directors, Mila

Biography

*For media inquiries, please write to medias@mila.quebec.

For more information, contact Julie Mongeau, executive assistant, at julie.mongeau@mila.quebec.

Recognized as a world-leading authority in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, often called the "Nobel Prize of computing", together with Geoffrey Hinton and Yann LeCun. He is a Full Professor at Université de Montréal, the founder and scientific director of Mila – Quebec Artificial Intelligence Institute, and co-director, as a Senior Fellow, of the Learning in Machines & Brains program at the Canadian Institute for Advanced Research (CIFAR). He also serves as scientific director of IVADO.

In 2018, he was the computer scientist with the largest number of new citations worldwide. In 2019, he was awarded the prestigious Killam Prize. Since 2022, he has had the highest h-index of any computer scientist in the world. He is a Fellow of the Royal Society of London and of the Royal Society of Canada, and an Officer of the Order of Canada.

Concerned about the social impact of AI and committed to ensuring that AI benefits everyone, he contributed actively to the Montréal Declaration for a Responsible Development of Artificial Intelligence.

Current Students

Professional Master's - Université de Montréal
Co-supervisor:
Professional Master's - Université de Montréal
PhD - Université de Montréal
Postdoctorate - Université de Montréal
Co-supervisor:
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Research Collaborator - Université Paris-Saclay
Principal supervisor:
Professional Master's - Université de Montréal
Independent Visiting Researcher - MIT
PhD - École Polytechnique Fédérale de Lausanne
Research Intern - Université du Québec à Rimouski
Research Collaborator
Principal supervisor:
PhD - Université de Montréal
Principal supervisor:
Postdoctorate - Université de Montréal
Co-supervisor:
Professional Master's - Université de Montréal
PhD - Université de Montréal
Co-supervisor:
PhD - Barcelona University
PhD - Université de Montréal
Principal supervisor:
Postdoctorate - Université de Montréal
Co-supervisor:
Research Master's - Université de Montréal
PhD - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
Co-supervisor:
Research Intern - UQAR
Alumni Collaborator
Independent Visiting Researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor:
Research Intern - McGill University
Independent Visiting Researcher - Université de Montréal
PhD - Université de Montréal
Co-supervisor:
PhD - Université de Montréal
Co-supervisor:
Professional Master's - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
PhD - Massachusetts Institute of Technology
PhD - Université de Montréal
PhD - Université de Montréal
Independent Visiting Researcher - Technical University of Munich (TUM)
Independent Visiting Researcher - Hong Kong University of Science and Technology (HKUST)
Graduate Diploma (DESS) - Université de Montréal
Independent Visiting Researcher - UQAR
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Research Intern - Université de Montréal
Independent Visiting Researcher - Technical University of Munich
Research Intern - Imperial College London
PhD - Université de Montréal
Co-supervisor:
Postdoctorate - Université de Montréal
PhD - McGill University
Principal supervisor:
Professional Master's - Université de Montréal
Research Collaborator - Université de Montréal
Research Intern - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
PhD - Max Planck Institute for Intelligent Systems
PhD - McGill University
Principal supervisor:
Alumni Collaborator - Université de Montréal
Professional Master's - Université de Montréal
PhD - Université de Montréal
Independent Visiting Researcher - Université de Montréal
Alumni Collaborator - Université de Montréal
Research Collaborator
Professional Master's - Université de Montréal
Research Collaborator - Valence
Principal supervisor:
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor:
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor:
Research Intern - Université de Montréal
Research Collaborator - Université de Montréal
Independent Visiting Researcher
Co-supervisor:
Postdoctorate - Université de Montréal
Research Intern - McGill University
Professional Master's - Université de Montréal
Research Collaborator
Principal supervisor:
Research Master's - Université de Montréal
Co-supervisor:
PhD - Université de Montréal
Research Master's - Université de Montréal
PhD - Université de Montréal
Research Collaborator - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)
Principal supervisor:
Bachelor's - Université de Montréal
PhD - Université de Montréal
Professional Master's - Université de Montréal
Professional Master's - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
Principal supervisor:
Professional Master's - Université de Montréal
Postdoctorate - Université de Montréal

Publications

Neural Network-Based Solvers for PDEs
M. Cameron
Ian G Goodfellow
$$N(x;\theta) = L_{l+1} \circ \sigma_l \circ L_l \circ \sigma_{l-1} \circ \cdots \circ \sigma_1 \circ L_1 \tag{1}$$
The symbol $L_k$ denotes the $k$-th affine operator, of the form $L_k(x) = A_k x + b_k$, while $\sigma_k$ denotes a nonlinear function called an activation function. The activation functions are chosen by the user. The matrices $A_k$ and shift vectors (or bias vectors) $b_k$ are encoded into the argument $\theta$: $\theta = \{A_k, b_k\}_{k=1}^{l+1}$. Training the neural network means finding $\{A_k, b_k\}_{k=1}^{l+1}$ such that $N(x;\theta)$ satisfies certain conditions. These conditions are described by the loss function chosen by the user. For example, one might want the neural network to take prescribed values $f_j$ at points $x_j$, $j = 1, \dots, N$. These points $x_j$ are called the training data. In this case, a common choice of loss function is the least squares error:
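As a minimal sketch of equation (1) and the least squares loss, the Python below evaluates the alternating composition of affine maps $L_k(x) = A_k x + b_k$ and elementwise activations $\sigma_k$, using plain lists. The helper names (`affine`, `network`, `least_squares_loss`, `relu`) are hypothetical, and because the excerpt stops before the loss formula, we assume the unweighted sum of squared errors over the training points; some texts use a mean or a factor of 1/2 instead.

```python
def affine(A, b, x):
    """Hypothetical helper: apply L_k(x) = A_k x + b_k, with A_k given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


def network(x, theta, activations):
    """Evaluate N(x; theta) = L_{l+1} o sigma_l o L_l o ... o sigma_1 o L_1.

    theta holds the pairs (A_k, b_k) for k = 1, ..., l+1;
    activations holds the l elementwise functions sigma_1, ..., sigma_l.
    """
    if len(theta) != len(activations) + 1:
        raise ValueError("need exactly one more affine map than activations")
    h = x
    for (A, b), sigma in zip(theta[:-1], activations):
        h = [sigma(v) for v in affine(A, b, h)]  # apply L_k, then sigma_k
    A_last, b_last = theta[-1]
    return affine(A_last, b_last, h)  # L_{l+1}: no activation on the output


def least_squares_loss(theta, activations, xs, fs):
    """Assumed form: sum over j of ||N(x_j; theta) - f_j||^2."""
    total = 0.0
    for x_j, f_j in zip(xs, fs):
        out = network(x_j, theta, activations)
        total += sum((o - t) ** 2 for o, t in zip(out, f_j))
    return total


def relu(v):
    return max(0.0, v)


# Example: 1 input -> 2 ReLU hidden units -> 1 output
theta = [
    ([[1.0], [-1.0]], [0.0, 0.0]),  # A_1 (2x1), b_1
    ([[1.0, 1.0]], [0.5]),          # A_2 (1x2), b_2
]
acts = [relu]
print(network([2.0], theta, acts))  # [2.5]
print(least_squares_loss(theta, acts, [[0.0], [2.0]], [[0.5], [2.0]]))  # 0.25
```

With these weights the network reduces to $N(x) = |x| + 0.5$, so the loss at inputs 0 and 2 with targets 0.5 and 2.0 is $0 + 0.5^2 = 0.25$, which is easy to check by hand.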
Stochastic Generative Flow Networks
Ling Pan
Dinghuai Zhang
Moksh J. Jain
Longbo Huang
Generative Flow Networks (or GFlowNets for short) are a family of probabilistic agents that learn to sample complex combinatorial structures through the lens of "inference as control". They have shown great potential in generating high-quality and diverse candidates from a given energy landscape. However, existing GFlowNets can be applied only to deterministic environments, and fail in more general tasks with stochastic dynamics, which can limit their applicability. To overcome this challenge, this paper introduces Stochastic GFlowNets, a new algorithm that extends GFlowNets to stochastic environments. By decomposing state transitions into two steps, Stochastic GFlowNets isolate environmental stochasticity and learn a dynamics model to capture it. Extensive experimental results demonstrate that Stochastic GFlowNets offer significant advantages over standard GFlowNets as well as MCMC- and RL-based approaches, on a variety of standard benchmarks with stochastic dynamics.
Supplementary Material for MixupE
Yingtian Zou
Vikas Verma
Sarthak Mittal
Wai Hoh Tang
Hieu Pham
Juho Kannala
Arno Solin
Kenji Kawaguchi
We denote by $z = (x, y)$ the input-output pair, where $x \in \mathcal{X} \subseteq \mathbb{R}$ and $y \in \mathcal{Y} \subseteq \mathbb{R}$. Let $f_\theta(x) \in \mathbb{R}$ be the output logits (i.e., the last layer before the softmax or sigmoid) of the model parameterized by $\theta$. We use $l(\theta, z) = h(f_\theta(x)) - y f_\theta(x)$ to denote the loss function. Let $g(\cdot)$ be the activation function. We use $x^{(i)}$ to denote the $i$-th element of the vector $x$ and $x_j$ to denote the $j$-th variable in a set. The notation list is:
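For a binary label $y \in \{0, 1\}$ and a scalar logit $u = f_\theta(x)$, one common choice is $h(u) = \log(1 + e^{u})$; this is our assumption, since the excerpt does not say which $h$ is used. With it, $l(\theta, z) = h(u) - y u$ is exactly the sigmoid cross-entropy, because $-\log \sigma(u) = h(u) - u$ and $-\log(1 - \sigma(u)) = h(u)$. The sketch below checks the identity numerically; `h`, `glm_loss`, and `binary_cross_entropy` are hypothetical names.

```python
import math


def h(u):
    """Assumed choice: h(u) = log(1 + e^u), the softplus."""
    return math.log1p(math.exp(u))


def glm_loss(logit, y):
    """l(theta, z) = h(f_theta(x)) - y * f_theta(x) for a scalar logit."""
    return h(logit) - y * logit


def binary_cross_entropy(logit, y):
    """Standard cross-entropy of a sigmoid output against a label y in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid(logit)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


# Compare both forms on a few logits and both labels.
for logit in (-2.0, -0.5, 0.0, 1.5, 3.0):
    for y in (0, 1):
        assert math.isclose(glm_loss(logit, y), binary_cross_entropy(logit, y),
                            rel_tol=1e-9, abs_tol=1e-12)
print("h(u) - y*u matches the sigmoid cross-entropy on all test points")
```

Both functions are written naively for readability; `math.exp` overflows for very large logits, so a production version would use a numerically stable softplus.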
Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning
Sébastien Lachapelle
Tristan Deleu
Divyat Mahajan
Quentin Bertrand
Although disentangled representations are often said to be beneficial for downstream tasks, current empirical and theoretical understanding is limited. In this work, we provide evidence that disentangled representations coupled with sparse task-specific predictors improve generalization. In the context of multi-task learning, we prove a new identifiability result that provides conditions under which maximally sparse predictors yield disentangled representations. Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem. Finally, we explore a meta-learning version of this algorithm based on group Lasso multiclass SVM predictors, for which we derive a tractable dual formulation. It obtains competitive results on standard few-shot classification benchmarks, while each task is using only a fraction of the learned representations.
Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning
Sébastien Lachapelle
Tristan Deleu
Divyat Mahajan
Quentin Bertrand
Although disentangled representations are often said to be beneficial for downstream tasks, current empirical and theoretical understanding is limited. In this work, we provide evidence that disentangled representations coupled with sparse base-predictors improve generalization. In the context of multi-task learning, we prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations. Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem. Finally, we explore a meta-learning version of this algorithm based on group Lasso multiclass SVM base-predictors, for which we derive a tractable dual formulation. It obtains competitive results on standard few-shot classification benchmarks, while each task is using only a fraction of the learned representations.
A theory of continuous generative flow networks
Salem Lahlou
Tristan Deleu
Pablo Lemos
Dinghuai Zhang
Alexandra Volokhova
Alex Hernandez-Garcia
Lena Nehale Ezzine
Nikolay Malkin
Generative flow networks (GFlowNets) are amortized variational inference algorithms that are trained to sample from unnormalized target distributions over compositional objects. A key limitation of GFlowNets until now has been that they are restricted to discrete spaces. We present a theory for generalized GFlowNets, which encompasses both existing discrete GFlowNets and ones with continuous or hybrid state spaces, and perform experiments with two goals in mind. First, we illustrate critical points of the theory and the importance of various assumptions. Second, we empirically demonstrate how observations about discrete GFlowNets transfer to the continuous case and show strong results compared to non-GFlowNet baselines on several previously studied tasks. This work greatly broadens the prospects for applying GFlowNets in probabilistic inference and various modeling settings.
Bayesian learning of Causal Structure and Mechanisms with GFlowNets and Variational Bayes
Mizu Nishikawa-Toomey
Tristan Deleu
Jithendaraa Subramanian
Bayesian causal structure learning aims to learn a posterior distribution over directed acyclic graphs (DAGs), and the mechanisms that define the relationship between parent and child variables. By taking a Bayesian approach, it is possible to reason about the uncertainty of the causal model. The notion of modelling the uncertainty over models is particularly crucial for causal structure learning since the model could be unidentifiable when given only a finite amount of observational data. In this paper, we introduce a novel method to jointly learn the structure and mechanisms of the causal model using Variational Bayes, which we call Variational Bayes-DAG-GFlowNet (VBG). We extend the method of Bayesian causal structure learning using GFlowNets to learn not only the posterior distribution over the structure, but also the parameters of a linear-Gaussian model. Our results on simulated data suggest that VBG is competitive against several baselines in modelling the posterior over DAGs and mechanisms, while offering several advantages over existing methods, including the guarantee to sample acyclic graphs, and the flexibility to generalize to non-linear causal mechanisms.
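To make "linear-Gaussian model" concrete: in the usual linear-Gaussian structural equation model, each variable is a weighted sum of its parents in the DAG plus independent Gaussian noise, $x_j = \sum_{i \in \mathrm{pa}(j)} w_{ij} x_i + \varepsilon_j$. The sketch below samples from such a model by visiting nodes in topological order. It illustrates the model class only, not VBG itself; the helper names and the per-node noise scale are our assumptions.

```python
import random


def topological_order(parents):
    """Order nodes so that every parent precedes its children; reject cycles."""
    order, state = [], {}

    def visit(node):
        if state.get(node) == "done":
            return
        if state.get(node) == "active":
            raise ValueError("graph contains a cycle, so it is not a DAG")
        state[node] = "active"
        for p in parents[node]:
            visit(p)
        state[node] = "done"
        order.append(node)

    for node in parents:
        visit(node)
    return order


def sample_linear_gaussian(parents, weights, noise_std, rng):
    """Ancestral sampling: x_j = sum_{i in pa(j)} w_ij * x_i + eps_j, eps_j ~ N(0, noise_std[j]^2)."""
    x = {}
    for j in topological_order(parents):
        mean = sum(weights[(i, j)] * x[i] for i in parents[j])
        x[j] = rng.gauss(mean, noise_std[j])
    return x


# Chain a -> b -> c; only the root is noisy, so b = 2a and c = 3b exactly.
parents = {"a": [], "b": ["a"], "c": ["b"]}
weights = {("a", "b"): 2.0, ("b", "c"): 3.0}
noise_std = {"a": 1.0, "b": 0.0, "c": 0.0}
sample = sample_linear_gaussian(parents, weights, noise_std, random.Random(0))
print(sample)
```

Setting the noise of the non-root nodes to zero makes the output easy to check; with nonzero noise everywhere, each sample is a draw from the joint Gaussian that the DAG and weights define.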
Inductive biases for deep learning of higher-level cognition
Anirudh Goyal
Lookback for Learning to Branch
Prateek Gupta
Elias Boutros Khalil
Didier Chételat
M. Pawan Kumar
Towards Scaling Difference Target Propagation by Learning Backprop Targets
Maxence Ernoult
Fabrice Normandin
Abhinav Moudgil
Sean Spinney
The development of biologically plausible learning algorithms is important for understanding learning in the brain, but most of them fail to scale up to real-world tasks, limiting their potential as explanations for learning by real brains. As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks. One such algorithm is Difference Target Propagation (DTP), a biologically plausible learning algorithm whose close relation with Gauss-Newton (GN) optimization has been recently established. However, the conditions under which this connection rigorously holds preclude layer-wise training of the feedback pathway synaptic weights (which is more biologically plausible). Moreover, good alignment between DTP weight updates and loss gradients is only loosely guaranteed and under very specific conditions for the architecture being trained. In this paper, we propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored without sacrificing any theoretical guarantees. Our theory is corroborated by experimental results and we report the best performance ever achieved by DTP on CIFAR-10 and ImageNet 32