Portrait de Yoshua Bengio

Yoshua Bengio

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur titulaire, Université de Montréal, Département d'informatique et de recherche opérationnelle
Fondateur et Conseiller scientifique, Équipe de direction
Sujets de recherche
Apprentissage automatique médical
Apprentissage de représentations
Apprentissage par renforcement
Apprentissage profond
Causalité
Modèles génératifs
Modèles probabilistes
Modélisation moléculaire
Neurosciences computationnelles
Raisonnement
Réseaux de neurones en graphes
Réseaux de neurones récurrents
Théorie de l'apprentissage automatique
Traitement du langage naturel

Biographie

*Pour toute demande média, veuillez écrire à medias@mila.quebec.

Pour plus d’information, contactez Marie-Josée Beauchamp, adjointe administrative à marie-josee.beauchamp@mila.quebec.

Reconnu comme une sommité mondiale en intelligence artificielle, Yoshua Bengio s’est surtout distingué par son rôle de pionnier en apprentissage profond, ce qui lui a valu le prix A. M. Turing 2018, le « prix Nobel de l’informatique », avec Geoffrey Hinton et Yann LeCun. Il est professeur titulaire à l’Université de Montréal, fondateur et conseiller scientifique de Mila – Institut québécois d’intelligence artificielle, et codirige en tant que senior fellow le programme Apprentissage automatique, apprentissage biologique de l'Institut canadien de recherches avancées (CIFAR). Il occupe également la fonction de conseiller spécial et directeur scientifique fondateur d’IVADO.

En 2018, il a été l’informaticien qui a recueilli le plus grand nombre de nouvelles citations au monde. En 2019, il s’est vu décerner le prestigieux prix Killam. Depuis 2022, il détient le plus grand facteur d’impact (h-index) en informatique à l’échelle mondiale. Il est fellow de la Royal Society de Londres et de la Société royale du Canada, et officier de l’Ordre du Canada.

Soucieux des répercussions sociales de l’IA et de l’objectif que l’IA bénéficie à tous, il a contribué activement à la Déclaration de Montréal pour un développement responsable de l’intelligence artificielle.

Étudiants actuels

Collaborateur·rice alumni - McGill
Collaborateur·rice alumni - UdeM
Collaborateur·rice de recherche - Cambridge University
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Visiteur de recherche indépendant
Co-superviseur⋅e :
Doctorat - UdeM
Visiteur de recherche indépendant
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - N/A
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Collaborateur·rice de recherche - KAIST
Collaborateur·rice alumni - UdeM
Collaborateur·rice alumni - UdeM
Co-superviseur⋅e :
Visiteur de recherche indépendant
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Doctorat - UdeM
Doctorat - UdeM
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Collaborateur·rice alumni - UdeM
Postdoctorat - UdeM
Superviseur⋅e principal⋅e :
Collaborateur·rice alumni - UdeM
Postdoctorat - UdeM
Superviseur⋅e principal⋅e :
Collaborateur·rice alumni
Collaborateur·rice alumni - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Collaborateur·rice alumni - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Postdoctorat - UdeM
Superviseur⋅e principal⋅e :
Visiteur de recherche indépendant - UdeM
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - Ying Wu Coll of Computing
Collaborateur·rice de recherche - University of Waterloo
Superviseur⋅e principal⋅e :
Collaborateur·rice alumni - Max-Planck-Institute for Intelligent Systems
Collaborateur·rice de recherche - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Postdoctorat - UdeM
Visiteur de recherche indépendant - UdeM
Postdoctorat - UdeM
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Visiteur de recherche indépendant
Superviseur⋅e principal⋅e :
Collaborateur·rice alumni - UdeM
Collaborateur·rice alumni - UdeM
Postdoctorat
Co-superviseur⋅e :
Visiteur de recherche indépendant - Technical University of Munich
Doctorat - UdeM
Co-superviseur⋅e :
Visiteur de recherche indépendant
Superviseur⋅e principal⋅e :
Collaborateur·rice alumni - UdeM
Postdoctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche
Collaborateur·rice de recherche - UdeM
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Collaborateur·rice alumni - McGill
Superviseur⋅e principal⋅e :

Publications

Graph-Based Active Machine Learning Method for Diverse and Novel Antimicrobial Peptides Generation and Selection
Bonaventure F. P. Dossou
Dianbo Liu
Xiaolu Ji
Moksh J. Jain
Almer M. van der Sloot
Roger Palou
Mike Tyers
Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization
The ability to accelerate the design of biological sequences can have a substantial impact on the progress of the medical field. The problem… (voir plus) can be framed as a global optimization problem where the objective is an expensive black-box function such that we can query large batches restricted with a limitation of a low number of rounds. Bayesian Optimization is a principled method for tackling this problem. However, the astronomically large state space of biological sequences renders brute-force iterating over all possible sequences infeasible. In this paper, we propose MetaRLBO where we train an autoregressive generative model via Meta-Reinforcement Learning to propose promising sequences for selection via Bayesian Optimization. We pose this problem as that of finding an optimal policy over a distribution of MDPs induced by sampling subsets of the data acquired in the previous rounds. Our in-silico experiments show that meta-learning over such ensembles provides robustness against reward misspecification and achieves competitive results compared to existing strong baselines.
Unifying Generative Models with GFlowNets
There are many frameworks for deep generative modeling, each often presented with their own specific training algorithms and inference metho… (voir plus)ds. Here, we demonstrate the connections between existing deep generative models and the recently introduced GFlowNet framework, a probabilistic inference machine which treats sampling as a decision-making process. This analysis sheds light on their overlapping traits and provides a unifying viewpoint through the lens of learning with Markovian trajectories. Our framework provides a means for unifying training and inference algorithms, and provides a route to shine a unifying light over many generative models. Beyond this, we provide a practical and experimentally verified recipe for improving generative modeling with insights from the GFlowNet perspective.
Diversifying Design of Nucleic Acid Aptamers Using Unsupervised Machine Learning
Siba Moussa
Michael Kilgour
Clara Jans
Miroslava Cuperlovic‐culf
On the Generalization and Adaption Performance of Causal Models
Biological Sequence Design with GFlowNets
Building Robust Ensembles via Margin Boosting
Hongyang R. Zhang
Pradeep Ravikumar
Arun Sai Suggala
In the context of adversarial robustness, a single model does not usually have enough power to defend against all possible adversarial attac… (voir plus)ks, and as a result, has sub-optimal robustness. Consequently, an emerging line of work has focused on learning an ensemble of neural networks to defend against adversarial attacks. In this work, we take a principled approach towards building robust ensembles. We view this problem from the perspective of margin-boosting and develop an algorithm for learning an ensemble with maximum margin. Through extensive empirical evaluation on benchmark datasets, we show that our algorithm not only outperforms existing ensembling techniques, but also large models trained in an end-to-end fashion. An important byproduct of our work is a margin-maximizing cross-entropy (MCE) loss, which is a better alternative to the standard cross-entropy (CE) loss. Empirically, we show that replacing the CE loss in state-of-the-art adversarial training techniques with our MCE loss leads to significant performance improvement.
Generative Flow Networks for Discrete Probabilistic Modeling
We present energy-based generative flow networks (EB-GFN), a novel probabilistic modeling algorithm for high-dimensional discrete data. Buil… (voir plus)ding upon the theory of generative flow networks (GFlowNets), we model the generation process by a stochastic data construction policy and thus amortize expensive MCMC exploration into a fixed number of actions sampled from a GFlowNet. We show how GFlowNets can approximately perform large-block Gibbs sampling to mix between modes. We propose a framework to jointly train a GFlowNet with an energy function, so that the GFlowNet learns to sample from the energy distribution, while the energy learns with an approximate MLE objective with negative samples from the GFlowNet. We demonstrate EB-GFN's effectiveness on various probabilistic modeling tasks. Code is publicly available at https://github.com/zdhNarsil/EB_GFN.
Multi-scale Feature Learning Dynamics: Insights for Double Descent
A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting fro… (voir plus)m the high-dimensional interactions between the large number of network parameters. Such non-trivial interactions lead to intriguing model behaviors such as the phenomenon of "double descent" of the generalization error. The more commonly studied aspect of this phenomenon corresponds to model-wise double descent where the test error exhibits a second descent with increasing model complexity, beyond the classical U-shaped error curve. In this work, we investigate the origins of the less studied epoch-wise double descent in which the test error undergoes two non-monotonous transitions, or descents as the training time increases. We study a linear teacher-student setup exhibiting epoch-wise double descent similar to that in deep neural networks. In this setting, we derive closed-form analytical expressions for the evolution of generalization error over training. We find that double descent can be attributed to distinct features being learned at different scales: as fast-learning features overfit, slower-learning features start to fit, resulting in a second descent in test error. We validate our findings through numerical experiments where our theory accurately predicts empirical findings and remains consistent with observations in deep neural networks.
Towards Scaling Difference Target Propagation by Learning Backprop Targets
The development of biologically-plausible learning algorithms is important for understanding learning in the brain, but most of them fail to… (voir plus) scale-up to real-world tasks, limiting their potential as explanations for learning by real brains. As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks. One such algorithm is Difference Target Propagation (DTP), a biologically-plausible learning algorithm whose close relation with Gauss-Newton (GN) optimization has been recently established. However, the conditions under which this connection rigorously holds preclude layer-wise training of the feedback pathway synaptic weights (which is more biologically plausible). Moreover, good alignment between DTP weight updates and loss gradients is only loosely guaranteed and under very specific conditions for the architecture being trained. In this paper, we propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored without sacrificing any theoretical guarantees. Our theory is corroborated by experimental results and we report the best performance ever achieved by DTP on CIFAR-10 and ImageNet 32
VIM: Variational Independent Modules for Video Prediction
We introduce a variational inference model called VIM, for Variational Independent Modules, for sequential data that learns and infers laten… (voir plus)t representations as a set of objects and discovers modular causal mechanisms over these objects. These mechanisms - which we call modules - are independently parametrized, define the stochastic transitions of entities and are shared across entities. At each time step, our model infers from a low-level input sequence a high-level sequence of categorical latent variables to select which transition modules to apply to which high-level object. We evaluate this model in video prediction tasks where the goal is to predict multi-modal future events given previous observations. We demonstrate empirically that VIM can model 2D visual sequences in an interpretable way and is able to identify the underlying dynamically instantiated mechanisms of the generation process. We additionally show that the learnt modules can be composed at test time to generalize to out-of-distribution observations.
YOUR AUTOREGRESSIVE GENERATIVE MODEL CAN BE BETTER IF YOU TREAT IT AS AN ENERGY-BASED ONE
Yezhen Wang
Tong Che
Bin Li
Kaitao Song
Hengzhi Pei
Dongsheng Li