Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Founder and Scientific Advisor, Leadership Team
Research Topics
Medical Machine Learning
Representation Learning
Reinforcement Learning
Deep Learning
Causality
Generative Models
Probabilistic Models
Molecular Modeling
Computational Neuroscience
Reasoning
Graph Neural Networks
Recurrent Neural Networks
Machine Learning Theory
Natural Language Processing

Biography

For media requests, please write to medias@mila.quebec.

For more information, contact Marie-Josée Beauchamp, Administrative Assistant, at marie-josee.beauchamp@mila.quebec.

Recognized worldwide as a leading expert in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, often called the "Nobel Prize of computing," with Geoffrey Hinton and Yann LeCun. He is a Full Professor at Université de Montréal, the Founder and Scientific Advisor of Mila – Quebec Artificial Intelligence Institute, and, as a Senior Fellow, co-directs the Learning in Machines & Brains program of the Canadian Institute for Advanced Research (CIFAR). He also serves as Special Advisor and Founding Scientific Director of IVADO.

In 2018, he was the computer scientist who collected the largest number of new citations worldwide. In 2019, he was awarded the prestigious Killam Prize. Since 2022, he has held the highest h-index in computer science worldwide. He is a Fellow of both the Royal Society of London and the Royal Society of Canada, and an Officer of the Order of Canada.

Concerned about the social impact of AI and committed to the goal that AI benefit everyone, he contributed actively to the Montreal Declaration for the Responsible Development of Artificial Intelligence.

Current Students

Alumni collaborator - McGill
Alumni collaborator - UdeM
Research collaborator - Cambridge University
Principal supervisor:
PhD - UdeM
Independent visiting researcher - KAIST
Independent visiting researcher
Co-supervisor:
PhD - UdeM
Research collaborator - N/A
Principal supervisor:
PhD - UdeM
Research collaborator - KAIST
Research intern - UdeM
Co-supervisor:
PhD - UdeM
Co-supervisor:
PhD - UdeM
PhD - UdeM
Co-supervisor:
Research intern - UdeM
PhD - UdeM
PhD - UdeM
Principal supervisor:
Alumni collaborator - UdeM
Postdoctorate - UdeM
Principal supervisor:
Research collaborator - UdeM
Alumni collaborator - UdeM
Postdoctorate - UdeM
Principal supervisor:
Alumni collaborator - UdeM
Alumni collaborator - UdeM
Principal supervisor:
Alumni collaborator
PhD - UdeM
Alumni collaborator - UdeM
PhD - UdeM
Co-supervisor:
Research collaborator - UdeM
PhD - UdeM
Principal supervisor:
PhD - UdeM
Principal supervisor:
Postdoctorate - UdeM
Principal supervisor:
Independent visiting researcher - UdeM
PhD - UdeM
Principal supervisor:
Research collaborator - Ying Wu Coll of Computing
PhD - University of Waterloo
Principal supervisor:
Alumni collaborator - Max-Planck-Institute for Intelligent Systems
Research intern - UdeM
Co-supervisor:
PhD - UdeM
Postdoctorate - UdeM
Independent visiting researcher - UdeM
PhD - UdeM
Principal supervisor:
Research Master's - UdeM
Alumni collaborator - UdeM
Research Master's - UdeM
Independent visiting researcher - Technical University of Munich
PhD - UdeM
Co-supervisor:
Postdoctorate - UdeM
Co-supervisor:
PhD - UdeM
Principal supervisor:
Research collaborator
Research intern - UdeM
PhD - McGill
Principal supervisor:
PhD - UdeM
Principal supervisor:
PhD - McGill
Principal supervisor:

Publications

Deep learning for AI
Yann LeCun
Geoffrey Hinton
How can neural networks learn the rich internal representations required for difficult tasks such as recognizing objects or understanding language?
Variational Causal Networks: Approximate Bayesian Inference over Causal Structures
Yashas Annadani
Jonas Rothfuss
Alexandre Lacoste
Nino Scherrer
Anirudh Goyal
Stefan Bauer
Learning the causal structure that underlies data is a crucial step towards robust real-world decision making. The majority of existing work in causal inference focuses on determining a single directed acyclic graph (DAG) or a Markov equivalence class thereof. However, a crucial aspect to acting intelligently upon the knowledge about causal structure which has been inferred from finite data demands reasoning about its uncertainty. For instance, planning interventions to find out more about the causal mechanisms that govern our data requires quantifying epistemic uncertainty over DAGs. While Bayesian causal inference allows to do so, the posterior over DAGs becomes intractable even for a small number of variables. Aiming to overcome this issue, we propose a form of variational inference over the graphs of Structural Causal Models (SCMs). To this end, we introduce a parametric variational family modelled by an autoregressive distribution over the space of discrete DAGs. Its number of parameters does not grow exponentially with the number of variables and can be tractably learned by maximising an Evidence Lower Bound (ELBO). In our experiments, we demonstrate that the proposed variational posterior is able to provide a good approximation of the true posterior.
Comparative Study of Learning Outcomes for Online Learning Platforms
Francois St-Hilaire
Nathan J. Burns
Robert Belfer
Muhammad Shayan
Ariella Smofsky
Dung D. Vu
Antoine Frau
Joseph Potochny
Farid Faraji
Vincent Pavero
Neroli Ko
Ansona Onyi Ching
Sabina Elkins
A. Stepanyan
Adela Matajova
Iulian V. Serban
Ekaterina Kochmar
SpeechBrain: A General-Purpose Speech Toolkit
Titouan Parcollet
Peter William VanHarn Plantinga
Aku Rouhe
Samuele Cornell
Loren Lugosch
Nauman Dawalatabad
Abdelwahab HEBA
Jianyuan Zhong
Ju-Chieh Chou
Sung-Lin Yeh
Szu-Wei Fu
Chien-Feng Liao
Elena Rastorgueva
François Grondin
William Aris
Hwidong Na
Yan Gao
Renato De Mori … (1 more author)
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing pipelines. SpeechBrain achieves competitive or state-of-the-art performance in a wide range of speech benchmarks. It also provides training recipes, pretrained models, and inference scripts for popular speech datasets, as well as tutorials which allow anyone with basic Python proficiency to familiarize themselves with speech technologies.
CMIM: Cross-Modal Information Maximization For Medical Imaging
Tristan Sylvain
Francis Dutil
Tess Berthier
Lisa Di Jorio
Margaux Luck
In hospitals, data are siloed to specific information systems that make the same information available under different modalities such as the different medical imaging exams the patient undergoes (CT scans, MRI, PET, Ultrasound, etc.) and their associated radiology reports. This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time. In this paper, we propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time, using recent advances in mutual information maximization. By maximizing cross-modal information at train time, we are able to outperform several state-of-the-art baselines in two different settings, medical image classification, and segmentation. In particular, our method is shown to have a strong impact on the inference-time performance of weaker modalities.
Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models
Tong Che
Xiaofeng Liu
Site Li
Yubin Ge
Ruixiang ZHANG
Caiming Xiong
AI Safety is a major concern in many deep learning applications such as autonomous driving. Given a trained deep learning model, an important natural problem is how to reliably verify the model's prediction. In this paper, we propose a novel framework, deep verifier networks (DVN), to detect unreliable inputs or predictions of deep discriminative models, using separately trained deep generative models. Our proposed model is based on conditional variational auto-encoders with disentanglement constraints to separate the label information from the latent representation. We give both intuitive and theoretical justifications for the model. Our verifier network is trained independently with the prediction model, which eliminates the need of retraining the verifier network for a new model. We test the verifier network on both out-of-distribution detection and adversarial example detection problems, as well as anomaly detection problems in structured prediction tasks such as image caption generation. We achieve state-of-the-art results in all of these problems.
Meta-learning framework with applications to zero-shot time-series forecasting
Boris Oreshkin
Dmitri Carpov
Can meta-learning discover generic ways of processing time series (TS) from a diverse dataset so as to greatly improve generalization on new TS coming from different datasets? This work provides positive evidence to this using a broad meta-learning framework which we show subsumes many existing meta-learning algorithms. Our theoretical analysis suggests that residual connections act as a meta-learning adaptation mechanism, generating a subset of task-specific parameters based on a given TS input, thus gradually expanding the expressive power of the architecture on-the-fly. The same mechanism is shown via linearization analysis to have the interpretation of a sequential update of the final linear layer. Our empirical results on a wide range of data emphasize the importance of the identified meta-learning mechanisms for successful zero-shot univariate forecasting, suggesting that it is viable to train a neural network on a source TS dataset and deploy it on a different target TS dataset without retraining, resulting in performance that is at least as good as that of state-of-practice univariate forecasting models.
Object-Centric Image Generation from Layouts
Tristan Sylvain
Pengchuan Zhang
Shikhar Sharma
Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies
Giulia Zarpellon
Jason Jo
Andrea Lodi
Branch and Bound (B&B) is the exact tree search method typically used to solve Mixed-Integer Linear Programming problems (MILPs). Learning branching policies for MILP has become an active research area, with most works proposing to imitate the strong branching rule and specialize it to distinct classes of problems. We aim instead at learning a policy that generalizes across heterogeneous MILPs: our main hypothesis is that parameterizing the state of the B&B search tree can aid this type of generalization. We propose a novel imitation learning framework, and introduce new input features and architectures to represent branching. Experiments on MILP benchmark instances clearly show the advantages of incorporating an explicit parameterization of the state of the search tree to modulate the branching decisions, in terms of both higher accuracy and smaller B&B trees. The resulting policies significantly outperform the current state-of-the-art method for "learning to branch" by effectively allowing generalization to generic unseen instances.
Visual Concept Reasoning Networks
Taesup Kim
Sungwoong Kim
How does hemispheric specialization contribute to human-defining cognition?
Gesa Hartwigsen
Toward Causal Representation Learning
Bernhard Schölkopf
Francesco Locatello
Stefan Bauer
Nan Rosemary Ke
Nal Kalchbrenner
Anirudh Goyal
The two fields of machine learning and graphical causality arose and are developed separately. However, there is, now, cross-pollination and increasing interest in both fields to benefit from the advances of the other. In this article, we review fundamental concepts of causal inference and relate them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research. This also applies in the opposite direction: we note that most work in causality starts from the premise that the causal variables are given. A central problem for AI and causality is, thus, causal representation learning, that is, the discovery of high-level causal variables from low-level observations. Finally, we delineate some implications of causality for machine learning and propose key research areas at the intersection of both communities.