Yoshua Bengio

Biography

*For media inquiries, please write to medias@mila.quebec.

For more information, contact Cassidy MacNeil, Senior Assistant and Operations Lead, at cassidy.macneil@mila.quebec.

Recognized worldwide as a leading expert in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, "the Nobel Prize of computing," with Geoffrey Hinton and Yann LeCun. He is a full professor at Université de Montréal, founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute, and co-directs, as a senior fellow, the Learning in Machines & Brains program of the Canadian Institute for Advanced Research (CIFAR). He also serves as special advisor and founding scientific director of IVADO.

In 2018, he was the computer scientist who collected the largest number of new citations worldwide. In 2019, he was awarded the prestigious Killam Prize. Since 2022, he has had the highest h-index of any computer scientist in the world. He is a Fellow of the Royal Society of London and of the Royal Society of Canada, and an Officer of the Order of Canada.

Concerned about the social impact of AI and committed to the goal that AI benefit everyone, he contributed actively to the Montreal Declaration for the Responsible Development of Artificial Intelligence.

Current Students

Jamal Abou Haibeh

Alumni collaborator - McGill

Berkes Anaïs

Research collaborator - Cambridge University

Principal supervisor:

Rim Assouel

PhD - UdeM

Shahana Chatterjee

Research collaborator - N/A

Principal supervisor:

PhD - UdeM

Research collaborator - KAIST

PhD - UdeM

Independent visiting researcher

Principal supervisor:

PhD - UdeM

Co-supervisor:

PhD - UdeM

PhD - UdeM

PhD

PhD - UdeM

Moksh Jain

PhD - UdeM

PhD - UdeM

Principal supervisor:

Alumni collaborator - UdeM

Hyeonah Kim

Postdoctorate - UdeM

Principal supervisor:

Alex Hernández-García

Minsu Kim

Research collaborator - UdeM

Postdoctorate - UdeM

Principal supervisor:

Alumni collaborator

Song LIU

Research collaborator - N/A

Cristian Dragos Manta

PhD - UdeM

Co-supervisor:

Dhanya Sridhar

Sarthak Mittal

PhD - UdeM

Principal supervisor:

Independent visiting researcher - UdeM

Padideh Nouri

PhD - UdeM

Principal supervisor:

Ali Parviz

Research collaborator - Ying Wu Coll of Computing

Camille Rochefort-Boulanger

Lena Podina

Research collaborator - University of Waterloo

Principal supervisor:

PhD - UdeM

Postdoctorate - UdeM

Postdoctorate - UdeM

PhD - UdeM

Principal supervisor:

Julie Hussin

Divya Sharma

Postdoctorate

Co-supervisor:

Alex Hernández-García

Mélisande Astrid Crystal Teng

Alumni collaborator - UdeM

Co-supervisor:

Hugo Larochelle

Ivan Titov

Research collaborator

Principal supervisor:

Siva Reddy

Skipper: Combining Spatial and Temporal Abstraction to Improve Generalization

Alex Tong

Alumni collaborator - UdeM

Alumni collaborator - UdeM

Co-supervisor:

PhD - UdeM

Principal supervisor:

Research collaborator - UdeM

Research collaborator

Research collaborator - UdeM

PhD - UdeM

PhD - McGill

Principal supervisor:

Harry Zhao

Alumni collaborator - McGill

Principal supervisor:

Blog Posts

February 22, 2024

by

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the service of reasoning & model-based ML

April 4, 2023

by

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

by

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

Generative Flow Networks

March 15, 2022

by

Yoshua Bengio

Publications

GFlowOut: Dropout with Generative Flow Networks

Dianbo Liu

Moksh Jain

Bonaventure F. P. Dossou

Qianli Shen

Salem Lahlou

Anirudh Goyal

Nikolay Malkin

Chris C. Emezue

Bayesian inference offers principled tools to tackle many critical problems with modern neural networks such as poor calibration and generalization, and data inefficiency. However, scaling Bayesian inference to large architectures is challenging and requires restrictive approximations. Monte Carlo Dropout has been widely used as a relatively cheap way for approximate inference and to estimate uncertainty with deep neural networks. Traditionally, the dropout mask is sampled independently from a fixed distribution. Recent works show that the dropout mask can be viewed as a latent variable, which can be inferred with variational inference. These methods face two important challenges: (a) the posterior distribution over masks can be highly multi-modal which can be difficult to approximate with standard variational inference and (b) it is not trivial to fully utilize sample-dependent information and correlation among dropout masks to improve posterior estimation. In this work, we propose GFlowOut to address these issues. GFlowOut leverages the recently proposed probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks. We empirically demonstrate that GFlowOut results in predictive distributions that generalize better to out-of-distribution data, and provide uncertainty estimates which lead to better performance in downstream tasks.

2022-12-31

ICML (published)
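
The GFlowOut abstract above contrasts its approach with conventional Monte Carlo Dropout, where each mask entry is drawn independently from a fixed distribution. As a point of reference only, here is a minimal sketch of that baseline on a toy linear model. The function name `mc_dropout_predict`, the toy model, and all parameter values are illustrative assumptions; this is not the GFlowOut method and is not taken from the paper.

```python
import random
import statistics


def mc_dropout_predict(x, weights, keep_prob=0.8, n_samples=200, seed=0):
    """Monte Carlo Dropout on a toy linear model (hypothetical example).

    Each forward pass samples an independent Bernoulli(keep_prob) mask over
    the inputs, computes sum(w_i * m_i * x_i) / keep_prob, and the spread of
    the resulting predictions serves as a crude uncertainty estimate.
    """
    rng = random.Random(seed)
    preds = []
    for _ in range(n_samples):
        # Fixed, input-independent mask distribution: the "traditional" setup.
        mask = [1.0 if rng.random() < keep_prob else 0.0 for _ in x]
        y = sum(w * m * xi for w, m, xi in zip(weights, mask, x)) / keep_prob
        preds.append(y)
    # Predictive mean and standard deviation across the sampled masks.
    return statistics.mean(preds), statistics.pstdev(preds)


mean, std = mc_dropout_predict([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
print(f"predictive mean={mean:.3f}, predictive std={std:.3f}")
```

Because the masks here ignore the input and are sampled independently, the sketch exhibits exactly the limitation the abstract describes: it cannot exploit sample-dependent information or correlations among mask entries.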

HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution

Eric Nguyen

Michael Poli

Marjan Faizi

Armin W Thomas

Callum Birch-Sykes

Michael Wornow

Aman Patel

Clayton M. Rabideau

Stefano Massaroli

Stefano Ermon

Stephen Baccus

Christopher Re

2022-12-31

Advances in Neural Information Processing Systems 36 (NeurIPS 2023) (published)

Learning GFlowNets From Partial Episodes For Improved Convergence And Stability

Andrei Nica

Tom Bosc

Nikolay Malkin

Generative flow networks (GFlowNets) are a family of algorithms for training a sequential sampler of discrete objects under an unnormalized target density and have been successfully used for various probabilistic modeling tasks. Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory. We argue that these alternatives represent opposite ends of a gradient bias-variance tradeoff and propose a way to exploit this tradeoff to mitigate its harmful effects. Inspired by the TD(

2022-12-31

ICML (published)

MixupE: Understanding and improving Mixup from directional derivative perspective

Vikas Verma

Yingtian Zou

Sarthak Mittal

Wai Hoh Tang

Hieu Pham

Juho Kannala

Arno Solin

Kenji Kawaguchi

2022-12-31

UAI (published)

NEURAL NETWORK-BASED SOLVERS FOR PDES

M. Cameron

Ian G Goodfellow

(1) $\mathcal{N}(x;\theta) = L_{l+1} \circ \sigma_l \circ L_l \circ \sigma_{l-1} \circ \dots \circ \sigma_1 \circ L_1$. The symbol $L_k$ denotes the $k$-th affine operator, of the form $L_k(x) = A_k x + b_k$, while $\sigma_k$ denotes a nonlinear function called an activation function. The activation functions are chosen by the user. The matrices $A_k$ and shift vectors (or bias vectors) $b_k$ are encoded into the argument $\theta$: $\theta = \{A_k, b_k\}_{k=1}^{l+1}$. The term "training a neural network" means finding $\{A_k, b_k\}_{k=1}^{l+1}$ such that $\mathcal{N}(x;\theta)$ satisfies certain conditions. These conditions are described by the loss function chosen by the user. For example, one might want the neural network to assume certain values $f_j$ at certain points $x_j$, $j = 1, \dots, N$. These points $x_j$ are called the training data. In this case, a common choice of the loss function is the least squares error:

2022-12-31

(published)

www.semanticscholar.org
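
The excerpt from "NEURAL NETWORK-BASED SOLVERS FOR PDES" above defines a network as alternating affine operators and activation functions, with the parameters collected in θ. The sketch below illustrates that composition in plain Python. The helper names (`affine`, `network`, `least_squares_loss`), the choice of `tanh` as the activation, and the unnormalized sum-of-squares form of the loss are assumptions made for illustration; the excerpt is cut off before it states its own loss formula.

```python
import math


def affine(A, b, x):
    """Apply L_k(x) = A_k x + b_k, with A a list of rows and b a list."""
    return [sum(a * xi for a, xi in zip(row, x)) + bk for row, bk in zip(A, b)]


def network(x, theta, sigma=math.tanh):
    """Evaluate N(x; theta) = L_{l+1} o sigma o L_l o ... o sigma o L_1.

    theta is a list of (A_k, b_k) pairs; the activation is applied after
    every affine map except the last one.
    """
    *hidden, last = theta
    for A, b in hidden:
        x = [sigma(v) for v in affine(A, b, x)]
    return affine(*last, x)


def least_squares_loss(theta, xs, fs):
    """Sum of squared errors between N(x_j; theta) and f_j (assumed form,
    scalar network output)."""
    return sum((network(x, theta)[0] - f) ** 2 for x, f in zip(xs, fs))


# Hypothetical parameters: one hidden layer with two units, scalar output.
theta = [
    ([[1.0], [-1.0]], [0.0, 0.0]),  # (A_1, b_1)
    ([[1.0, 1.0]], [0.5]),          # (A_2, b_2)
]
print(network([0.3], theta))
print(least_squares_loss(theta, xs=[[0.3], [1.0]], fs=[0.5, 0.0]))
```

Training, in the sense of the excerpt, would then mean searching for the `theta` that minimizes `least_squares_loss` over the training points.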

Stochastic Generative Flow Networks

Longbo Huang

2022-12-31

Conference on Uncertainty in Artificial Intelligence (published)

Supplementary Material for MixupE

Yingtian Zou

Vikas Verma

Sarthak Mittal

Wai Hoh Tang

Hieu Pham

Juho Kannala

Arno Solin

Kenji Kawaguchi

We denote by z = (x, y) the input and output pair where x ∈ X ⊆ R and y ∈ Y ⊆ R. Let f_θ(x) ∈ R be the output of the logits (i.e., the last layer before the softmax or sigmoid) of the model parameterized by θ. We use l(θ, z) = h(f_θ(x)) − y f_θ(x) to denote the loss function. Let g(·) be the activation function. We use x(i) to index the i-th element of the vector x and x_j to represent the j-th variable in a set. The notation list is:

2022-12-31

(published)

www.semanticscholar.org
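
The loss form l(θ, z) = h(f_θ(x)) − y f_θ(x) used in the MixupE supplementary excerpt above may look unfamiliar. As an illustration (a standard identity, not quoted from the excerpt), binary cross-entropy with a sigmoid output fits this form. Writing a = f_θ(x) for a scalar logit, y ∈ {0, 1}, and s for the sigmoid (a symbol introduced here):

```latex
\begin{aligned}
s(a) &= \frac{1}{1 + e^{-a}}, \qquad
\log s(a) = a - \log\!\left(1 + e^{a}\right), \qquad
\log\!\left(1 - s(a)\right) = -\log\!\left(1 + e^{a}\right), \\
-y \log s(a) - (1 - y)\log\!\left(1 - s(a)\right)
  &= -y\,a + y \log\!\left(1 + e^{a}\right) + (1 - y)\log\!\left(1 + e^{a}\right) \\
  &= \log\!\left(1 + e^{a}\right) - y\,a .
\end{aligned}
```

So in this case h(a) = log(1 + e^a), and the loss is h(f_θ(x)) − y f_θ(x), matching the form in the excerpt.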

A Theory of Continuous Generative Flow Networks

Alex Hernández-García

Lena Nehale Ezzine

Nikolay Malkin

Generative flow networks (GFlowNets) are amortized variational inference algorithms that are trained to sample from unnormalized target distributions over compositional objects. A key limitation of GFlowNets until this time has been that they are restricted to discrete spaces. We present a theory for generalized GFlowNets, which encompasses both existing discrete GFlowNets and ones with continuous or hybrid state spaces, and perform experiments with two goals in mind. First, we illustrate critical points of the theory and the importance of various assumptions. Second, we empirically demonstrate how observations about discrete GFlowNets transfer to the continuous case and show strong results compared to non-GFlowNet baselines on several previously studied tasks. This work greatly widens the perspectives for the application of GFlowNets in probabilistic inference and various modeling settings.

2022-12-31

ICML (published)

Tree Cross Attention

Leo Feng

Frederick Tung

Hossein Hajimirsadeghi

Mohamed Osama Ahmed

Cross Attention is a popular method for retrieving information from a set of context tokens for making predictions. At inference time, for each prediction, Cross Attention scans the full set of

2022-12-31

arXiv (preprint)

Rethinking Learning Dynamics in RL using Adversarial Networks

Ramnath Kumar

Tristan Deleu

Recent years have seen tremendous progress in methods of reinforcement learning. However, most of these approaches have been trained in a straightforward fashion and are generally not robust to adversity, especially in the meta-RL setting. To the best of our knowledge, our work is the first to propose an adversarial training regime for Multi-Task Reinforcement Learning, which requires no manual intervention or domain knowledge of the environments. Our experiments on multiple environments in the Multi-Task Reinforcement learning domain demonstrate that the adversarial process leads to a better exploration of numerous solutions and a deeper understanding of the environment. We also adapt existing measures of causal attribution to draw insights from the skills learned, facilitating easier re-purposing of skills for adaptation to unseen environments and tasks.

2022-12-08

NeurIPS.cc/2022/Workshop/DeepRL (accepted)

Bayesian Dynamic Causal Discovery

Alexander Tong

Lazar Atanackovic

Jason Hartford

Learning the causal structure of observable variables is a central focus for scientific discovery. Bayesian causal discovery methods tackle this problem by learning a posterior over the set of admissible graphs that are equally likely given our priors and observations. Existing methods primarily consider observations from static systems and assume the underlying causal structure takes the form of a directed acyclic graph (DAG). In settings with dynamic feedback mechanisms that regulate the trajectories of individual variables, this acyclicity assumption fails unless we account for time. We treat causal discovery in the unrolled causal graph as a problem of sparse identification of a dynamical system. This imposes a natural temporal causal order between variables and captures cyclic feedback loops through time. Under this lens, we propose a new framework for Bayesian causal discovery for dynamical systems and present a novel generative flow network architecture (Dyn-GFN) tailored for this task. Dyn-GFN imposes an edge-wise sparse prior to sequentially build a k-sparse causal graph. Through evaluation on temporal data, our results show that the posterior learned with Dyn-GFN yields improved Bayes coverage of admissible causal structures relative to state-of-the-art Bayesian causal discovery methods.

2022-11-29

NeurIPS.cc/2022/Workshop/CDS (poster)

Object-centric causal representation learning

Amin Mansouri

Jason Hartford

Kartik Ahuja

2022-11-06

NeurIPS.cc/2022/Workshop/NeurReps (poster)