Yoshua Bengio

Biography

*For media inquiries, please write to medias@mila.quebec.

For more information, contact Marie-Josée Beauchamp, administrative assistant, at marie-josee.beauchamp@mila.quebec.

Recognized worldwide as a leading expert in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning, which earned him the 2018 A. M. Turing Award, the “Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun. He is a full professor at Université de Montréal, founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute, and, as a Senior Fellow, co-directs the Learning in Machines & Brains program of the Canadian Institute for Advanced Research (CIFAR). He also serves as special advisor and founding scientific director of IVADO.

In 2018, he was the computer scientist who collected the largest number of new citations worldwide. In 2019, he was awarded the prestigious Killam Prize. Since 2022, he has had the highest h-index of any computer scientist in the world. He is a Fellow of the Royal Society of London and of the Royal Society of Canada, and an Officer of the Order of Canada.

Concerned about the social impact of AI and committed to the goal that AI benefit everyone, he actively contributed to the Montreal Declaration for the Responsible Development of Artificial Intelligence.

Current Students

Jamal Abou Haibeh

Alumni collaborator - McGill

Mohammed Abukalam

Alumni collaborator - UdeM

Berkes Anaïs

Research collaborator - Cambridge University

Principal supervisor:

Rim Assouel

PhD - UdeM

Stefan Bauer

Independent visiting researcher

Co-supervisor:

Guillaume Lajoie

Paul Bertin

PhD - UdeM

Shahana Chatterjee

Research collaborator - N/A

Principal supervisor:

PhD - UdeM

Research collaborator - KAIST

PhD - UdeM

PhD - UdeM

Research intern - UdeM

Co-supervisor:

Loubna Benabbou

Eric Elmoznino

PhD - UdeM

Co-supervisor:

PhD - UdeM

PhD - UdeM

Co-supervisor:

Leo Feng

PhD - UdeM

leo.feng@mila.quebec

Ivan Grega

Research intern - UdeM

PhD

PhD - UdeM

mohsin.hasan@mila.quebec

Edward Hu

PhD - UdeM

Moksh Jain

PhD - UdeM

moksh.jain@mila.quebec

PhD - UdeM

Principal supervisor:

Alumni collaborator - UdeM

Minsu Kim

Research intern - UdeM

Hyeonah Kim

Postdoctorate - UdeM

Principal supervisor:

Alex Hernandez

Yaroslav KIVVA

Research collaborator - UdeM

Salem Lahlou

Alumni collaborator - UdeM

Seanie Lee

Alumni collaborator - UdeM

Postdoctorate - UdeM

Principal supervisor:

Alumni collaborator

Zhen Liu

Alumni collaborator - UdeM

Principal supervisor:

Liam Paull

Kanika Madan

PhD - UdeM

Nikolay Malkin

Alumni collaborator - UdeM

Cristian Dragos Manta

PhD - UdeM

Co-supervisor:

Dhanya Sridhar

Sören Mindermann

Research collaborator - UdeM

Sarthak Mittal

PhD - UdeM

Principal supervisor:

PhD - UdeM

Principal supervisor:

Postdoctorate - UdeM

Principal supervisor:

Independent visiting researcher - UdeM

Padideh Nouri

PhD - UdeM

Principal supervisor:

Ali Parviz

Research collaborator - Ying Wu Coll of Computing

Camille Rochefort-Boulanger

Lena Podina

PhD - University of Waterloo

Principal supervisor:

Alumni collaborator - Max-Planck-Institute for Intelligent Systems

PhD - UdeM

Postdoctorate - UdeM

Independent visiting researcher - UdeM

Postdoctorate - UdeM

PhD - UdeM

Principal supervisor:

Julie Hussin

Victor Schmidt

Alumni collaborator - UdeM

Postdoctorate - UdeM

Research Master's - UdeM

Marcin Sendera

Alumni collaborator - UdeM

Vedant Shah

Research Master's - UdeM

Postdoctorate

Marco Stock

Independent visiting researcher - Technical University of Munich

marco.stock@tum.de

Mélisande Astrid Crystal Teng

PhD - UdeM

Co-supervisor:

Hugo Larochelle

alexander.tong@mila.quebec

Alex Tong

Postdoctorate - UdeM

Postdoctorate - UdeM

Co-supervisor:

PhD - UdeM

Principal supervisor:

Research collaborator - UdeM

Omar G. Younis

Research collaborator

Research collaborator - KAIST

Nicole Zhang

PhD - McGill

Principal supervisor:

PhD - UdeM

Principal supervisor:

PhD - UdeM

Harry Zhao

PhD - McGill

Principal supervisor:

Blog Posts

Skipper: Combining Spatial and Temporal Abstraction to Improve Generalization

February 22, 2024

by

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the service of reasoning & model-based ML

April 4, 2023

by

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

by

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

Generative Flow Networks

March 15, 2022

by

Yoshua Bengio

Publications

Contrastive introspection (ConSpec) to rapidly identify invariant prototypes for success in RL

Chen Sun

Mila

Wannan Yang

Benjamin Alsbury-Nealy

Thomas Jiralerspong

Blake Richards

Reinforcement learning (RL) algorithms have achieved notable success in recent years, but still struggle with fundamental issues in long-term credit assignment. It remains difficult to learn in situations where success is contingent upon multiple critical steps that are distant in time from each other and from a sparse reward, as is often the case in real life. Moreover, how RL algorithms assign credit in these difficult situations is typically not coded in a way that can rapidly generalize to new situations. Here, we present an approach using offline contrastive learning, which we call contrastive introspection (ConSpec), that can be added to any existing RL algorithm and addresses both issues. In ConSpec, a contrastive loss is used during offline replay to identify invariances among successful episodes. This takes advantage of the fact that it is easier to retrospectively identify the small set of steps that success is contingent upon than it is to prospectively predict reward at every step taken in the environment. ConSpec stores this knowledge in a collection of prototypes summarizing the intermediate states required for success. During training, arrival at any state that matches these prototypes generates an intrinsic reward that is added to any external rewards. In addition, the reward shaping provided by ConSpec can be made to preserve the optimal policy of the underlying RL agent. The prototypes in ConSpec provide two key benefits for credit assignment: (1) They enable rapid identification of all the critical states. (2) They do so in a readily interpretable manner, enabling out-of-distribution generalization when sensory features are altered. In summary, ConSpec is a modular system that can be added to any existing RL algorithm to improve its long-term credit assignment.
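The prototype-matching step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine-similarity match, the `threshold` and `bonus` values, and all function names are assumptions made for illustration, and the contrastive learning of the prototypes themselves is left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def intrinsic_reward(state_embedding, prototypes, threshold=0.9, bonus=1.0):
    """Return `bonus` when the state matches any stored prototype, else 0.0.

    "Matches" is assumed here to mean cosine similarity >= threshold.
    """
    best = max(cosine(state_embedding, p) for p in prototypes)
    return bonus if best >= threshold else 0.0


def shaped_reward(external_reward, state_embedding, prototypes):
    """Add the intrinsic reward to the environment's external reward."""
    return external_reward + intrinsic_reward(state_embedding, prototypes)


# Hypothetical prototypes standing in for learned "critical state" summaries.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
reward_on_match = shaped_reward(0.5, [0.0, 2.0], prototypes)
reward_off_match = shaped_reward(0.5, [1.0, 1.0], prototypes)
```

In this sketch a state aligned with a prototype receives the bonus on top of its external reward, while a state between prototypes receives only the external reward.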

Discrete Compositional Representations as an Abstraction for Goal Conditioned Reinforcement Learning

Riashat Islam

Hongyu Zang

Anirudh Goyal

Alex Lamb

Kenji Kawaguchi

Xin Li

Romain Laroche

Remi Tachet des Combes

Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that are capable of solving multiple tasks and reach a diverse set of objectives. How to specify and ground these goals in such a way that we can both reliably reach goals during training as well as generalize to new goals during evaluation remains an open area of research. Defining goals in the space of noisy, high-dimensional sensory inputs is one possibility, yet this poses a challenge for training goal-conditioned agents, or even for generalization to novel goals. We propose to address this by learning compositional representations of goals and processing the resulting representation via a discretization bottleneck, for coarser specification of goals, through an approach we call DGRL. We show that discretizing outputs from goal encoders through a bottleneck can work well in goal-conditioned RL setups, by experimentally evaluating this method on tasks ranging from maze environments to complex robotic navigation and manipulation tasks. Additionally, we show a theoretical result which bounds the expected return for goals not observed during training, while still allowing for specifying goals with expressive combinatorial structure.
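A discretization bottleneck of the kind this abstract mentions can be sketched as nearest-neighbor vector quantization applied to separate segments of a goal embedding. This is an illustration under stated assumptions, not the DGRL implementation: the shared codebook, the equal-sized segments, the squared-distance rule, and the function names are all hypothetical.

```python
def nearest_code(segment, codebook):
    """Index of the codebook vector closest to `segment` (squared Euclidean distance)."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(range(len(codebook)), key=lambda i: sqdist(segment, codebook[i]))


def discretize_goal(embedding, codebook, num_groups):
    """Split a goal embedding into `num_groups` segments and quantize each one.

    Returns the list of code indices (the discrete, compositional goal) and the
    concatenated quantized vector that a policy could be conditioned on.
    """
    if len(embedding) % num_groups != 0:
        raise ValueError("embedding length must be divisible by num_groups")
    size = len(embedding) // num_groups
    indices, quantized = [], []
    for g in range(num_groups):
        segment = embedding[g * size:(g + 1) * size]
        idx = nearest_code(segment, codebook)
        indices.append(idx)
        quantized.extend(codebook[idx])
    return indices, quantized


# Hypothetical 2-entry codebook; each entry has the same length as one segment.
codebook = [[0.0, 0.0], [1.0, 1.0]]
indices, quantized = discretize_goal([0.1, -0.2, 0.9, 1.2], codebook, num_groups=2)
```

With G segments and K codebook entries, the goal is reduced to one of K^G index combinations, which is one way to read "coarser specification of goals" with combinatorial structure.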

openreview.net

Discrete-Valued Neural Communication in Structured Architectures Enhances Generalization

Dianbo Liu

Alex Lamb

Kenji Kawaguchi

Anirudh Goyal

Chen Sun

Michael Curtis Mozer

In this appendix, as a complement to Theorems 1–2, we provide additional theorems, Theorems 3–4, which further illustrate the two advantages of the discretization process by considering an abstract model with the discretization bottleneck. For the advantage on the sensitivity, the error due to potential noise and perturbation without discretization (the third term $\xi(w, r', \mathcal{M}', d) > 0$ in Theorem 4) is shown to be minimized to zero with discretization in Theorem 3. For the second advantage, the underlying dimensionality of $\mathcal{N}_{(\mathcal{M}',d')}(r, H) + \ln(\mathcal{N}_{(\mathcal{M},d)}(r, \Theta)/\delta)$ without discretization (in the bound of Theorem 4) is proven to be reduced to the typically much smaller underlying dimensionality of $L + \ln \mathcal{N}_{(\mathcal{M},d)}(r, E \times \Theta)$ with discretization in Theorem 3. Here, for any metric space $(\mathcal{M}, d)$ and subset $M \subseteq \mathcal{M}$, the $r$-covering number of $M$ is defined by $\mathcal{N}_{(\mathcal{M},d)}(r, M) = \min\{ |C| : C \subseteq \mathcal{M},\ M \subseteq \cup_{c \in C} B_{(\mathcal{M},d)}[c, r] \}$, where the (closed) ball of radius $r$ centered at $c$ is denoted by $B_{(\mathcal{M},d)}[c, r] = \{x \in \mathcal{M} : d(x, c) \le r\}$. See Appendix C.1 for a simple comparison between the bound of Theorem 3 and that of Theorem 4 when the metric spaces $(\mathcal{M}, d)$ and $(\mathcal{M}', d')$ are chosen to be Euclidean spaces.

Enhanced Biomedical Knowledge Discovery From Unstructured Text Using Contextual Embeddings

Iz Beltagy

Kyle Lo

Arman Cohan

Réjean Ducharme

Pascal Vincent

Rishi Bommasani

Kelly Davis

Claire Cardie

Billy Chiu

Sampo Pyysalo

Ivan Vulić

Extracting knowledge from large, unstructured text corpora presents a challenge. Recently, authors have utilized unsupervised, static word embeddings to uncover "latent knowledge" contained within domain-specific scientific corpora. Here semantic-similarity measures between representations of concepts, objects or entities were used to predict relationships, which were later verified using physical methods. Static language models have recently been surpassed at most downstream tasks by massively pre-trained, contextual language models like BERT. Some have postulated that contextualized embeddings potentially yield word representations superior to static ones for knowledge-discovery purposes. In an effort to address this question, two biomedically-trained BERT models (BioBERT, SciBERT) were used to encode n = 500, 1000 or 5000 sentences containing words of interest extracted from a biomedical corpus (Coronavirus Open Research Dataset). The n representations for the words of interest were subsequently extracted and then aggregated to yield static-equivalent word representations. These words belonged to the vocabularies of intrinsic benchmarking tools for the biomedical domain (Bio-SimVerb and Bio-SimLex), which assess quality of word representations using semantic-similarity and relatedness measures. Using intrinsic benchmarking tasks, feasibility of using contextualized word representations for knowledge discovery tasks can be assessed: Word representations that better encode described reality are expected to perform better (i.e. closer to domain experts). As postulated, BERT embeddings outperform static counterparts
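The aggregation step in this abstract (turning n contextual vectors for a word into one static-equivalent vector, then comparing words by semantic similarity) can be sketched as follows. The abstract does not say which aggregation was used, so the element-wise mean here is an assumption, as are the toy vectors and function names.

```python
import math


def aggregate(contextual_vectors):
    """Element-wise mean of n contextual vectors for one word (assumed aggregation)."""
    n = len(contextual_vectors)
    dim = len(contextual_vectors[0])
    return [sum(vec[j] for vec in contextual_vectors) / n for j in range(dim)]


def cosine_similarity(u, v):
    """Cosine similarity, the usual score compared against human similarity ratings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy stand-ins for contextual embeddings of two words across a few sentences.
word_a = aggregate([[1.0, 0.0], [0.0, 1.0]])
word_b = aggregate([[2.0, 2.0], [4.0, 4.0]])
similarity = cosine_similarity(word_a, word_b)
```

In an intrinsic benchmark, such similarity scores for many word pairs would then be compared with expert judgments, for example through a rank correlation.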

Extended Abstract Track

Amin Mansouri

Jason Hartford

Kartik Ahuja

Christian Shewmake

Simone Azeglio

Arianna Di Bernardo

Nina Miolane

There has been significant recent progress in causal representation learning that has shown a variety of settings in which we can disentangle latent variables with identifiability guarantees (up to some reasonable equivalence class). Common to all of these approaches is the assumption that (1) the latent variables are d-dimensional vectors, and (2) that the observations are the output of some injective observation function of these latent variables. While these assumptions appear benign (they amount to assuming that any changes in the latent space are reflected in the observation space, and that we can use standard encoders to infer the latent variables), we show that when the observations are of multiple objects, the observation function is no longer injective, and disentanglement fails in practice. We can address this failure by combining recent developments in object-centric learning and causal representation learning. By modifying the Slot Attention architecture (Locatello et al., 2020b), we develop an object-centric architecture that leverages weak supervision from sparse perturbations to disentangle each object’s properties. We argue that this approach is more data-efficient in the sense that it requires significantly fewer perturbations than a comparable approach that encodes to a Euclidean space, and we show that this approach successfully disentangles the properties of a set of objects in a series of simple image-based disentanglement experiments.

2022-01-01

(published)

www.semanticscholar.org

S5 Framework: A Review of Self-Supervised Shared Semantic Space Optimization for Multimodal Zero-Shot Learning

Clst

Yonatan Bisk

Ari Holtzman

Jesse Thomason

Jacob

Joyce Chai

Angeliki Lapata

Jonathan Lazaridou

Alek-742 May

Nicolas sandr Nisnevich

P. PintoJoseph

Turian

Ting Chen

Simon Kornblith

Mohammad Norouzi

Yen-Chun Chen

Linjie Li

Licheng Yu

Ahmed El … (see 89 more)

Faisal Kholy

Zhe Ahmed

Yu Gan

Cheng

Zihan Dai

Hanxiao Liu

Quoc V. Le

Jia Deng

Wei Dong

Richard Socher

Li-Jia Li

K. Liu

Jacob Devlin

Ming-Wei Chang

Kenton Lee

Jesse Dodge

Maarten Sap

Ana Marasovic

Gabriel Agnew

Dirk Ilharco

Groeneveld Matt

Li Dong

Nan Yang

Wenhui Wang

Furu Wei

Yu Liu

Jianfeng Wang

Ming Gao

Zhou

Xiaoyi Dong

Jia Bao

Tinglu Zhang

Dongdong

Weiming Chen

Lu Zhang

Dong Yuan

Fang Chen

Da-cheng Juan

Chuntian Lu

Zhen Li

Futang Peng

Aleksei Timofeev

Yi-Ting Chen

Yaxi Gao

Tom

Andrew Duerig

Tomkins Sujith

Ravi

Lukasz Kaiser

Aidan N. Gomez

Noam M. Shazeer

Niki Vaswani

Llion Parmar

Jones Jakob

Uszko-850

Alex G. Kendall

Yarin Gal

Roberto Cipolla

Salman H. Khan

Muzammal Naseer

Munawar Hayat

Waqas Zamir

Fahad Shahbaz

Khan

Ranjay Krishna

Yuke Zhu

Oliver Groth

Justin John-867

Kenji Hata

Joshua Kravitz

Stephanie Chen

Mike Lewis

Yinhan Liu

Marjan Naman Goyal

Abdelrahman Ghazvininejad

Omer Mohamed

Levy

Luke Zettlemoyer

Bohan Li

Hao Zhou

Jun-Tao He

Mingxuan Wang

Liunian Harold

Mark Li

Da Yatskar

Yin

Cho-Jui

Kai-Wei Chang

Visualbert

In this review, we aim to inspire research into Self-Supervised Shared Semantic Space (S5) multimodal learning problems. We equip non-expert researchers with a framework of informed modeling decisions via an extensive literature review, an actionable modeling checklist, as well as a series of novel zero-shot evaluation tasks. The core idea for our S5 checklist lies in learning contextual multimodal interactions at various granularity levels via a shared Transformer encoder with a denoising loss term, which is also regularized by a contrastive loss term to induce a semantic alignment prior on the contextual embedding space. Essentially, we aim to model human concept understanding and thus learn to “put a name to a face”. This ultimately enables interpretable zero-shot S5 generalization on a variety of novel downstream tasks. In summary, this review provides sufficient background and actionable strategies for training cutting-edge S5 multimodal networks.
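The objective described above, a denoising loss regularized by a contrastive term, can be sketched in a few lines. The abstract does not give the exact form of either term, so this uses an InfoNCE-style contrastive loss over paired embeddings as a stand-in; the dot-product similarity, the `temperature` and `weight` values, and the function names are all assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(modality_a, modality_b, temperature=1.0):
    """InfoNCE-style loss: item i in modality A should score highest with item i in B."""
    losses = []
    for i, anchor in enumerate(modality_a):
        logits = [dot(anchor, other) / temperature for other in modality_b]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


def total_loss(denoising_loss, modality_a, modality_b, weight=0.1):
    """Denoising loss plus a weighted contrastive regularizer (weight is hypothetical)."""
    return denoising_loss + weight * contrastive_loss(modality_a, modality_b)


# Toy paired embeddings: each row of `a` is aligned with the same row of `b`.
a = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = contrastive_loss(a, b)
swapped_loss = contrastive_loss(a, list(reversed(b)))
```

Aligned pairs give a lower loss than mismatched pairs, which is the sense in which the contrastive term pushes paired items from different modalities toward the same region of the shared embedding space.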

Harvesting Mature Relation Extraction Models from Limited Seed Knowledge: A Self-Development Framework for DS Rule Expansion

Raphael Hoffmann

Congle Zhang

Xiao Ling

Yankai Lin

Shiqi Shen

Zhiyuan Liu

Huanbo Luan

Christopher D Manning

M. Surdeanu

John Bauer

Adriana Romero Soriano

Pietro Lio’

Xuanhui Wang

Cheng Li

Nadav Golbandi

Bendersky Marc

Najork. 2018

The

Wentao Wu … (see 2 more)

Hongsong Li

Haixun Wang

Distantly-supervised relation extraction (DSRE) is an effective method to scale relation extraction (RE) to large unlabeled corpora with the utilization of knowledge bases (KBs), but suffers from the scale of KBs and the introduced noise. To alleviate the above two problems, we propose a novel framework called Self-develOpment rUle exPansion (SOUP), which starts from limited amount of labeled data and continuously produces low-noise labels on large-scaled unlabeled data by a growing learnable logical rules set. Specifically, SOUP achieves a mutual enhancement of RE model and logical rules set, first a RE model is trained on the labeled data to summarize the knowledge, then the knowledge is utilized to explore candidate rules from unlabeled data, finally high-quality candidates are selected in a graph-based ranking manner to extend the logical rules set and new rule-labeled data are provided for better RE model training. Experiments on wiki20 dataset demonstrate that, with limited seed knowledge from small-scaled manually labeled data, SOUP achieves significant improvement compared to baselines by producing continuous growth of both logical rules and the RE model, and that labeling noise of SOUP is much less than DS. Furthermore, RE model enhanced by SOUP with 1.6k logical rules learned from prior knowledge could produce an equivalent performance to the model trained on data labeled in DS manner by 72k relational facts of KBs.

Is a Modular Architecture Enough?

Sarthak Mittal

Guillaume Lajoie

Inspired by human cognition, machine learning systems are gradually revealing advantages of sparser and more modular architectures. Recent work demonstrates that not only do some modular architectures generalize well, but they also lead to better out-of-distribution generalization, scaling properties, learning speed, and interpretability. A key intuition behind the success of such systems is that the data generating system for most real-world settings is considered to consist of sparse modular connections, and endowing models with similar inductive biases will be helpful. However, the field has been lacking in a rigorous quantitative assessment of such systems because these real-world data distributions are complex and unknown. In this work, we provide a thorough assessment of common modular architectures, through the lens of simple and known modular data distributions. We highlight the benefits of modularity and sparsity and reveal insights on the challenges faced while optimizing modular systems. In doing so, we propose evaluation metrics that highlight the benefits of modularity, the regimes in which these benefits are substantial, as well as the sub-optimality of current end-to-end learned modular systems as opposed to their claimed potential.

openreview.net

Neural Attentive Circuits

Nasim Rahaman

Martin Weiss

Francesco Locatello

Chris Pal

Bernhard Schölkopf

Li Erran Li

Nicolas Ballas

Recent work has seen the development of general purpose neural architectures that can be trained to perform tasks across diverse data modalities. General purpose models typically make few assumptions about the underlying data-structure and are known to perform well in the large-data regime. At the same time, there has been growing interest in modular neural architectures that represent the data using sparsely interacting modules. These models can be more robust out-of-distribution, computationally efficient, and capable of sample-efficient adaptation to new data. However, they tend to make domain-specific assumptions about the data, and present challenges in how module behavior (i.e., parameterization) and connectivity (i.e., their layout) can be jointly learned. In this work, we introduce a general purpose, yet modular neural architecture called Neural Attentive Circuits (NACs) that jointly learns the parameterization and a sparse connectivity of neural modules without using domain knowledge. NACs are best understood as the combination of two systems that are jointly trained end-to-end: one that determines the module configuration and the other that executes it on an input. We demonstrate qualitatively that NACs learn diverse and meaningful module configurations on the NLVR2 dataset without additional supervision. Quantitatively, we show that by incorporating modularity in this way, NACs improve upon a strong non-modular baseline in terms of low-shot adaptation on CIFAR and CUBs dataset by about 10%, and OOD robustness on Tiny ImageNet-R by about 2.5%. Further, we find that NACs can achieve an 8x speedup at inference time while losing less than 3% performance. Finally, we find NACs to yield competitive results on diverse data modalities spanning point-cloud classification, symbolic processing and text-classification from ASCII bytes, thereby confirming its general purpose nature.

openreview.net

Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information

Eric Larsen

Sébastien Lachapelle

Emma Frejinger

Simon Lacoste-Julien

Andrea Lodi

This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict expected tactical descriptions of operational solutions (TDOSs). The problem we address occurs in the context of two-stage stochastic programming, where the second stage is demanding computationally. We aim to predict at a high speed the expected TDOS associated with the second-stage problem, conditionally on the first-stage variables. This may be used in support of the solution to the overall two-stage problem by avoiding the online generation of multiple second-stage scenarios and solutions. We formulate the tactical prediction problem as a stochastic optimal prediction program, whose solution we approximate with supervised machine learning. The training data set consists of a large number of deterministic operational problems generated by controlled probabilistic sampling. The labels are computed based on solutions to these problems (solved independently and offline), employing appropriate aggregation and subselection methods to address uncertainty. Results on our motivating application on load planning for rail transportation show that deep learning models produce accurate predictions in very short computing time (milliseconds or less). The predictive accuracy is close to the lower bounds calculated based on sample average approximation of the stochastic prediction programs.

2022-01-01

INFORMS Journal on Computing (published)

doi.org

arxiv.org

TaHiD: Tackling Data Hiding in Fake News Detection with News Propagation Networks

Adrien Benamira

Benjamin Devillers

Etienne Lesot

Ayush K. Ray

Manal Saadi

Fragkiskos D 587

Steven Bird

Ewan Klein

Edward Loper

Nat-593

Carlos Castillo

Marcelo Mendoza

Barbara Poblete

Daryna Dementieva

Alexander Panchenko

Jacob Devlin

Ming-Wei Chang

Kenton Lee

Ashish Vaswani

Noam M. Shazeer … (see 8 more)

Niki Parmar

Adriana Romero Soriano

Pietro Lio’