Yoshua Bengio

Biography

*For media inquiries, please write to medias@mila.quebec.

For more information, contact Cassidy MacNeil, Senior Assistant and Operations Lead, at cassidy.macneil@mila.quebec.

Recognized worldwide as a leading expert in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, "the Nobel Prize of computing," with Geoffrey Hinton and Yann LeCun. He is a full professor at Université de Montréal, founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute, and co-directs, as a Senior Fellow, the Learning in Machines & Brains program of the Canadian Institute for Advanced Research (CIFAR). He also serves as special advisor and founding scientific director of IVADO.

In 2018, he was the computer scientist who collected the largest number of new citations worldwide. In 2019, he was awarded the prestigious Killam Prize. Since 2022, he has held the highest h-index in computer science worldwide. He is a Fellow of both the Royal Society of London and the Royal Society of Canada, and an Officer of the Order of Canada.

Concerned about the social impact of AI and committed to the goal that AI benefit everyone, he actively contributed to the Montreal Declaration for the Responsible Development of Artificial Intelligence.

Current Students

Jamal Abou Haibeh

Collaborating alumni - McGill

Mohammed Abukalam

Collaborating alumni - UdeM

Berkes Anaïs

Collaborating researcher - Cambridge University

Principal supervisor:

Rim Assouel

PhD - UdeM

Stefan Bauer

Independent visiting researcher

Co-supervisor:

Guillaume Lajoie

Paul Bertin

PhD - UdeM

Joyce Chai

Independent visiting researcher

Principal supervisor:

Siva Reddy

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor:

PhD - UdeM

Collaborating researcher - KAIST

Collaborating alumni - UdeM

PhD - UdeM

Collaborating alumni - UdeM

Co-supervisor:

Loubna Benabbou

Desmond Elliott

Independent visiting researcher

Principal supervisor:

PhD - UdeM

Co-supervisor:

PhD - UdeM

PhD - UdeM

Leo Feng

PhD - UdeM

PhD

PhD - UdeM

Edward Hu

PhD - UdeM

Moksh Jain

PhD - UdeM

PhD - UdeM

Principal supervisor:

Collaborating alumni - UdeM

Hyeonah Kim

Postdoctorate - UdeM

Principal supervisor:

Alex Hernandez-Garcia

Salem Lahlou

Collaborating alumni - UdeM

Tabitha Edith Lee

Postdoctorate - UdeM

Principal supervisor:

Collaborating alumni

Zhen Liu

Collaborating alumni - UdeM

Principal supervisor:

Liam Paull

Kanika Madan

PhD - UdeM

Nikolay Malkin

Collaborating alumni - UdeM

Cristian Dragos Manta

PhD - UdeM

Co-supervisor:

Dhanya Sridhar

Sarthak Mittal

PhD - UdeM

Principal supervisor:

PhD - UdeM

Principal supervisor:

Postdoctorate - UdeM

Principal supervisor:

Independent visiting researcher - UdeM

Padideh Nouri

PhD - UdeM

Principal supervisor:

Ali Parviz

Collaborating researcher - Ying Wu Coll of Computing

Lena Podina

Collaborating researcher - University of Waterloo

Principal supervisor:

David Rolnick

Nassim Rahaman

Collaborating alumni - Max-Planck-Institute for Intelligent Systems

Amine RAZIG

Collaborating researcher - UdeM

Co-supervisor:

PhD - UdeM

Postdoctorate - UdeM

Independent visiting researcher - UdeM

Oli RICHARDSON

Postdoctorate - UdeM

Camille Rochefort-Boulanger

PhD - UdeM

Principal supervisor:

Julie Hussin

Abhik Roychoudhury

Independent visiting researcher

Principal supervisor:

Postdoctorate - UdeM

Collaborating alumni - UdeM

Marcin Sendera

Collaborating alumni - UdeM

Divya Sharma

Postdoctorate

Co-supervisor:

Alex Hernandez-Garcia

Mélisande Astrid Crystal Teng

PhD - UdeM

Co-supervisor:

Hugo Larochelle

Ivan Titov

Independent visiting researcher

Principal supervisor:

Siva Reddy

Alex Tong

Collaborating alumni - UdeM

Postdoctorate - UdeM

Co-supervisor:

PhD - UdeM

Principal supervisor:

Collaborating researcher

Collaborating researcher - UdeM

Skipper: Combining Spatial and Temporal Abstraction to Improve Generalization

Nicole Zhang

PhD - McGill

Principal supervisor:

PhD - UdeM

Principal supervisor:

PhD - UdeM

Harry Zhao

Collaborating alumni - McGill

Principal supervisor:

Blog posts

February 22, 2024

by

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the service of reasoning & model-based ML

April 4, 2023

by

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

by

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

Generative Flow Networks

March 15, 2022

by

Yoshua Bengio

Publications

Saliency is a Possible Red Herring When Diagnosing Poor Generalization

Joseph D Viviano

Becks Simpson

Francis Dutil

Joseph Paul Cohen

Poor generalization is one symptom of models that learn to predict target variables using spuriously-correlated image features present only in the training distribution instead of the true image features that denote a class. It is often thought that this can be diagnosed visually using attribution (aka saliency) maps. We study if this assumption is correct. In some prediction tasks, such as for medical images, one may have some images with masks drawn by a human expert, indicating a region of the image containing relevant information to make the prediction. We study multiple methods that take advantage of such auxiliary labels, by training networks to ignore distracting features which may be found outside of the region of interest. This mask information is only used during training and has an impact on generalization accuracy depending on the severity of the shift between the training and test distributions. Surprisingly, while these methods improve generalization performance in the presence of a covariate shift, there is no strong correspondence between the correction of attribution towards the features a human expert has labelled as important and generalization performance. These results suggest that the root cause of poor generalization may not always be spatially defined, and raise questions about the utility of masks as 'attribution priors' as well as saliency maps for explainable predictions.

2021-01-01

ICLR (published)

openreview.net

Seeing things or seeing scenes: Investigating the capabilities of V&L models to align scene descriptions to images

Matt D Anderson

Erich W Graf

James H Elder

Peter Anderson

Xiaodong He

Chris Buehler

Mark Teney

Stephen Johnson

Gould Lei

Emily M. Bender

Timnit Gebru

Angelina McMillan-575

Alexander Koller

Yonatan Bisk

Ari Holtzman

Jesse Thomason

Joyce Chai

Angeliki Lazaridou

Jonathan May

Aleksandr

Thomas Unterthiner

Mostafa Dehghani

Georg Minderer

Sylvain Heigold

Jakob Gelly

Uszkoreit Neil

Houlsby

Lisa Anne Hendricks

Gabriel Ilharco

Rowan Zellers

Ali Farhadi

John M. Henderson

Thomas L. Griffiths

Melissa L.-H. Võ

Jeremy M. Wolfe

Jianfeng Wang

Xiaowei Hu

Xiu-834 Pengchuan Zhang

Roy Schwartz

Bolei Zhou

Àgata Lapedriza

Jianxiong Xiao

Hang Zhao

Xavier Puig

Sanja Fidler

Images can be described in terms of the objects they contain, or in terms of the types of scene or place that they instantiate. In this paper we address to what extent pretrained Vision and Language models can learn to align descriptions of both types with images. We compare 3 state-of-the-art models, VisualBERT, LXMERT and CLIP. We find that (i) V&L models are susceptible to stylistic biases acquired during pretraining; (ii) only CLIP performs consistently well on both object- and scene-level descriptions. A follow-up ablation study shows that CLIP uses object-level information in the visual modality to align with scene-level textual descriptions.

A Simple and Effective Model for Multi-Hop Question Generation

Jimmy Lei Ba

Jamie Ryan Kiros

Geoffrey E Hin-602

Peter W. Battaglia

Jessica Blake

Chandler Hamrick

Victor Bapst

Alvaro Sanchez

Vinicius Zambaldi

M. Malinowski

Andrea Tacchetti

David Raposo

Tom B. Brown

Benjamin Mann

Nick Ryder

Melanie Subbiah

Jared Kaplan

Prafulla Dhariwal

Arvind Neelakantan

Pranav Shyam

Girish Sastry

William L. Hamilton

Nitish Srivastava

Geoffrey Hinton

Alex Krizhevsky

Ilya Sutskever

Ruslan Salakhutdinov

Gabriel Stanovsky

Julian Michael

Luke Zettlemoyer

Dan Su

Yan Xu

Wenliang Dai

Ziwei Ji

Tiezheng Yu

Minghao Tu

Kevin Huang

Guangtao Wang

Jing Huang

Ashish Vaswani

Noam M. Shazeer

Niki Parmar

Jakob Uszkoreit

Llion Jones

Aidan N. Gomez

Łukasz Kaiser

Illia Polosukhin

Petar Veličković

Guillem Cucurull

Arantxa Casanova

Adriana Romero Soriano

Pietro Liò

Johannes Welbl

Pontus Stenetorp

Yonghui Wu

Mike Schuster

Zhifeng Chen

Quoc Le

Mohammad Norouzi

Wolfgang Macherey

M. Krikun

Yuan Cao

Qin Gao

William W. Cohen

Jianxing Yu

Xiaojun Quan

Qinliang Su

Jian Yin

Yuyu Zhang

Hanjun Dai

Zornitsa Kozareva

Cheng Zhao

Chenyan Xiong

Corby Rosset

Xia Song

Paul Bennett

Saurabh Tiwary

Yao Zhao

Xiaochuan Ni

Yuanyuan Ding

Qingyu Zhou

Nan Yang

Furu Wei

Chuanqi Tan

Previous research on automated question generation has almost exclusively focused on generating factoid questions whose answers can be extracted from a single document. However, there is an increasing interest in developing systems that are capable of more complex multi-hop question generation (QG), where answering the question requires reasoning over multiple documents. In this work, we propose a simple and effective approach based on the transformer model for multi-hop QG. Our approach consists of specialized input representations, a supporting sentence classification objective, and training data weighting. Prior work on multi-hop QG considers the simplified setting of shorter documents and also advocates the use of entity-based graph structures as essential ingredients in model design. On the contrary, we showcase that our model can scale to the challenging setting of longer documents as input, does not rely on graph structures, and substantially outperforms the state-of-the-art approaches as measured by automated metrics and human evaluation.

2021-01-01

(published)

www.semanticscholar.org

SPE: Symmetrical Prompt Enhancement for Factual Knowledge Retrieval

James M. Crawford

Matthew L. Ginsberg

Jacob Devlin

Ming-Wei Chang

Kenton Lee

Xavier Glorot

Antoine Bordes

Alex Graves

Abdel-rahman Mohamed

Adi Haviv

Jonathan Berant

Amir Globerson

Chloe Kiddon

Pedro M. Domingos

Brian Lester

Rami Al-Rfou

Noah Constant

Pengfei Liu

Weizhe Yuan

Jinlan Fu

Zhengbao Jiang

Xiao Liu

Yanan Zheng

Zhengxiao Du

Ming Ding

Pretrained language models (PLMs) have been shown to accumulate factual knowledge from their unsupervised pretraining procedures (Petroni et al., 2019). Prompting is an effective way to query such knowledge from PLMs. Recently, continuous prompt methods have been shown to have a larger potential than discrete prompt methods in generating effective queries (Liu et al., 2021a). However, these methods do not consider symmetry of the task. In this work, we propose Symmetrical Prompt Enhancement (SPE), a continuous prompt-based method for fact retrieval that leverages the symmetry of the task. Our results on LAMA, a popular fact retrieval dataset, show significant improvement of SPE over previous prompt methods.

Systematic generalisation with group invariant predictions

Faruk Ahmed

Harm van Seijen

Aaron Courville

We consider situations where the presence of dominant simpler correlations with the target variable in a training set can cause an SGD-trained neural network to be less reliant on more persistently correlating complex features. When the non-persistent, simpler correlations correspond to non-semantic background factors, a neural network trained on this data can exhibit dramatic failure upon encountering systematic distributional shift, where the correlating background features are recombined with different objects. We perform an empirical study on three synthetic datasets, showing that group invariance methods across inferred partitionings of the training set can lead to significant improvements at such test-time situations. We also suggest a simple invariance penalty, showing with experiments on our setups that it can perform better than alternatives. We find that even without assuming access to any systematically shifted validation sets, one can still find improvements over an ERM-trained reference model.

2021-01-01

ICLR (published)

openreview.net

Tackling Situated Multi-Modal Task-Oriented Dialogs with a Single Transformer Model

Réjean Ducharme

Pascal Vincent

Yen-Chun Chen

Linjie Li

Licheng Yu

Matthew Henderson

Blaise Thomson

Ehsan Hosseini-Asl

Bryan McCann

Chien-Sheng Wu

Samuel Humeau

Kurt Shuster

Marie-Anne Lachaux

The Situated Interactive Multi-Modal Conversations (SIMMC) 2.0 aims to create virtual shopping assistants that can accept complex multi-modal inputs, i.e. visual appearances of objects and user utterances. It consists of four subtasks, multi-modal disambiguation (MM-Disamb), multi-modal coreference resolution (MM-Coref), multi-modal dialog state tracking (MM-DST), and response retrieval and generation. While many task-oriented dialog systems usually tackle each subtask separately, we propose a jointly learned encoder-decoder that performs all four subtasks at once for efficiency. Moreover, we handle the multi-modality of the challenge by representing visual objects as special tokens whose joint embedding is learned via auxiliary tasks. This approach won the MM-Coref and response retrieval subtasks and nominated runner-up for the remaining subtasks using a single unified model. In particular, our model achieved 81.5% MRR, 71.2% R@1, 95.0% R@5, 98.2% R@10, and 1.9 mean rank in response retrieval task, setting a high bar for the state-of-the-art result in the SIMMC 2.0 track of the Dialog Systems Technology Challenge 10 (DSTC10).

Unifying Likelihood-free Inference with Black-box Sequence Design and Beyond

2021-01-01

arXiv.org (preprint)

dblp.uni-trier.de

What Makes Machine Reading Comprehension Questions Difficult? Investigating Variation in Passage Sources and Question Types

Susan Bartlett

Grzegorz Kondrak

Max Bartolo

Alastair Roberts

Johannes Welbl

Steven Bird

Ewan Klein

Edward Loper

Samuel R. Bowman

George Dahl

Chao Pang

Junyuan Shang

Jiaxiang Liu

Xuyi Chen

Yanbin Zhao

Yuxiang Lu

Weixin Liu

Zhihua Wu

Weibao Gong

Jianzhong Liang

Zhizhou Shang

Peng Sun

Ouyang Xuan

Dianhai

Houwen Tian

Hua Wu

Haifeng Wang

Adam Trischler

Tong Wang

Xingdi Yuan

Justin Har-908

Alessandro Sordoni

Philip Bachman

Adina Williams

Nikita Nangia

Zhilin Yang

Peng Qi

Saizheng Zhang

For a natural language understanding benchmark to be useful in research, it has to consist of examples that are diverse and difficult enough to discriminate among current and near-future state-of-the-art systems. However, we do not yet know how best to select passages to collect a variety of challenging examples. In this study, we crowdsource multiple-choice reading comprehension questions for passages taken from seven qualitatively distinct sources, analyzing what attributes of passages contribute to the difficulty and question types of the collected examples. To our surprise, we find that passage source, length, and readability measures do not significantly affect question difficulty. Through our manual annotation of seven reasoning types, we observe several trends between passage sources and reasoning types, e.g., logical reasoning is more often required in questions written for technical passages. These results suggest that when creating a new benchmark dataset, selecting a diverse set of passages can help ensure a diverse range of question types, but that passage difficulty need not be a priority.

Machine Learning for Glacier Monitoring in the Hindu Kush Himalaya

Shimaa Baraka

Benjamin Akera

Bibek Aryal

Tenzing Chogyal Sherpa

Finu Shresta

Anthony Ortiz

Kris Sankaran

J. Ferres

M. Matin

2020-12-09

ArXiv (preprint)

arxiv.org

Inductive biases for deep learning of higher-level cognition

Anirudh Goyal

A fascinating hypothesis is that human and animal intelligence could be explained by a few principles (rather than an encyclopaedic list of heuristics). If that hypothesis was correct, we could more easily both understand our own intelligence and build intelligent machines. Just like in physics, the principles themselves would not be sufficient to predict the behaviour of complex systems like brains, and substantial computation might be needed to simulate human-like intelligence. This hypothesis would suggest that studying the kind of inductive biases that humans and animals exploit could help both clarify these principles and provide inspiration for AI research and neuroscience theories. Deep learning already exploits several key inductive biases, and this work considers a larger list, focusing on those which concern mostly higher-level and sequential conscious processing. The objective of clarifying these particular principles is that they could potentially help us build AI systems benefiting from humans’ abilities in terms of flexible out-of-distribution and systematic generalization, which is currently an area where a large gap exists between state-of-the-art machine learning and human intelligence.

2020-11-30

ArXiv (preprint)

doi.org

arxiv.org

RetroGNN: Approximating Retrosynthesis by Graph Neural Networks for De Novo Drug Design

Cheng-Hao Liu

Maksym Korablyov

Stanisław Jastrzębski

Paweł Włodarczyk-Pruszyński

Marwin Segler

De novo molecule generation often results in chemically unfeasible molecules. A natural idea to mitigate this problem is to bias the search process towards more easily synthesizable molecules using a proxy for synthetic accessibility. However, using currently available proxies still results in highly unrealistic compounds. We investigate the feasibility of training deep graph neural networks to approximate the outputs of a retrosynthesis planning software, and their use to bias the search process. We evaluate our method on a benchmark involving searching for drug-like molecules with antibiotic properties. Compared to enumerating over five million existing molecules from the ZINC database, our approach finds molecules predicted to be more likely to be antibiotics while maintaining good drug-like properties and being easily synthesizable. Importantly, our deep neural network can successfully filter out hard to synthesize molecules while achieving a

2020-11-25

ArXiv (preprint)

arxiv.org

Perceptual Generative Autoencoders

Zijun Zhang

Ruixiang Zhang

Zongpeng Li