Yoshua Bengio

Biography

For media requests, please write to medias@mila.quebec.

For more information please contact Cassidy MacNeil, Senior Assistant and Operation Lead at cassidy.macneil@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Jamal Abou Haibeh

Collaborating Alumni - McGill University

Mohammed Abukalam

Collaborating Alumni - Université de Montréal

Berkes Anaïs

Collaborating researcher - Cambridge University

Principal supervisor :

Rim Assouel

PhD - Université de Montréal

Stefan Bauer

Independent visiting researcher

Co-supervisor :

Guillaume Lajoie

Paul Bertin

PhD - Université de Montréal

Joyce Chai

Independent visiting researcher

Principal supervisor :

Siva Reddy

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor :

David Rolnick

Xiaoyin Chen

PhD - Université de Montréal

Sanghyeok Choi

Collaborating researcher - KAIST

Collaborating Alumni - Université de Montréal

PhD - Université de Montréal

Collaborating Alumni - Université de Montréal

Co-supervisor :

Loubna Benabbou

Desmond Elliott

Independent visiting researcher

Principal supervisor :

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Jean-Pierre Falet

PhD - Université de Montréal

Leo Feng

PhD - Université de Montréal

PhD

PhD - Université de Montréal

Edward Hu

PhD - Université de Montréal

Moksh Jain

PhD - Université de Montréal

PhD - Université de Montréal

Principal supervisor :

Collaborating Alumni - Université de Montréal

Hyeonah Kim

Postdoctorate - Université de Montréal

Principal supervisor :

Alex Hernandez-Garcia

Salem Lahlou

Collaborating Alumni - Université de Montréal

Tabitha Edith Lee

Postdoctorate - Université de Montréal

Principal supervisor :

Collaborating Alumni

Zhen Liu

Collaborating Alumni - Université de Montréal

Principal supervisor :

Liam Paull

Kanika Madan

PhD - Université de Montréal

Nikolay Malkin

Collaborating Alumni - Université de Montréal

Cristian Dragos Manta

PhD - Université de Montréal

Co-supervisor :

Dhanya Sridhar

Sarthak Mittal

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Postdoctorate - Université de Montréal

Principal supervisor :

Independent visiting researcher - Université de Montréal

Padideh Nouri

PhD - Université de Montréal

Principal supervisor :

Ali Parviz

Collaborating researcher - Ying Wu College of Computing

Lena Podina

Collaborating researcher - University of Waterloo

Principal supervisor :

David Rolnick

Nassim Rahaman

Collaborating Alumni - Max-Planck-Institute for Intelligent Systems

Amine Razig

Collaborating researcher - Université de Montréal

Co-supervisor :

Loubna Benabbou

Jarrid Rector-Brooks

PhD - Université de Montréal

Danyal Rehman

Postdoctorate - Université de Montréal

James Requeima

Independent visiting researcher - Université de Montréal

Oli Richardson

Postdoctorate - Université de Montréal

Camille Rochefort-Boulanger

PhD - Université de Montréal

Principal supervisor :

Julie Hussin

Abhik Roychoudhury

Independent visiting researcher

Principal supervisor :

Siva Reddy

Luca Scimeca

Postdoctorate - Université de Montréal

Collaborating Alumni - Université de Montréal

Marcin Sendera

Collaborating Alumni - Université de Montréal

Divya Sharma

Postdoctorate

Co-supervisor :

Alex Hernandez-Garcia

Mélisande Astrid Crystal Teng

PhD - Université de Montréal

Co-supervisor :

Hugo Larochelle

Ivan Titov

Independent visiting researcher

Principal supervisor :

Siva Reddy

Alex Tong

Collaborating Alumni - Université de Montréal

Postdoctorate - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher

Collaborating researcher - Université de Montréal

Tianyu Zhang

PhD - Université de Montréal

PhD - McGill University

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Aaron Courville

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Harry Zhao

Collaborating Alumni - McGill University

Principal supervisor :

Blog Posts

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the Service of Reasoning & Model-Based ML

April 4, 2023

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

March 15, 2022

Generative Flow Networks

Yoshua Bengio

Publications

Saliency is a Possible Red Herring When Diagnosing Poor Generalization

Joseph D Viviano

Becks Simpson

Francis Dutil

Joseph Paul Cohen

Poor generalization is one symptom of models that learn to predict target variables using spuriously-correlated image features present only in the training distribution instead of the true image features that denote a class. It is often thought that this can be diagnosed visually using attribution (aka saliency) maps. We study if this assumption is correct. In some prediction tasks, such as for medical images, one may have some images with masks drawn by a human expert, indicating a region of the image containing relevant information to make the prediction. We study multiple methods that take advantage of such auxiliary labels, by training networks to ignore distracting features which may be found outside of the region of interest. This mask information is only used during training and has an impact on generalization accuracy depending on the severity of the shift between the training and test distributions. Surprisingly, while these methods improve generalization performance in the presence of a covariate shift, there is no strong correspondence between the correction of attribution towards the features a human expert has labelled as important and generalization performance. These results suggest that the root cause of poor generalization may not always be spatially defined, and raise questions about the utility of masks as 'attribution priors' as well as saliency maps for explainable predictions.

2021-01-01

ICLR (published)

openreview.net

Seeing things or seeing scenes: Investigating the capabilities of V&L models to align scene descriptions to images

Matt D Anderson

Erich W Graf

James H Elder

Peter Anderson

Xiaodong He

Chris Buehler

Mark Teney

Stephen Johnson

Gould Lei

Emily M. Bender

Timnit Gebru

Angelina McMillan-575

Alexander Koller

Yonatan Bisk

Ari Holtzman

Jesse Thomason

Joyce Chai

Angeliki Lazaridou

Jonathan May

Aleksandr

Thomas Unterthiner

Mostafa Dehghani

Matthias Minderer

Georg Heigold

Sylvain Gelly

Jakob Uszkoreit

Neil Houlsby

Lisa Anne Hendricks

Gabriel Ilharco

Rowan Zellers

Ali Farhadi

John M. Henderson

Thomas L. Griffiths

Melissa L.-H. Võ

Jeremy M. Wolfe

Jianfeng Wang

Xiaowei Hu

Pengchuan Zhang

Roy Schwartz

Bolei Zhou

Àgata Lapedriza

Jianxiong Xiao

Hang Zhao

Xavier Puig

Sanja Fidler

Images can be described in terms of the objects they contain, or in terms of the types of scene or place that they instantiate. In this paper we address to what extent pretrained Vision and Language models can learn to align descriptions of both types with images. We compare 3 state-of-the-art models, VisualBERT, LXMERT and CLIP. We find that (i) V&L models are susceptible to stylistic biases acquired during pretraining; (ii) only CLIP performs consistently well on both object- and scene-level descriptions. A follow-up ablation study shows that CLIP uses object-level information in the visual modality to align with scene-level textual descriptions.

A Simple and Effective Model for Multi-Hop Question Generation

Jimmy Lei Ba

Jamie Ryan Kiros

Geoffrey E. Hinton

Peter W. Battaglia

Jessica Blake

Chandler Hamrick

Victor Bapst

Alvaro Sanchez

Vinicius Zambaldi

M. Malinowski

Andrea Tacchetti

David Raposo

Tom B. Brown

Benjamin Mann

Nick Ryder

Melanie Subbiah

Jared Kaplan

Prafulla Dhariwal

Arvind Neelakantan

Pranav Shyam

Girish Sastry

William L. Hamilton

Clutrr

Nitish Srivastava

Geoffrey Hinton

Alex Krizhevsky

Ilya Sutskever

Ruslan Salakhutdinov

Gabriel Stanovsky

Julian Michael

Luke Zettlemoyer

Dan Su

Yan Xu

Wenliang Dai

Ziwei Ji

Tiezheng Yu

Minghao Tu

Kevin Huang

Guangtao Wang

Jing Huang

Ashish Vaswani

Noam M. Shazeer

Niki Parmar

Jakob Uszkoreit

Llion Jones

Aidan N. Gomez

Łukasz Kaiser

Illia Polosukhin

Petar Veličković

Guillem Cucurull

Arantxa Casanova

Adriana Romero Soriano

Pietro Liò

Johannes Welbl

Pontus Stenetorp

Yonghui Wu

Mike Schuster

Zhifeng Chen

Quoc Le

Mohammad Norouzi

Wolfgang Macherey

M. Krikun

Yuan Cao

Qin Gao

William W. Cohen

Jianxing Yu

Xiaojun Quan

Qinliang Su

Jian Yin

Yuyu Zhang

Hanjun Dai

Zornitsa Kozareva

Cheng Zhao

Chenyan Xiong

Corby Rosset

Xia Song

Paul Bennett

Saurabh Tiwary

Yao Zhao

Xiaochuan Ni

Yuanyuan Ding

Qingyu Zhou

Nan Yang

Furu Wei

Chuanqi Tan

Previous research on automated question generation has almost exclusively focused on generating factoid questions whose answers can be extracted from a single document. However, there is an increasing interest in developing systems that are capable of more complex multi-hop question generation (QG), where answering the question requires reasoning over multiple documents. In this work, we propose a simple and effective approach based on the transformer model for multi-hop QG. Our approach consists of specialized input representations, a supporting sentence classification objective, and training data weighting. Prior work on multi-hop QG considers the simplified setting of shorter documents and also advocates the use of entity-based graph structures as essential ingredients in model design. On the contrary, we showcase that our model can scale to the challenging setting of longer documents as input, does not rely on graph structures, and substantially outperforms the state-of-the-art approaches as measured by automated metrics and human evaluation.

2021-01-01

(published)

www.semanticscholar.org

SPE: Symmetrical Prompt Enhancement for Factual Knowledge Retrieval

James M. Crawford

Matthew L. Ginsberg

Jacob Devlin

Ming-Wei Chang

Kenton Lee

Xavier Glorot

Antoine Bordes

Alex Graves

Abdel-rahman Mohamed

Adi Haviv

Jonathan Berant

Amir Globerson

Chloe Kiddon

Pedro M. Domingos

Brian Lester

Rami Al-Rfou

Noah Constant

Pengfei Liu

Weizhe Yuan

Jinlan Fu

Zhengbao Jiang

Xiao Liu

Yanan Zheng

Zhengxiao Du

Ming Ding

Pretrained language models (PLMs) have been shown to accumulate factual knowledge from their unsupervised pretraining procedures (Petroni et al., 2019). Prompting is an effective way to query such knowledge from PLMs. Recently, continuous prompt methods have been shown to have a larger potential than discrete prompt methods in generating effective queries (Liu et al., 2021a). However, these methods do not consider symmetry of the task. In this work, we propose Symmetrical Prompt Enhancement (SPE), a continuous prompt-based method for fact retrieval that leverages the symmetry of the task. Our results on LAMA, a popular fact retrieval dataset, show significant improvement of SPE over previous prompt methods.

Systematic generalisation with group invariant predictions

Faruk Ahmed

Harm van Seijen

Aaron Courville

We consider situations where the presence of dominant simpler correlations with the target variable in a training set can cause an SGD-trained neural network to be less reliant on more persistently correlating complex features. When the non-persistent, simpler correlations correspond to non-semantic background factors, a neural network trained on this data can exhibit dramatic failure upon encountering systematic distributional shift, where the correlating background features are recombined with different objects. We perform an empirical study on three synthetic datasets, showing that group invariance methods across inferred partitionings of the training set can lead to significant improvements at such test-time situations. We also suggest a simple invariance penalty, showing with experiments on our setups that it can perform better than alternatives. We find that even without assuming access to any systematically shifted validation sets, one can still find improvements over an ERM-trained reference model.

2021-01-01

ICLR (published)

openreview.net

Tackling Situated Multi-Modal Task-Oriented Dialogs with a Single Transformer Model

Réjean Ducharme

Pascal Vincent

Morgan Kaufmann

Yen-Chun Chen

Linjie Li

Licheng Yu

Matthew Henderson

Blaise Thomson

Ehsan Hosseini-Asl

Bryan McCann

Chien-Sheng Wu

Samuel Humeau

Kurt Shuster

Marie-Anne Lachaux

The Situated Interactive Multi-Modal Conversations (SIMMC) 2.0 aims to create virtual shopping assistants that can accept complex multi-modal inputs, i.e. visual appearances of objects and user utterances. It consists of four subtasks, multi-modal disambiguation (MM-Disamb), multi-modal coreference resolution (MM-Coref), multi-modal dialog state tracking (MM-DST), and response retrieval and generation. While many task-oriented dialog systems usually tackle each subtask separately, we propose a jointly learned encoder-decoder that performs all four subtasks at once for efficiency. Moreover, we handle the multi-modality of the challenge by representing visual objects as special tokens whose joint embedding is learned via auxiliary tasks. This approach won the MM-Coref and response retrieval subtasks and was nominated runner-up for the remaining subtasks using a single unified model. In particular, our model achieved 81.5% MRR, 71.2% R@1, 95.0% R@5, 98.2% R@10, and 1.9 mean rank in the response retrieval task, setting a high bar for the state-of-the-art result in the SIMMC 2.0 track of the Dialog Systems Technology Challenge 10 (DSTC10).

Unifying Likelihood-free Inference with Black-box Sequence Design and Beyond

2021-01-01

arXiv.org (preprint)

dblp.uni-trier.de

What Makes Machine Reading Comprehension Questions Difficult? Investigating Variation in Passage Sources and Question Types

Susan Bartlett

Grzegorz Kondrak

Max Bartolo

Alastair Roberts

Johannes Welbl

Steven Bird

Ewan Klein

Edward Loper

Samuel R. Bowman

George Dahl

Chao Pang

Junyuan Shang

Jiaxiang Liu

Xuyi Chen

Yanbin Zhao

Yuxiang Lu

Weixin Liu

Zhihua Wu

Weibao Gong

Jianzhong Liang

Zhizhou Shang

Peng Sun

Ouyang Xuan

Dianhai

Houwen Tian

Hua Wu

Haifeng Wang

Adam Trischler

Tong Wang

Xingdi Yuan

Justin Harris

Alessandro Sordoni

Philip Bachman

Adina Williams

Nikita Nangia

Zhilin Yang

Peng Qi

Saizheng Zhang

For a natural language understanding benchmark to be useful in research, it has to consist of examples that are diverse and difficult enough to discriminate among current and near-future state-of-the-art systems. However, we do not yet know how best to select passages to collect a variety of challenging examples. In this study, we crowdsource multiple-choice reading comprehension questions for passages taken from seven qualitatively distinct sources, analyzing what attributes of passages contribute to the difficulty and question types of the collected examples. To our surprise, we find that passage source, length, and readability measures do not significantly affect question difficulty. Through our manual annotation of seven reasoning types, we observe several trends between passage sources and reasoning types, e.g., logical reasoning is more often required in questions written for technical passages. These results suggest that when creating a new benchmark dataset, selecting a diverse set of passages can help ensure a diverse range of question types, but that passage difficulty need not be a priority.

Machine Learning for Glacier Monitoring in the Hindu Kush Himalaya

Shimaa Baraka

Benjamin Akera

Bibek Aryal

Tenzing Chogyal Sherpa

Finu Shresta

Anthony Ortiz

Kris Sankaran

J. Ferres

M. Matin

2020-12-09

ArXiv (preprint)

arxiv.org

Inductive biases for deep learning of higher-level cognition

Anirudh Goyal

A fascinating hypothesis is that human and animal intelligence could be explained by a few principles (rather than an encyclopaedic list of heuristics). If that hypothesis was correct, we could more easily both understand our own intelligence and build intelligent machines. Just like in physics, the principles themselves would not be sufficient to predict the behaviour of complex systems like brains, and substantial computation might be needed to simulate human-like intelligence. This hypothesis would suggest that studying the kind of inductive biases that humans and animals exploit could help both clarify these principles and provide inspiration for AI research and neuroscience theories. Deep learning already exploits several key inductive biases, and this work considers a larger list, focusing on those which concern mostly higher-level and sequential conscious processing. The objective of clarifying these particular principles is that they could potentially help us build AI systems benefiting from humans’ abilities in terms of flexible out-of-distribution and systematic generalization, which is currently an area where a large gap exists between state-of-the-art machine learning and human intelligence.

2020-11-30

ArXiv (preprint)

doi.org

arxiv.org

RetroGNN: Approximating Retrosynthesis by Graph Neural Networks for De Novo Drug Design

Cheng-Hao Liu

Maksym Korablyov

Stanisław Jastrzębski

Paweł Włodarczyk-Pruszyński

Marwin Segler

De novo molecule generation often results in chemically unfeasible molecules. A natural idea to mitigate this problem is to bias the search process towards more easily synthesizable molecules using a proxy for synthetic accessibility. However, using currently available proxies still results in highly unrealistic compounds. We investigate the feasibility of training deep graph neural networks to approximate the outputs of a retrosynthesis planning software, and their use to bias the search process. We evaluate our method on a benchmark involving searching for drug-like molecules with antibiotic properties. Compared to enumerating over five million existing molecules from the ZINC database, our approach finds molecules predicted to be more likely to be antibiotics while maintaining good drug-like properties and being easily synthesizable. Importantly, our deep neural network can successfully filter out hard to synthesize molecules while achieving a

2020-11-25

ArXiv (preprint)

arxiv.org

Perceptual Generative Autoencoders

Zijun Zhang

Ruixiang Zhang

Zongpeng Li