Yoshua Bengio

Biography

For media requests, please write to medias@mila.quebec.

For more information, please contact Marie-Josée Beauchamp, Administrative Assistant, at marie-josee.beauchamp@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Jamal Abou Haibeh

Collaborating Alumni - McGill University

Mohammed Abukalam

Collaborating Alumni - Université de Montréal

agassoussisalwane2@gmail.com

Salwane Agassoussi

Université de Montréal

Anaïs Berkes

Collaborating researcher - Cambridge University

Principal supervisor :

Rim Assouel

PhD - Université de Montréal

Ayoub Atanane

Collaborating Alumni - Université du Québec à Rimouski

Stefan Bauer

Independent visiting researcher

Co-supervisor :

Guillaume Lajoie

Paul Bertin

PhD - Université de Montréal

Ghait Boukachab

Collaborating Alumni - UQAR

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor :

David Rolnick

Xiaoyin Chen

PhD - Université de Montréal

Sanghyeok Choi

Collaborating researcher - KAIST

PhD - Université de Montréal

PhD - Université de Montréal

Collaborating Alumni - Université de Montréal

Eric Elmoznino

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Léna Ezzine

PhD - Université de Montréal

Jean-Pierre Falet

PhD - Université de Montréal

Co-supervisor :

Leo Feng

PhD - Université de Montréal

Research Intern - Université de Montréal

Ivan Grega

Research Intern - Université de Montréal

Pietro Greiner

PhD

Mohsin Hasan

PhD - Université de Montréal

mohsin.hasan@mila.quebec

Edward Hu

PhD - Université de Montréal

Moksh Jain

PhD - Université de Montréal

moksh.jain@mila.quebec

Master's Research - Université de Montréal

Co-supervisor :

Collaborating Alumni - Université de Montréal

Minsu Kim

Research Intern - Université de Montréal

Collaborating researcher - Université de Montréal

Michał Koziarski

Collaborating Alumni - Université de Montréal

Salem Lahlou

Collaborating Alumni - Université de Montréal

Seanie Lee

Collaborating Alumni - Université de Montréal

Postdoctorate - Université de Montréal

Principal supervisor :

Collaborating Alumni

Zhen Liu

Collaborating Alumni - Université de Montréal

Principal supervisor :

Liam Paull

Matt MacDermott

Collaborating Alumni - Imperial College London

PhD - Université de Montréal

Mohammed Mahfoud

Collaborating Alumni - Université de Montréal

Nikolay Malkin

Collaborating Alumni - Université de Montréal

Cristian Dragos Manta

PhD - Université de Montréal

Co-supervisor :

Dhanya Sridhar

Sören Mindermann

Collaborating researcher - Université de Montréal

Sarthak Mittal

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Postdoctorate - Université de Montréal

Principal supervisor :

Independent visiting researcher - Université de Montréal

Padideh Nouri

PhD - Université de Montréal

Principal supervisor :

Ali Parviz

Collaborating researcher - Ying Wu Coll of Computing

Lena Podina

PhD - University of Waterloo

Principal supervisor :

David Rolnick

Nassim Rahaman

Collaborating Alumni - Max-Planck-Institute for Intelligent Systems

Jarrid Rector-Brooks

PhD - Université de Montréal

Danyal Rehman

Postdoctorate - Université de Montréal

James Requeima

Independent visiting researcher - Université de Montréal

Oli Richardson

Postdoctorate - Université de Montréal

Camille Rochefort-Boulanger

PhD - Université de Montréal

Principal supervisor :

Julie Hussin

Victor Schmidt

Collaborating Alumni - Université de Montréal

Postdoctorate - Université de Montréal

Master's Research - Université de Montréal

Marcin Sendera

Collaborating Alumni - Université de Montréal

Dounia Shaaban Kabakibo

Research Intern - Université de Montréal

Vedant Shah

Master's Research - Université de Montréal

Postdoctorate

Marco Stock

Independent visiting researcher - Technical University of Munich

marco.stock@tum.de

Mélisande Astrid Crystal Teng

PhD - Université de Montréal

Co-supervisor :

Collaborating researcher - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)

Principal supervisor :

David Rolnick

alexander.tong@mila.quebec

Alex Tong

Postdoctorate - Université de Montréal

Postdoctorate - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher - Université de Montréal

Zichao Yan

Collaborating Alumni - Université de Montréal

Omar G. Younis

Collaborating researcher

Collaborating researcher - KAIST

Nicole Zhang

PhD - McGill University

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Aaron Courville

Tianyu Zhang

PhD - Université de Montréal

Harry Zhao

PhD - McGill University

Principal supervisor :

Blog Posts

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the Service of Reasoning & Model-Based ML

April 4, 2023

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

Generative Flow Networks

March 15, 2022

Yoshua Bengio

Publications

Pruning for efficient hardware implementations of deep neural networks

Ghouthi Boukli Hacene

Vincent Gripon

Matthieu Arzel

Nicolas Farrugia

Continuous Domain Adaptation with Variational Domain-Agnostic Feature Replay

Qicheng Lao

Xiang Jiang

Mohammad Havaei

Learning in non-stationary environments is one of the biggest challenges in machine learning. Non-stationarity can be caused by either task drift, i.e., the drift in the conditional distribution of labels given the input data, or the domain drift, i.e., the drift in the marginal distribution of the input data. This paper aims to tackle this challenge in the context of continuous domain adaptation, where the model is required to learn new tasks adapted to new domains in a non-stationary environment while maintaining previously learned knowledge. To deal with both drifts, we propose variational domain-agnostic feature replay, an approach that is composed of three components: an inference module that filters the input data into domain-agnostic representations, a generative module that facilitates knowledge transfer, and a solver module that applies the filtered and transferable knowledge to solve the queries. We address the two fundamental scenarios in continuous domain adaptation, demonstrating the effectiveness of our proposed approach for practical usage.

2020-03-09

ArXiv (preprint)

On the Morality of Artificial Intelligence

Alexandra Luccioni

Examines ethical principles and guidelines that surround machine learning and artificial intelligence.

2020-03-01

IEEE Technology and Society Magazine (published)


On Catastrophic Interference in Atari 2600 Games

William Fedus

Dibya Ghosh

John D. Martin

Marc Gendron-Bellemare

Hugo Larochelle

Model-free deep reinforcement learning is sample inefficient. One hypothesis -- speculated, but not confirmed -- is that catastrophic interference within an environment inhibits learning. We test this hypothesis through a large-scale empirical study in the Arcade Learning Environment (ALE) and, indeed, find supporting evidence. We show that interference causes performance to plateau; the network cannot train on segments beyond the plateau without degrading the policy used to reach there. By synthetically controlling for interference, we demonstrate performance boosts across architectures, learning algorithms and environments. A more refined analysis shows that learning one segment of a game often increases prediction errors elsewhere. Our study provides a clear empirical link between catastrophic interference and sample efficiency in reinforcement learning.

2020-02-28

ArXiv (preprint)

Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning

Devansh Arpit

Huan Wang

Caiming Xiong

Richard Socher

2020-02-20

ArXiv (preprint)

HighRes-net: Recursive Fusion for Multi-Frame Super-Resolution of Satellite Imagery

Michel Deudon

Alfredo Kalaitzis

Israel Goytom

Md Rifat Arefin

Zhichao Lin

Kris Sankaran

Vincent Michalski

Samira Ebrahimi Kahou

Julien Cornebise

Generative deep learning has sparked a new wave of Super-Resolution (SR) algorithms that enhance single images with impressive aesthetic results, albeit with imaginary details. Multi-frame Super-Resolution (MFSR) offers a more grounded approach to the ill-posed problem, by conditioning on multiple low-resolution views. This is important for satellite monitoring of human impact on the planet -- from deforestation, to human rights violations -- that depend on reliable imagery. To this end, we present HighRes-net, the first deep learning approach to MFSR that learns its sub-tasks in an end-to-end fashion: (i) co-registration, (ii) fusion, (iii) up-sampling, and (iv) registration-at-the-loss. Co-registration of low-resolution views is learned implicitly through a reference-frame channel, with no explicit registration mechanism. We learn a global fusion operator that is applied recursively on an arbitrary number of low-resolution pairs. We introduce a registered loss, by learning to align the SR output to a ground-truth through ShiftNet. We show that by learning deep representations of multiple views, we can super-resolve low-resolution signals and enhance Earth Observation data at scale. Our approach recently topped the European Space Agency's MFSR competition on real-world satellite imagery.

2020-02-15

ArXiv (preprint)

Modeling Cloud Reflectance Fields using Conditional Generative Adversarial Networks

Victor Schmidt

Mustafa Alghali

Kris Sankaran

Tianle Yuan

We introduce a conditional Generative Adversarial Network (cGAN) approach to generate cloud reflectance fields (CRFs) conditioned on large scale meteorological variables such as sea surface temperature and relative humidity. We show that our trained model can generate realistic CRFs from the corresponding meteorological observations, which represents a step towards a data-driven framework for stochastic cloud parameterization.

2020-02-10

ArXiv (preprint)

Meta-learning framework with applications to zero-shot time-series forecasting

Boris Oreshkin

Dmitri Carpov

Nicolas Chapados

Can meta-learning discover generic ways of processing time series (TS) from a diverse dataset so as to greatly improve generalization on new TS coming from different datasets? This work provides positive evidence to this using a broad meta-learning framework which we show subsumes many existing meta-learning algorithms. Our theoretical analysis suggests that residual connections act as a meta-learning adaptation mechanism, generating a subset of task-specific parameters based on a given TS input, thus gradually expanding the expressive power of the architecture on-the-fly. The same mechanism is shown via linearization analysis to have the interpretation of a sequential update of the final linear layer. Our empirical results on a wide range of data emphasize the importance of the identified meta-learning mechanisms for successful zero-shot univariate forecasting, suggesting that it is viable to train a neural network on a source TS dataset and deploy it on a different target TS dataset without retraining, resulting in performance that is at least as good as that of state-of-practice univariate forecasting models.

2020-02-07

ArXiv (preprint)


Using Simulated Data to Generate Images of Climate Change

Gautier Cosne

Adrien Juraver

Mélisande Teng

Victor Schmidt

Vahe Vardanyan

Alexandra Luccioni

Generative adversarial networks (GANs) used in domain adaptation tasks have the ability to generate images that are both realistic and personalized, transforming an input image while maintaining its identifiable characteristics. However, they often require a large quantity of training data to produce high-quality images in a robust way, which limits their usability in cases when access to data is limited. In our paper, we explore the potential of using images from a simulated 3D environment to improve a domain adaptation task carried out by the MUNIT architecture, aiming to use the resulting images to raise awareness of the potential future impacts of climate change.

2020-01-26

ArXiv (preprint)

COVI White Paper-Version 1.1

Hannah Alsdurf

Tristan Deleu

Prateek Gupta

Daphne Ippolito

Richard Janda

Max Jarvie

Tyler J. Kolody

Sekoul Krastev

Tegan Maharaj

Robert Obryk

Dan Pilat

Valerie Pisano

Benjamin Prud'homme

Meng Qu

Nasim Rahaman

Irina Rish

Jean-François Rousseau

Abhinav Sharma

Brooke Struck

Jian Tang

Martin Weiss

Yun William Yu

The SARS-CoV-2 (Covid-19) pandemic has resulted in significant strain on health care and public health institutions around the world. Contact tracing is an essential tool for public health officials and local communities to change the course of the Covid-19 pandemic. Standard manual contact tracing of people infected with Covid-19, while the current gold standard, has significant challenges that limit the ability of public health authorities to minimize community infections. Personalized peer-to-peer contact tracing through the use of mobile applications has the potential to shift the paradigm of Covid-19 community spread. Although some countries have deployed centralized tracking systems through either GPS or Bluetooth, more privacy-protecting decentralized systems offer much of the same benefit without concentrating data in the hands of a state authority or in for-profit corporations. Additionally, machine learning methods can be used to circumvent some of the limitations of standard digital tracing by incorporating many clues (including medical conditions, self-reported symptoms, and numerous encounters with people at different risk levels, for different durations and distances) and their uncertainty into a more graded and precise estimation of infection and contagion risk. The estimated risk can be used to provide early risk awareness, personalized recommendations and relevant information to the user and connect them to health services. Finally, the non-identifying data about these risks can inform detailed epidemiological models trained jointly with the machine learning predictor, and these models can provide statistical evidence for the interaction and importance of different factors involved in the transmission of the disease. They can also be used to monitor, evaluate and optimize different health policy and confinement/deconfinement scenarios according to medical and economic productivity indicators.
However, such a strategy based on mobile apps and machine learning should proactively mitigate potential ethical and privacy risks, which could have substantial impacts on society (not only impacts on health but also impacts such as stigmatization and abuse of personal data). Here, we present an overview of the rationale, design, ethical considerations and privacy strategy of ‘COVI,’ a Covid-19 public peer-to-peer contact tracing and risk awareness mobile application developed in Canada. Addendum 2020-07-14: The government of Canada has declined to endorse COVI and will be promoting a different app for decentralized contact tracing. In the interest of preventing fragmentation of the app landscape, COVI will therefore not be deployed to end users. We are currently still in the process of finalizing the project, and plan to release our code and models for academic consumption and to make them accessible to other States should they wish to deploy an app based on or inspired by said code and models. Author affiliations: University of Ottawa, Mila, Université de Montréal, The Alan Turing Institute, University of Oxford, University of Pennsylvania, McGill University, Borden Ladner Gervais LLP, The Decision Lab, HEC Montréal, Max Planck Institute, Libéo, University of Toronto. Corresponding author general: richard.janda@mcgill.ca; corresponding author for public health: abhinav.sharma@mcgill.ca; corresponding author for privacy: ywyu@math.toronto.edu; corresponding author for machine learning: yoshua.bengio@mila.quebec; corresponding author for user perspective: brooke@thedecisionlab.com; corresponding author for technical implementation: jean-francois.rousseau@libeo.com. arXiv:2005.08502v2 [cs.CR], 27 Jul 2020.

GraphMix: Improved Training of Graph Neural Networks for Semi-Supervised Learning

Vikas Verma

Meng Qu

Alex Lamb

Juho Kannala

Jian Tang

We present GraphMix, a regularized training scheme for Graph Neural Network based semi-supervised object classification, leveraging the recent advances in the regularization of classical deep neural networks. Specifically, we propose a unified approach in which we train a fully-connected network jointly with the graph neural network via parameter sharing, interpolation-based regularization and self-predicted-targets. Our proposed method is architecture agnostic in the sense that it can be applied to any variant of graph neural networks which applies a parametric transformation to the features of the graph nodes. Despite its simplicity, with GraphMix we can consistently improve results and achieve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets: Cora-Full, Co-author-CS and Co-author-Physics.

Hybrid Models for Learning to Branch

Prateek Gupta

Maxime Gasse

Elias Boutros Khalil

Pawan Mudigonda

M. Pawan Kumar

Andrea Lodi