Yoshua Bengio

Biography

For media requests, please write to medias@mila.quebec.

For more information please contact Cassidy MacNeil, Senior Assistant and Operation Lead at cassidy.macneil@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Jamal Abou Haibeh

Collaborating Alumni - McGill University

Berkes Anaïs

Collaborating researcher - Cambridge University

Principal supervisor :

Rim Assouel

PhD - Université de Montréal

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor :

PhD - Université de Montréal

Sanghyeok Choi

Collaborating researcher - KAIST

PhD - Université de Montréal

Collaborating Alumni - Université de Montréal

Co-supervisor :

Loubna Benabbou

Desmond Elliott

Independent visiting researcher

Principal supervisor :

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Jean-Pierre Falet

PhD - Université de Montréal

PhD

PhD - Université de Montréal

Moksh Jain

PhD - Université de Montréal

PhD - Université de Montréal

Principal supervisor :

Collaborating Alumni - Université de Montréal

Hyeonah Kim

Postdoctorate - Université de Montréal

Principal supervisor :

Minsu Kim

Research Intern - Université de Montréal

Postdoctorate - Université de Montréal

Principal supervisor :

Collaborating Alumni

Song LIU

Collaborating researcher - N/A

Nikolay Malkin

Collaborating researcher - Université de Montréal

Cristian Dragos Manta

PhD - Université de Montréal

Co-supervisor :

Dhanya Sridhar

Sarthak Mittal

PhD - Université de Montréal

Principal supervisor :

Independent visiting researcher - Université de Montréal

Padideh Nouri

PhD - Université de Montréal

Principal supervisor :

Ali Parviz

Collaborating researcher - Ying Wu Coll of Computing

Lena Podina

Collaborating researcher - University of Waterloo

Principal supervisor :

David Rolnick

Nassim Rahaman

Collaborating Alumni - Max-Planck-Institute for Intelligent Systems

Jarrid Rector-Brooks

PhD - Université de Montréal

Danyal REHMAN

Postdoctorate - Université de Montréal

Oli RICHARDSON

Postdoctorate - Université de Montréal

Camille Rochefort-Boulanger

PhD - Université de Montréal

Principal supervisor :

Julie Hussin

Dragos Secrieru

Collaborating Alumni - Université de Montréal

Divya Sharma

Postdoctorate

Co-supervisor :

Mélisande Astrid Crystal Teng

Vincent Taboga

Collaborating Alumni - Polytechnique Montréal

Co-supervisor :

Collaborating Alumni - Université de Montréal

Co-supervisor :

Hugo Larochelle

Ivan Titov

Collaborating researcher

Principal supervisor :

Siva Reddy

Alex Tong

Collaborating Alumni - Université de Montréal

Collaborating Alumni - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher

Collaborating researcher - Université de Montréal

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Tianyu Zhang

PhD - Université de Montréal

PhD - McGill University

Principal supervisor :

Harry Zhao

Collaborating Alumni - McGill University

Principal supervisor :

Blog Posts

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the Service of Reasoning & Model-Based ML

April 4, 2023

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

March 15, 2022

Generative Flow Networks

Yoshua Bengio

Publications

A community effort in SARS-CoV-2 drug discovery.

Johannes Schimunek

Philipp Seidl

Katarina Elez

Tim Hempel

Tuan Le

Frank Noé

Simon Olsson

Lluís Raich

Robin Winter

Hatice Gokcan

Filipp Gusev

Evgeny M. Gutkin

Olexandr Isayev

Maria G. Kurnikova

Chamali H. Narangoda

Roman Zubatyuk

Ivan P. Bosko

Konstantin V. Furs

Anna D. Karpenko

Yury V. Kornoushenko

Mikita Shuldau

Artsemi Yushkevich

Mohammed B. Benabderrahmane

Patrick Bousquet‐Melou

Ronan Bureau

Beatrice Charton

Bertrand C. Cirou

Gérard Gil

William J. Allen

Suman Sirimulla

Stanley Watowich

Nick Antonopoulos

Nikolaos Epitropakis

Agamemnon Krasoulis

Vassilis Pitsikalis

Stavros Theodorakis

Igor Kozlovskii

Anton Maliutin

Alexander Medvedev

Petr Popov

Mark Zaretckii

Hamid Eghbal‐Zadeh

Christina Halmich

Sepp Hochreiter

Andreas Mayr

Peter Ruch

Michael Widrich

Francois Berenger

Ashutosh Kumar

Yoshihiro Yamanishi

Kam Y. J. Zhang

Emmanuel Bengio

Moksh J. Jain

Maksym Korablyov

Cheng-Hao Liu

Gilles Marcou

M. Gilles

Enrico Glaab

Kelly Barnsley

Suhasini M. Iyengar

Mary Jo Ondrechen

V. Joachim Haupt

Florian Kaiser

Michael Schroeder

Luisa Pugliese

Simone Albani

Christina Athanasiou

Andrea Beccari

Paolo Carloni

Giulia D'Arrigo

Eleonora Gianquinto

Jonas Goßen

Anton Hanke

Benjamin P. Joseph

Daria B. Kokh

Sandra Kovachka

Candida Manelfi

Goutam Mukherjee

Abraham Muñiz‐Chicharro

Francesco Musiani

Ariane Nunes‐Alves

Giulia Paiardi

Giulia Rossetti

S. Kashif Sadiq

Francesca Spyrakis

Carmine Talarico

Alexandros Tsengenes

Rebecca C. Wade

Conner Copeland

Jeremiah Gaiser

Daniel R. Olson

Amitava Roy

Vishwesh Venkatraman

Travis J. Wheeler

Haribabu Arthanari

Klara Blaschitz

Marco Cespugli

Vedat Durmaz

Konstantin Fackeldey

Patrick D. Fischer

Christoph Gorgulla

Christian Gruber

Karl Gruber

Michael Hetmann

Jamie E. Kinney

Krishna M. Padmanabha Das

Shreya Pandita

Amit Singh

Georg Steinkellner

Guilhem Tesseyre

Gerhard Wagner

Zi‐Fu Wang

Ryan J. Yust

Dmitry S. Druzhilovskiy

Dmitry A. Filimonov

Pavel V. Pogodin

Vladimir Poroikov

Anastassia V. Rudik

Leonid A. Stolbov

Alexander V. Veselovsky

Maria De Rosa

Giada De Simone

Maria R. Gulotta

Jessica Lombino

Nedra Mekni

Ugo Perricone

Arturo Casini

Amanda Embree

D. Benjamin Gordon

David Lei

Katelin Pratt

Christopher A. Voigt

Kuang‐Yu Chen

Yves Jacob

Tim Krischuns

Pierre Lafaye

Agnès Zettor

M. Luis Rodríguez

Kris M. White

Daren Fearon

Frank Von Delft

Martin A. Walsh

Dragos Horvath

Charles L. Brooks

Babak Falsafi

Bryan Ford

Adolfo García‐Sastre

Sang Yup Lee

Nadia Naffakh

Alexandre Varnek

Günter Klambauer

Thomas M. Hermans

The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against Covid-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding-, cleavage-, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments.

2023-11-13

Molecular Informatics (published)

SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data

Mélisande Teng

Amna Elmustafa

Benjamin Akera

Hager Radi Abdelwahed

Hugo Larochelle

David Rolnick

2023-11-01

ArXiv (preprint)

Generative AI models should include detection mechanisms as a condition for public release

Alistair Knott

Dino Pedreschi

Raja Chatila

Tapabrata Chakraborti

Susan Leavy

Ricardo Baeza-Yates

David Eyers

Andrew Trotman

Paul D. Teal

Przemyslaw Biecek

Stuart Russell

The new wave of ‘foundation models’—general-purpose generative AI models, for production of text (e.g., ChatGPT) or images (e.g., MidJourney)—represent a dramatic advance in the state of the art for AI. But their use also introduces a range of new risks, which has prompted an ongoing conversation about possible regulatory mechanisms. Here we propose a specific principle that should be incorporated into legislation: that any organization developing a foundation model intended for public use must demonstrate a reliable detection mechanism for the content it generates, as a condition of its public release. The detection mechanism should be made publicly available in a tool that allows users to query, for an arbitrary item of content, whether the item was generated (wholly or partly) by the model. In this paper, we argue that this requirement is technically feasible and would play an important role in reducing certain risks from new AI models in many domains. We also outline a number of options for the tool’s design, and summarize a number of points where further input from policymakers and researchers would be required.

2023-10-27

Ethics and Information Technology (published)

OC-NMN: Object-centric Compositional Neural Module Network for Generative Visual Analogical Reasoning

Pau Rodríguez

A key aspect of human intelligence is the ability to imagine -- composing learned concepts in novel ways -- to make sense of new scenarios. Such capacity is not yet attained for machine learning systems. In this work, in the context of visual reasoning, we show how modularity can be leveraged to derive a compositional data augmentation framework inspired by imagination. Our method, denoted Object-centric Compositional Neural Module Network (OC-NMN), decomposes visual generative reasoning tasks into a series of primitives applied to objects without using a domain-specific language. We show that our modular architectural choices can be used to generate new training tasks that lead to better out-of-distribution generalization. We compare our model to existing and new baselines in a proposed visual reasoning benchmark that consists of applying arithmetic operations to MNIST digits.

2023-10-27

ArXiv (preprint)

Attention Schema in Neural Agents

Dianbo Liu

Samuele Bolotta

Mike He Zhu

Zahra Sheikhbahaee

Guillaume Dumas

Attention has become a common ingredient in deep learning architectures. It adds a dynamical selection of information on top of the static selection of information supported by weights. In the same way, we can imagine a higher-order informational filter built on top of attention: an Attention Schema (AS), namely, a descriptive and predictive model of attention. In cognitive neuroscience, Attention Schema Theory (AST) supports this idea of distinguishing attention from AS. A strong prediction of this theory is that an agent can use its own AS to also infer the states of other agents' attention and consequently enhance coordination with other agents. As such, multi-agent reinforcement learning would be an ideal setting to experimentally test the validity of AST. We explore different ways in which attention and AS interact with each other. Our preliminary results indicate that agents that implement the AS as a recurrent internal control achieve the best performance. In general, these exploratory experiments suggest that equipping artificial agents with a model of attention can enhance their social intelligence.

2023-10-26

NeurIPS.cc/2023/Workshop/InfoCog (poster)

Baking Symmetry into GFlowNets

George Ma

Emmanuel Bengio

Dinghuai Zhang

GFlowNets have exhibited promising performance in generating diverse candidates with high rewards. These networks generate objects incrementally and aim to learn a policy that assigns probability of sampling objects in proportion to rewards. However, the current training pipelines of GFlowNets do not consider the presence of isomorphic actions, which are actions resulting in symmetric or isomorphic states. This lack of symmetry increases the amount of samples required for training GFlowNets and can result in inefficient and potentially incorrect flow functions. As a consequence, the reward and diversity of the generated objects decrease. In this study, our objective is to integrate symmetries into GFlowNets by identifying equivalent actions during the generation process. Experimental results using synthetic data demonstrate the promising performance of our proposed approaches.

2023-10-26

NeurIPS.cc/2023/Workshop/AI4Science (oral)

Causal Discovery in Gene Regulatory Networks with GFlowNet: Towards Scalability in Large Systems

Trang Nguyen

Alexander Tong

Kanika Madan

Dianbo Liu

Understanding causal relationships within Gene Regulatory Networks (GRNs) is essential for unraveling the gene interactions in cellular processes. However, causal discovery in GRNs is a challenging problem for multiple reasons including the existence of cyclic feedback loops and uncertainty that yields diverse possible causal structures. Previous works in this area either ignore cyclic dynamics (assume acyclic structure) or struggle with scalability. We introduce Swift-DynGFN as a novel framework that enhances causal structure learning in GRNs while addressing scalability concerns. Specifically, Swift-DynGFN exploits gene-wise independence to boost parallelization and to lower computational cost. Experiments on real single-cell RNA velocity and synthetic GRN datasets showcase the advancement in learning causal structure in GRNs and scalability in larger systems.

2023-10-26

NeurIPS.cc/2023/Workshop/GenBio (poster)

Crystal-GFN: sampling materials with desirable properties and constraints

Mistal

Alexandre AGM Duval

2023-10-26

NeurIPS.cc/2023/Workshop/AI4Mat (spotlight)

Discrete, compositional, and symbolic representations through attractor dynamics

Andrew Nam

Chen Sun

Compositionality is an important feature of discrete symbolic systems, such as language and programs, as it enables them to have infinite capacity despite a finite symbol set. It serves as a useful abstraction for reasoning in both cognitive science and in AI, yet the interface between continuous and symbolic processing is often imposed by fiat at the algorithmic level, such as by means of quantization or a softmax sampling step. In this work, we explore how discretization could be implemented in a more neurally plausible manner through the modeling of attractor dynamics that partition the continuous representation space into basins that correspond to sequences of symbols. Building on established work in attractor networks and introducing novel training methods, we show that imposing structure in the symbolic space can produce compositionality in the attractor-supported representation space of rich sensory inputs. Lastly, we argue that our model exhibits the process of an information bottleneck that is thought to play a role in conscious experience, decomposing the rich information of a sensory input into stable components encoding symbolic information.

2023-10-26

NeurIPS.cc/2023/Workshop/InfoCog (oral)

On the importance of catalyst-adsorbate 3D interactions for relaxed energy predictions

Alvaro Carbonero

Alexandre Duval

Victor Schmidt

Santiago Miret

David Rolnick

The use of machine learning for material property prediction and discovery has traditionally centered on graph neural networks that incorporate the geometric configuration of all atoms. However, in practice not all this information may be readily available, e.g. when evaluating the potentially unknown binding of adsorbates to catalyst. In this paper, we investigate whether it is possible to predict a system's relaxed energy in the OC20 dataset while ignoring the relative position of the adsorbate with respect to the electro-catalyst. We consider SchNet, DimeNet++ and FAENet as base architectures and measure the impact of four modifications on model performance: removing edges in the input graph, pooling independent representations, not sharing the backbone weights and using an attention mechanism to propagate non-geometric relative information. We find that while removing binding site information impairs accuracy as expected, modified models are able to predict relaxed energies with remarkably decent MAE. Our work suggests future research directions in accelerated materials discovery where information on reactant configurations can be reduced or altogether omitted.

2023-10-26

NeurIPS.cc/2023/Workshop/AI4Mat (poster)

Towards equilibrium molecular conformation generation with GFlowNets

Alexandra Volokhova

Michał Koziarski

Cheng-Hao Liu

Santiago Miret

Pablo Lemos

Luca Thiede

Zichao Yan

Alán Aspuru-Guzik

Sampling diverse, thermodynamically feasible molecular conformations plays a crucial role in predicting properties of a molecule. In this paper we propose to use GFlowNet for sampling conformations of small molecules from the Boltzmann distribution, as determined by the molecule's energy. The proposed approach can be used in combination with energy estimation methods of different fidelity and discovers a diverse set of low-energy conformations for highly flexible drug-like molecules. We demonstrate that GFlowNet can reproduce molecular potential energy surfaces by sampling proportionally to the Boltzmann distribution.

2023-10-26

NeurIPS.cc/2023/Workshop/AI4Mat (poster)

Managing extreme AI risks amid rapid progress

Geoffrey Hinton

Andrew Yao

Dawn Song

Pieter Abbeel

Yuval Noah Harari

Trevor Darrell

Ya-Qin Zhang

Lan Xue

Shai Shalev-Shwartz

Gillian Hadfield

Jeff Clune

Tegan Maharaj

Frank Hutter

Atilim Güneş Baydin

Sheila McIlraith

Qiqi Gao

Ashwin Acharya

David Krueger

Anca Dragan

Philip Torr

Stuart Russell

Daniel Kahneman

Jan Brauner

Sören Mindermann

Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI's impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although researchers have warned of extreme risks from AI, there is a lack of consensus about how exactly such risks arise, and how to manage them. Society's response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts. AI safety research is lagging. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems. In this short consensus paper, we describe extreme risks from upcoming, advanced AI systems. Drawing on lessons learned from other safety-critical technologies, we then outline a comprehensive plan combining technical research and development with proactive, adaptive governance mechanisms for a more commensurate preparation.

2023-10-25

Science (published)