Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Scientific Director, Leadership Team
Observer, Board of Directors, Mila

Biography

For media requests, please write to medias@mila.quebec.

For more information please contact Julie Mongeau, executive assistant at julie.mongeau@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific director of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Research Intern - Université de Montréal
PhD - Université de Montréal
Research Intern - Université du Québec à Rimouski
Professional Master's - Université de Montréal
Independent visiting researcher
Co-supervisor :
Independent visiting researcher - UQAR
PhD - Université de Montréal
Independent visiting researcher - MIT
PhD - Université de Montréal
Postdoctorate - Université de Montréal
Co-supervisor :
Professional Master's - Université de Montréal
Professional Master's - Université de Montréal
Collaborating Alumni - Université de Montréal
Collaborating researcher - Université Paris-Saclay
Principal supervisor :
PhD - Université de Montréal
PhD - Massachusetts Institute of Technology
PhD - Université de Montréal
PhD - Université de Montréal
Professional Master's - Université de Montréal
Professional Master's - Université de Montréal
Professional Master's - Université de Montréal
Collaborating researcher
Postdoctorate - Université de Montréal
Co-supervisor :
Independent visiting researcher - Technical University Munich (TUM)
PhD - Université de Montréal
Research Intern - Université de Montréal
Master's Research - Université de Montréal
Co-supervisor :
Research Intern - Université de Montréal
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Collaborating Alumni
Research Intern - Université de Montréal
Professional Master's - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Research Intern - McGill University
Research Intern - Imperial College London
PhD - Université de Montréal
Research Intern - Université de Montréal
Collaborating Alumni - Université de Montréal
DESS - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Postdoctorate - Université de Montréal
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
Professional Master's - Université de Montréal
Independent visiting researcher - Université de Montréal
Independent visiting researcher - Hong Kong University of Science and Technology (HKUST)
Collaborating researcher - Ying Wu Coll of Computing
Professional Master's - Université de Montréal
Undergraduate - Université de Montréal
PhD - Max-Planck-Institute for Intelligent Systems
Professional Master's - Université de Montréal
Independent visiting researcher - Université de Montréal
Independent visiting researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher
Principal supervisor :
Postdoctorate - Université de Montréal
Master's Research - Université de Montréal
Research Intern - Université de Montréal
Master's Research - Université de Montréal
Professional Master's - Université de Montréal
Independent visiting researcher - Technical University of Munich
PhD - École Polytechnique Montréal Fédérale de Lausanne
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher
Principal supervisor :
Postdoctorate - Université de Montréal
Collaborating researcher - Valence
Principal supervisor :
Postdoctorate - Université de Montréal
Co-supervisor :
Collaborating researcher - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)
Principal supervisor :
PhD - Université de Montréal
Professional Master's - Université de Montréal
Collaborating Alumni - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - McGill University
Principal supervisor :
PhD - McGill University
Principal supervisor :

Publications

Discrete, compositional, and symbolic representations through attractor dynamics
Andrew Nam
Eric Elmoznino
Nikolay Malkin
Chen Sun
Compositionality is an important feature of discrete symbolic systems, such as language and programs, as it enables them to have infinite capacity despite a finite symbol set. It serves as a useful abstraction for reasoning in both cognitive science and in AI, yet the interface between continuous and symbolic processing is often imposed by fiat at the algorithmic level, such as by means of quantization or a softmax sampling step. In this work, we explore how discretization could be implemented in a more neurally plausible manner through the modeling of attractor dynamics that partition the continuous representation space into basins that correspond to sequences of symbols. Building on established work in attractor networks and introducing novel training methods, we show that imposing structure in the symbolic space can produce compositionality in the attractor-supported representation space of rich sensory inputs. Lastly, we argue that our model exhibits the process of an information bottleneck that is thought to play a role in conscious experience, decomposing the rich information of a sensory input into stable components encoding symbolic information.
Learning to Scale Logits for Temperature-Conditional GFlowNets
Minsu Kim
Joohwan Ko
Dinghuai Zhang
Ling Pan
Taeyoung Yun
Woo Chang Kim
Jinkyoo Park
GFlowNets are probabilistic models that learn a stochastic policy that sequentially generates compositional structures, such as molecular graphs. They are trained with the objective of sampling such objects with probability proportional to the object's reward. Among GFlowNets, the temperature-conditional GFlowNets represent a family of policies indexed by temperature, and each is associated with the correspondingly tempered reward function. The major benefit of temperature-conditional GFlowNets is the controllability of GFlowNets' exploration and exploitation through adjusting temperature. We propose Learning to Scale Logits for temperature-conditional GFlowNets (LSL-GFN), a novel architectural design that greatly accelerates the training of temperature-conditional GFlowNets. It is based on the idea that previously proposed temperature-conditioning approaches introduced numerical challenges in the training of the deep network because different temperatures may give rise to very different gradient profiles and ideal scales of the policy's logits. We find that the challenge is greatly reduced if a learned function of the temperature is used to scale the policy's logits directly. We empirically show that our strategy dramatically improves the performance of GFlowNets, outperforming other baselines, including reinforcement learning and sampling methods, in terms of discovering diverse modes in multiple biochemical tasks.
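The logit-scaling idea in this abstract can be illustrated with a toy sketch. This is not the authors' LSL-GFN implementation: `scale_from_temperature` is a hypothetical stand-in for the learned function of temperature (here a fixed linear map of an assumed inverse temperature `beta`), and the logits are made-up values.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scale_from_temperature(beta):
    """Hypothetical stand-in for the learned scalar function of the
    inverse temperature beta; a real model would learn this mapping."""
    return 0.5 * beta + 0.5

def scaled_policy(logits, beta):
    """Multiply the policy logits by a scalar that depends on beta,
    then normalize into action probabilities."""
    s = scale_from_temperature(beta)
    return softmax([s * x for x in logits])

logits = [2.0, 1.0, 0.0]                      # made-up unnormalized action scores
low_beta = scaled_policy(logits, beta=1.0)    # scale 1.0: flatter, more exploration
high_beta = scaled_policy(logits, beta=5.0)   # scale 3.0: sharper, more exploitation
```

With a larger scale the policy concentrates on the highest-scoring action, which is the exploration and exploitation control the abstract refers to.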
Multi-Fidelity Active Learning with GFlowNets
Alex Hernandez-Garcia
Nikita Saxena
Moksh J. Jain
Cheng-Hao Liu
In the last decades, the capacity to generate large amounts of data in science and engineering applications has been growing steadily. Meanwhile, the progress in machine learning has turned it into a suitable tool to process and utilise the available data. Nonetheless, many relevant scientific and engineering problems present challenges where current machine learning methods cannot yet efficiently leverage the available data and resources. For example, in scientific discovery, we are often faced with the problem of exploring very large, high-dimensional spaces, where querying a high fidelity, black-box objective function is very expensive. Progress in machine learning methods that can efficiently tackle such problems would help accelerate currently crucial areas such as drug and materials discovery. In this paper, we propose the use of GFlowNets for multi-fidelity active learning, where multiple approximations of the black-box function are available at lower fidelity and cost. GFlowNets are recently proposed methods for amortised probabilistic inference that have proven efficient for exploring large, high-dimensional spaces and can hence be practical in the multi-fidelity setting too. Here, we describe our algorithm for multi-fidelity active learning with GFlowNets and evaluate its performance in both well-studied synthetic tasks and practically relevant applications of molecular discovery. Our results show that multi-fidelity active learning with GFlowNets can efficiently leverage the availability of multiple oracles with different costs and fidelities to accelerate scientific discovery and engineering design.
On the importance of catalyst-adsorbate 3D interactions for relaxed energy predictions
Alvaro Carbonero
Alexandre AGM Duval
Victor Schmidt
Santiago Miret
Alex Hernandez-Garcia
The use of machine learning for material property prediction and discovery has traditionally centered on graph neural networks that incorporate the geometric configuration of all atoms. However, in practice not all this information may be readily available, e.g. when evaluating the potentially unknown binding of adsorbates to a catalyst. In this paper, we investigate whether it is possible to predict a system's relaxed energy in the OC20 dataset while ignoring the relative position of the adsorbate with respect to the electro-catalyst. We consider SchNet, DimeNet++ and FAENet as base architectures and measure the impact of four modifications on model performance: removing edges in the input graph, pooling independent representations, not sharing the backbone weights and using an attention mechanism to propagate non-geometric relative information. We find that while removing binding site information impairs accuracy as expected, modified models are able to predict relaxed energies with remarkably decent MAE. Our work suggests future research directions in accelerated materials discovery where information on reactant configurations can be reduced or altogether omitted.
Towards equilibrium molecular conformation generation with GFlowNets
Alexandra Volokhova
Michał Koziarski
Alex Hernandez-Garcia
Cheng-Hao Liu
Santiago Miret
Pablo Lemos
Luca Thiede
Zichao Yan
Alán Aspuru-Guzik
Sampling diverse, thermodynamically feasible molecular conformations plays a crucial role in predicting properties of a molecule. In this paper we propose to use GFlowNet for sampling conformations of small molecules from the Boltzmann distribution, as determined by the molecule's energy. The proposed approach can be used in combination with energy estimation methods of different fidelity and discovers a diverse set of low-energy conformations for highly flexible drug-like molecules. We demonstrate that GFlowNet can reproduce molecular potential energy surfaces by sampling proportionally to the Boltzmann distribution.
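For readers unfamiliar with the target distribution named in this abstract, the following minimal sketch computes Boltzmann probabilities over a small discrete set of conformers. It only illustrates the distribution itself, not the GFlowNet sampler from the paper; the energies, the temperature default, and the function name are assumptions made for illustration.

```python
import math

# Molar gas constant in kcal/(mol*K), i.e. the Boltzmann constant per mole.
R_KCAL = 0.0019872041

def boltzmann_probabilities(energies, temperature=298.15):
    """Return p_i proportional to exp(-E_i / (R * T)) for energies in kcal/mol."""
    rt = R_KCAL * temperature
    e_min = min(energies)  # shift for numerical stability; cancels on normalization
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

energies = [0.0, 0.5, 2.0]   # made-up relative conformer energies in kcal/mol
probs = boltzmann_probabilities(energies)
```

Lower-energy conformers receive higher probability, and raising the temperature flattens the distribution.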
Causal machine learning for single-cell genomics
Alejandro Tejada-Lapuerta
Paul Bertin
Stefan Bauer
Hananeh Aliee
Fabian J. Theis
A community effort in SARS-CoV-2 drug discovery.
Johannes Schimunek
Philipp Seidl
Katarina Elez
Tim Hempel
Tuan Le
Frank Noé
Simon Olsson
Lluís Raich
Robin Winter
Hatice Gokcan
Filipp Gusev
Evgeny M. Gutkin
Olexandr Isayev
Maria G. Kurnikova
Chamali H. Narangoda
Roman Zubatyuk
Ivan P. Bosko
Konstantin V. Furs
Anna D. Karpenko
Yury V. Kornoushenko
Mikita Shuldau
Artsemi Yushkevich
Mohammed B. Benabderrahmane
Patrick Bousquet‐Melou
Ronan Bureau
Beatrice Charton
Bertrand C. Cirou
Gérard Gil
William J. Allen
Suman Sirimulla
Stanley Watowich
Nick Antonopoulos
Nikolaos Epitropakis
Agamemnon Krasoulis
Vassilis Pitsikalis
Stavros Theodorakis
Igor Kozlovskii
Anton Maliutin
Alexander Medvedev
Petr Popov
Mark Zaretckii
Hamid Eghbal‐Zadeh
Christina Halmich
Sepp Hochreiter
Andreas Mayr
Peter Ruch
Michael Widrich
Francois Berenger
Ashutosh Kumar
Yoshihiro Yamanishi
Kam Y. J. Zhang
Emmanuel Bengio
Moksh J. Jain
Maksym Korablyov
Cheng-Hao Liu
Gilles Marcou
Marcous Gilles
Enrico Glaab
Kelly Barnsley
Suhasini M. Iyengar
Mary Jo Ondrechen
V. Joachim Haupt
Florian Kaiser
Michael Schroeder
Luisa Pugliese
Simone Albani
Christina Athanasiou
Andrea Beccari
Paolo Carloni
Giulia D'Arrigo
Eleonora Gianquinto
Jonas Goßen
Anton Hanke
Benjamin P. Joseph
Daria B. Kokh
Sandra Kovachka
Candida Manelfi
Goutam Mukherjee
Abraham Muñiz‐Chicharro
Francesco Musiani
Ariane Nunes‐Alves
Giulia Paiardi
Giulia Rossetti
S. Kashif Sadiq
Francesca Spyrakis
Carmine Talarico
Alexandros Tsengenes
Rebecca C. Wade
Conner Copeland
Jeremiah Gaiser
Daniel R. Olson
Amitava Roy
Vishwesh Venkatraman
Travis J. Wheeler
Haribabu Arthanari
Klara Blaschitz
Marco Cespugli
Vedat Durmaz
Konstantin Fackeldey
Patrick D. Fischer
Christoph Gorgulla
Christian Gruber
Karl Gruber
Michael Hetmann
Jamie E. Kinney
Krishna M. Padmanabha Das
Shreya Pandita
Amit Singh
Georg Steinkellner
Guilhem Tesseyre
Gerhard Wagner
Zi‐Fu Wang
Ryan J. Yust
Dmitry S. Druzhilovskiy
Dmitry A. Filimonov
Pavel V. Pogodin
Vladimir Poroikov
Anastassia V. Rudik
Leonid A. Stolbov
Alexander V. Veselovsky
Maria De Rosa
Giada De Simone
Maria R. Gulotta
Jessica Lombino
Nedra Mekni
Ugo Perricone
Arturo Casini
Amanda Embree
D. Benjamin Gordon
David Lei
Katelin Pratt
Christopher A. Voigt
Kuang‐Yu Chen
Yves Jacob
Tim Krischuns
Pierre Lafaye
Agnès Zettor
M. Luis Rodríguez
Kris M. White
Daren Fearon
Frank Von Delft
Martin A. Walsh
Dragos Horvath
Charles L. Brooks
Babak Falsafi
Bryan Ford
Adolfo García‐Sastre
Sang Yup Lee
Nadia Naffakh
Alexandre Varnek
Günter Klambauer
Thomas M. Hermans
The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against Covid-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding-, cleavage-, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments.
A cry for help: Early detection of brain injury in newborns
Charles Onu
Samantha Latremouille
Arsenii Gorin
Junhao Wang
Uchenna Ekwochi
P. Ubuane
O. Kehinde
Muhammad A. Salisu
Datonye Briggs
Crystal-GFN: sampling crystals with desirable properties and constraints
Alex Hernandez-Garcia
Alexandre AGM Duval
Alexandra Volokhova
Divya Sharma
Pierre Luc Carrier
Michał Koziarski
Victor Schmidt
Accelerating material discovery holds the potential to greatly help mitigate the climate crisis. Discovering new solid-state materials such as electrocatalysts, super-ionic conductors or photovoltaic materials can have a crucial impact, for instance, in improving the efficiency of renewable energy production and storage. In this paper, we introduce Crystal-GFN, a generative model of crystal structures that sequentially samples structural properties of crystalline materials, namely the space group, composition and lattice parameters. This domain-inspired approach enables the flexible incorporation of physical and structural hard constraints, as well as the use of any available predictive model of a desired physicochemical property as an objective function. To design stable materials, one must target the candidates with the lowest formation energy. Here, we use as objective the formation energy per atom of a crystal structure predicted by a new proxy machine learning model trained on MatBench. The results demonstrate that Crystal-GFN is able to sample highly diverse crystals with low (median -3.1 eV/atom) predicted formation energy.
Causal Inference in Gene Regulatory Networks with GFlowNet: Towards Scalability in Large Systems
Trang Nguyen
Alexander Tong
Kanika Madan
Dianbo Liu
Understanding causal relationships within Gene Regulatory Networks (GRNs) is essential for unraveling the gene interactions in cellular processes. However, causal discovery in GRNs is a challenging problem for multiple reasons including the existence of cyclic feedback loops and uncertainty that yields diverse possible causal structures. Previous works in this area either ignore cyclic dynamics (assume acyclic structure) or struggle with scalability. We introduce Swift-DynGFN as a novel framework that enhances causal structure learning in GRNs while addressing scalability concerns. Specifically, Swift-DynGFN exploits gene-wise independence to boost parallelization and to lower computational cost. Experiments on real single-cell RNA velocity and synthetic GRN datasets showcase the advancement in learning causal structure in GRNs and scalability in larger systems.
Local Search GFlowNets
Minsu Kim
Taeyoung Yun
Emmanuel Bengio
Dinghuai Zhang
Sungsoo Ahn
Jinkyoo Park
Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a distribution over discrete objects proportional to their rewards. GFlowNets exhibit a remarkable ability to generate diverse samples, yet occasionally struggle to consistently produce samples with high rewards due to over-exploration on wide sample space. This paper proposes to train GFlowNets with local search, which focuses on exploiting high-rewarded sample space to resolve this issue. Our main idea is to explore the local neighborhood via backtracking and reconstruction guided by backward and forward policies, respectively. This allows biasing the samples toward high-reward solutions, which is not possible for a typical GFlowNet solution generation scheme, which uses the forward policy to generate the solution from scratch. Extensive experiments demonstrate a remarkable performance improvement in several biochemical tasks. Source code is available: https://github.com/dbsxodud-11/ls_gfn.
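The backtrack-and-reconstruct loop described in this abstract can be sketched on a toy problem. This is an illustrative assumption-laden example, not the code from the linked repository: the objects are bit strings, the forward policy is uniform random rather than learned, the backward policy simply undoes the most recent steps, the reward is made up, and the greedy acceptance rule is one possible choice.

```python
import random

N = 8  # length of the toy bit-string objects

def reward(x):
    """Made-up reward: count of ones, plus one to keep it positive."""
    return sum(x) + 1

def forward_complete(prefix, rng):
    """Toy forward policy: append uniformly random bits until length N."""
    x = list(prefix)
    while len(x) < N:
        x.append(rng.randint(0, 1))
    return x

def backtrack(x, k):
    """Toy backward policy: undo the last k construction steps."""
    return x[:len(x) - k]

def local_search(x, k, iterations, rng):
    """Repeatedly backtrack k steps, reconstruct, and keep the candidate
    when its reward is not lower (a greedy acceptance rule, assumed here)."""
    best = list(x)
    for _ in range(iterations):
        candidate = forward_complete(backtrack(best, k), rng)
        if reward(candidate) >= reward(best):
            best = candidate
    return best

rng = random.Random(0)                      # fixed seed for reproducibility
start = forward_complete([], rng)           # sample from scratch
refined = local_search(start, k=4, iterations=20, rng=rng)
```

Because only the last `k` steps are redone, the refined object stays in the neighborhood of the starting sample while its reward can only stay equal or improve under this acceptance rule.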
Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks
Luca Scimeca
Alexander Rubinstein
Armand Nicolicioiu
Damien Teney
Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to shortcut learning phenomena, where a model may rely on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs). We discover that DPMs have the inherent capability to represent multiple visual cues independently, even when they are largely correlated in the training data. We leverage this characteristic to encourage model diversity and empirically show the efficacy of the approach with respect to several diversification objectives. We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.