Portrait of Yoshua Bengio

Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research Department
Scientific Director, Leadership Team
Observer, Board of Directors, Mila

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Julie Mongeau, executive assistant at julie.mongeau@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific director of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Research Intern - Université de Montréal
PhD - Université de Montréal
Research Intern - Université du Québec à Rimouski
Professional Master's - Université de Montréal
Independent visiting researcher
Co-supervisor :
Independent visiting researcher - UQAR
PhD - Université de Montréal
Independent visiting researcher - MIT
Postdoctorate - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Professional Master's - Université de Montréal
Professional Master's - Université de Montréal
Collaborating Alumni - Université de Montréal
Collaborating researcher - Université Paris-Saclay
Principal supervisor :
PhD - Université de Montréal
PhD - Massachusetts Institute of Technology
PhD - Université de Montréal
PhD - Université de Montréal
Professional Master's - Université de Montréal
Research Intern - Université de Montréal
Professional Master's - Université de Montréal
Professional Master's - Université de Montréal
Collaborating researcher - Université de Montréal
Collaborating researcher
Postdoctorate - Université de Montréal
Co-supervisor :
Independent visiting researcher - Technical University Munich (TUM)
PhD - Université de Montréal
Research Intern - Université de Montréal
Master's Research - Université de Montréal
Co-supervisor :
Research Intern - Université de Montréal
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Collaborating Alumni
Research Intern - Université de Montréal
Professional Master's - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Research Intern - McGill University
Research Intern - Imperial College London
PhD - Université de Montréal
Research Intern - Université de Montréal
Collaborating Alumni - Université de Montréal
DESS - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Postdoctorate - Université de Montréal
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
Professional Master's - Université de Montréal
Independent visiting researcher - Université de Montréal
Independent visiting researcher - Hong Kong University of Science and Technology (HKUST)
Collaborating researcher - Ying Wu Coll of Computing
Professional Master's - Université de Montréal
PhD - Max-Planck-Institute for Intelligent Systems
Professional Master's - Université de Montréal
Postdoctorate - Université de Montréal
Independent visiting researcher - Université de Montréal
Independent visiting researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher
Principal supervisor :
Postdoctorate - Université de Montréal
Master's Research - Université de Montréal
Research Intern - Université de Montréal
Master's Research - Université de Montréal
Professional Master's - Université de Montréal
Independent visiting researcher - Technical University of Munich
PhD - École Polytechnique Montréal Fédérale de Lausanne
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher
Principal supervisor :
Postdoctorate - Université de Montréal
Collaborating researcher - Valence
Principal supervisor :
Postdoctorate - Université de Montréal
Co-supervisor :
Collaborating researcher - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)
Principal supervisor :
PhD - Université de Montréal
Professional Master's - Université de Montréal
Collaborating Alumni - Université de Montréal
Research Intern - Université de Montréal
PhD - McGill University
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - McGill University
Principal supervisor :

Publications

Improving Gradient-guided Nested Sampling for Posterior Inference
Pablo Lemos
Will Handley
Nikolay Malkin
We present a performant, general-purpose gradient-guided nested sampling algorithm, …
Unlearning via Sparse Representations
Vedant Shah
Frederik Träuble
Ashish Malik
Michael Curtis Mozer
Sanjeev Arora
Anirudh Goyal
Mitigating Biases with Diverse Ensembles and Diffusion Models
Luca Scimeca
Alexander Rubinstein
Damien Teney
Seong Joon Oh
Armand Nicolicioiu
Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to a phenomenon known as shortcut lea… (see more)rning, where a model relies on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs) to mitigate this form of bias. We show that at particular training intervals, DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features. We leverage this crucial property to generate synthetic counterfactuals to increase model diversity via ensemble disagreement. We show that DPM-guided diversification is sufficient to remove dependence on primary shortcut cues, without a need for additional supervised signals. We further empirically quantify its efficacy on several diversification objectives, and finally show improved generalization and diversification performance on par with prior work that relies on auxiliary data collection.
Learning from unexpected events in the neocortical microcircuit
Colleen J Gillon
Jason E. Pina
Jérôme A. Lecoq
Ruweida Ahmed
Yazan N. Billeh
Shiella Caldejon
Peter Groblewski
Timothy M. Henley
India Kato
Eric Lee
Jennifer Luviano
Kyla Mace
Chelsea Nayan
Thuyanh V. Nguyen
Kat North
Jed Perkins
Sam Seid
Matthew T. Valley
Ali Williford
Timothy P. Lillicrap
Joel Zylberberg
Responses to Pattern-Violating Visual Stimuli Evolve Differently Over Days in Somata and Distal Apical Dendrites
Colleen J Gillon
Jason E. Pina
Jérôme A. Lecoq
Ruweida Ahmed
Yazan N. Billeh
Shiella Caldejon
Peter Groblewski
Timothy M. Henley
India Kato
Eric Lee
Jennifer Luviano
Kyla Mace
Chelsea Nayan
Thuyanh V. Nguyen
Kat North
Jed Perkins
Sam Seid
Matthew T. Valley
Ali Williford
Timothy P. Lillicrap
Joel Zylberberg
Scientists have long conjectured that the neocortex learns patterns in sensory data to generate top-down predictions of upcoming stimuli. In… (see more) line with this conjecture, different responses to pattern-matching vs pattern-violating visual stimuli have been observed in both spiking and somatic calcium imaging data. However, it remains unknown whether these pattern-violation signals are different between the distal apical dendrites, which are heavily targeted by top-down signals, and the somata, where bottom-up information is primarily integrated. Furthermore, it is unknown how responses to pattern-violating stimuli evolve over time as an animal gains more experience with them. Here, we address these unanswered questions by analyzing responses of individual somata and dendritic branches of layer 2/3 and layer 5 pyramidal neurons tracked over multiple days in primary visual cortex of awake, behaving female and male mice. We use sequences of Gabor patches with patterns in their orientations to create pattern-matching and pattern-violating stimuli, and two-photon calcium imaging to record neuronal responses. Many neurons in both layers show large differences between their responses to pattern-matching and pattern-violating stimuli. Interestingly, these responses evolve in opposite directions in the somata and distal apical dendrites, with somata becoming less sensitive to pattern-violating stimuli and distal apical dendrites more sensitive. These differences between the somata and distal apical dendrites may be important for hierarchical computation of sensory predictions and learning, since these two compartments tend to receive bottom-up and top-down information, respectively.
A community effort in SARS-CoV-2 drug discovery.
Johannes Schimunek
Philipp Seidl
Katarina Elez
Tim Hempel
Tuan Le
Frank Noé
Simon Olsson
Lluís Raich
Robin Winter
Hatice Gokcan
Filipp Gusev
Evgeny M. Gutkin
Olexandr Isayev
Maria G. Kurnikova
Chamali H. Narangoda
Roman Zubatyuk
Ivan P. Bosko
Konstantin V. Furs
Anna D. Karpenko
Yury V. Kornoushenko … (see 133 more)
Mikita Shuldau
Artsemi Yushkevich
Mohammed B. Benabderrahmane
Patrick Bousquet‐Melou
Ronan Bureau
Beatrice Charton
Bertrand C. Cirou
Gérard Gil
William J. Allen
Suman Sirimulla
Stanley Watowich
Nick Antonopoulos
Nikolaos Epitropakis
Agamemnon Krasoulis
Vassilis Pitsikalis
Stavros Theodorakis
Igor Kozlovskii
Anton Maliutin
Alexander Medvedev
Petr Popov
Mark Zaretckii
Hamid Eghbal‐Zadeh
Christina Halmich
Sepp Hochreiter
Andreas Mayr
Peter Ruch
Michael Widrich
Francois Berenger
Ashutosh Kumar
Yoshihiro Yamanishi
Kam Y. J. Zhang
Emmanuel Bengio
Moksh J. Jain
Maksym Korablyov
Cheng-Hao Liu
Gilles Marcou
Marcous Gilles
Enrico Glaab
Kelly Barnsley
Suhasini M. Iyengar
Mary Jo Ondrechen
V. Joachim Haupt
Florian Kaiser
Michael Schroeder
Luisa Pugliese
Simone Albani
Christina Athanasiou
Andrea Beccari
Paolo Carloni
Giulia D'Arrigo
Eleonora Gianquinto
Jonas Goßen
Anton Hanke
Benjamin P. Joseph
Daria B. Kokh
Sandra Kovachka
Candida Manelfi
Goutam Mukherjee
Abraham Muñiz‐Chicharro
Francesco Musiani
Ariane Nunes‐Alves
Giulia Paiardi
Giulia Rossetti
S. Kashif Sadiq
Francesca Spyrakis
Carmine Talarico
Alexandros Tsengenes
Rebecca C. Wade
Conner Copeland
Jeremiah Gaiser
Daniel R. Olson
Amitava Roy
Vishwesh Venkatraman
Travis J. Wheeler
Haribabu Arthanari
Klara Blaschitz
Marco Cespugli
Vedat Durmaz
Konstantin Fackeldey
Patrick D. Fischer
Christoph Gorgulla
Christian Gruber
Karl Gruber
Michael Hetmann
Jamie E. Kinney
Krishna M. Padmanabha Das
Shreya Pandita
Amit Singh
Georg Steinkellner
Guilhem Tesseyre
Gerhard Wagner
Zi‐Fu Wang
Ryan J. Yust
Dmitry S. Druzhilovskiy
Dmitry A. Filimonov
Pavel V. Pogodin
Vladimir Poroikov
Anastassia V. Rudik
Leonid A. Stolbov
Alexander V. Veselovsky
Maria De Rosa
Giada De Simone
Maria R. Gulotta
Jessica Lombino
Nedra Mekni
Ugo Perricone
Arturo Casini
Amanda Embree
D. Benjamin Gordon
David Lei
Katelin Pratt
Christopher A. Voigt
Kuang‐Yu Chen
Yves Jacob
Tim Krischuns
Pierre Lafaye
Agnès Zettor
M. Luis Rodríguez
Kris M. White
Daren Fearon
Frank Von Delft
Martin A. Walsh
Dragos Horvath
Charles L. Brooks
Babak Falsafi
Bryan Ford
Adolfo García‐Sastre
Sang Yup Lee
Nadia Naffakh
Alexandre Varnek
Günter Klambauer
Thomas M. Hermans
The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availabili… (see more)ty of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against Covid-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding-, cleavage-, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments.
SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data
Mélisande Teng
Amna Elmustafa
Benjamin Akera
Hager Radi
Biodiversity is declining at an unprecedented rate, impacting ecosystem services necessary to ensure food, water, and human health and well-… (see more)being. Understanding the distribution of species and their habitats is crucial for conservation policy planning. However, traditional methods in ecology for species distribution models (SDMs) generally focus either on narrow sets of species or narrow geographical areas and there remain significant knowledge gaps about the distribution of species. A major reason for this is the limited availability of data traditionally used, due to the prohibitive amount of effort and expertise required for traditional field monitoring. The wide availability of remote sensing data and the growing adoption of citizen science tools to collect species observations data at low cost offer an opportunity for improving biodiversity monitoring and enabling the modelling of complex ecosystems. We introduce a novel task for mapping bird species to their habitats by predicting species encounter rates from satellite images, and present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird, considering summer (breeding) and winter seasons. We also provide a dataset in Kenya representing low-data regimes. We additionally provide environmental data and species range maps for each location. We benchmark a set of baselines on our dataset, including SOTA models for remote sensing tasks. SatBird opens up possibilities for scalably modelling properties of ecosystems worldwide.
Generative AI models should include detection mechanisms as a condition for public release
Alistair Knott
Dino Pedreschi
Raja Chatila
Tapabrata Chakraborti
Susan Leavy
Ricardo Baeza-Yates
D. Eyers
Andrew Trotman
Paul D. Teal
Przemyslaw Biecek
Stuart Russell
OC-NMN: Object-centric Compositional Neural Module Network for Generative Visual Analogical Reasoning
Rim Assouel
Pau Rodriguez
Perouz Taslakian
David Vazquez
Attention Schema in Neural Agents
Dianbo Liu
Samuele Bolotta
Mike He Zhu
Zahra Sheikhbahaee
Attention has become a common ingredient in deep learning architectures. It adds a dynamical selection of information on top of the static s… (see more)election of information supported by weights. In the same way, we can imagine a higher-order informational filter built on top of attention: an Attention Schema (AS), namely, a descriptive and predictive model of attention. In cognitive neuroscience, Attention Schema Theory (AST) supports this idea of distinguishing attention from AS. A strong prediction of this theory is that an agent can use its own AS to also infer the states of other agents' attention and consequently enhance coordination with other agents. As such, multi-agent reinforcement learning would be an ideal setting to experimentally test the validity of AST. We explore different ways in which attention and AS interact with each other. Our preliminary results indicate that agents that implement the AS as a recurrent internal control achieve the best performance. In general, these exploratory experiments suggest that equipping artificial agents with a model of attention can enhance their social intelligence.
Baking Symmetry into GFlowNets
George Ma
Emmanuel Bengio
Dinghuai Zhang
GFlowNets have exhibited promising performance in generating diverse candidates with high rewards. These networks generate objects increment… (see more)ally and aim to learn a policy that assigns probability of sampling objects in proportion to rewards. However, the current training pipelines of GFlowNets do not consider the presence of isomorphic actions, which are actions resulting in symmetric or isomorphic states. This lack of symmetry increases the amount of samples required for training GFlowNets and can result in inefficient and potentially incorrect flow functions. As a consequence, the reward and diversity of the generated objects decrease. In this study, our objective is to integrate symmetries into GFlowNets by identifying equivalent actions during the generation process. Experimental results using synthetic data demonstrate the promising performance of our proposed approaches.
Baking Symmetry into GFlowNets
George Ma
Emmanuel Bengio
Dinghuai Zhang
GFlowNets have exhibited promising performance in generating diverse candidates with high rewards. These networks generate objects increment… (see more)ally and aim to learn a policy that assigns probability of sampling objects in proportion to rewards. However, the current training pipelines of GFlowNets do not consider the presence of isomorphic actions, which are actions resulting in symmetric or isomorphic states. This lack of symmetry increases the amount of samples required for training GFlowNets and can result in inefficient and potentially incorrect flow functions. As a consequence, the reward and diversity of the generated objects decrease. In this study, our objective is to integrate symmetries into GFlowNets by identifying equivalent actions during the generation process. Experimental results using synthetic data demonstrate the promising performance of our proposed approaches.