Portrait of Yoshua Bengio

Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research Department
Founder and Scientific Advisor, Leadership Team
Research Topics
Causality
Computational Neuroscience
Deep Learning
Generative Models
Graph Neural Networks
Machine Learning Theory
Medical Machine Learning
Molecular Modeling
Natural Language Processing
Probabilistic Models
Reasoning
Recurrent Neural Networks
Reinforcement Learning
Representation Learning

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Cassidy MacNeil, Senior Assistant and Operation Lead at cassidy.macneil@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Collaborating Alumni - McGill University
Collaborating researcher - Cambridge University
Principal supervisor :
PhD - Université de Montréal
Independent visiting researcher
Co-supervisor :
Collaborating researcher - N/A
Principal supervisor :
PhD - Université de Montréal
Collaborating researcher - KAIST
PhD - Université de Montréal
Collaborating Alumni - Université de Montréal
Co-supervisor :
Independent visiting researcher
Principal supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Principal supervisor :
Postdoctorate - Université de Montréal
Principal supervisor :
Collaborating Alumni
Collaborating Alumni - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Independent visiting researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - Ying Wu Coll of Computing
Collaborating researcher - University of Waterloo
Principal supervisor :
Collaborating Alumni - Max-Planck-Institute for Intelligent Systems
Collaborating researcher - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Postdoctorate - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Postdoctorate
Co-supervisor :
Collaborating Alumni - Polytechnique Montréal
Co-supervisor :
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher
Principal supervisor :
Collaborating Alumni - Université de Montréal
Collaborating Alumni - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
PhD - McGill University
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
Collaborating Alumni - McGill University
Principal supervisor :

Publications

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving
Nan Rosemary Ke
Siyuan Guo
Michal Valko
Timothy Lillicrap
Danilo Rezende
Michael Mozer
Sanjeev Arora
Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly posse… (see more)ss some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including ability to name skills and procedures to apply given a task. We explore this primarily in context of math reasoning, developing a prompt-guided interaction procedure to get a powerful LLM to assign sensible skill labels to math questions, followed by having it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans. To validate that these skill labels are meaningful and relevant to the LLM's reasoning processes we perform the following experiments. (a) We ask GPT-4 to assign skill labels to training questions in math datasets GSM8K and MATH. (b) When using an LLM to solve the test questions, we present it with the full list of skill labels and ask it to identify the skill needed. Then it is presented with randomly selected exemplar solved questions associated with that skill label. This improves accuracy on GSM8k and MATH for several strong LLMs, including code-assisted models. The methodology presented is domain-agnostic, even though this article applies it to math problems.
RGFN: Synthesizable Molecular Generation Using GFlowNets
Andrei Rekesh
Dmytro Shevchuk
Cheng-Hao Liu
Mike Tyers
Robert A. Batey
Generative models hold great promise for small molecule discovery, significantly increasing the size of search space compared to traditional… (see more) in silico screening libraries. However, most existing machine learning methods for small molecule generation suffer from poor synthesizability of candidate compounds, making experimental validation difficult. In this paper we propose Reaction-GFlowNet (RGFN), an extension of the GFlowNet framework that operates directly in the space of chemical reactions, thereby allowing out-of-the-box synthesizability while maintaining comparable quality of generated candidates. We demonstrate that with the proposed set of reactions and building blocks, it is possible to obtain a search space of molecules orders of magnitude larger than existing screening libraries coupled with low cost of synthesis. We also show that the approach scales to very large fragment libraries, further increasing the number of potential molecules. We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.
Al content detection in the emerging information ecosystem: new obligations for media and tech companies
Alistair Knott
Dino Pedreschi
Toshiya Jitsuzumi
Susan Leavy
David Eyers
Tapabrata Chakraborti
Andrew Trotman
Sundar Sundareswaran
Ricardo Baeza-Yates
Przemyslaw Biecek
Adrian Weller
Paul D. Teal
Subhadip Basu
Mehmet Haklidir
Virginia Morini
Stuart Russell
A high-throughput phenotypic screen combined with an ultra-large-scale deep learning-based virtual screening reveals novel scaffolds of antibacterial compounds
Gabriele Scalia
Steven T. Rutherford
Ziqing Lu
Kerry R. Buchholz
Nicholas Skelton
Kangway Chuang
Nathaniel Diamant
Jan-Christian Hütter
Jerome-Maxim Luescher
Anh Miu
Jeff Blaney
Leo Gendelev
Elizabeth Skippington
Greg Zynda
Nia Dickson
Aviv Regev
Man-Wah Tan
Tommaso Biancalani
The proliferation of multi-drug-resistant bacteria underscores an urgent need for novel antibiotics. Traditional discovery methods face chal… (see more)lenges due to limited chemical diversity, high costs, and difficulties in identifying structurally novel compounds. Here, we explore the integration of small molecule high-throughput screening with a deep learning-based virtual screening approach to uncover new antibacterial compounds. Leveraging a diverse library of nearly 2 million small molecules, we conducted comprehensive phenotypic screening against a sensitized Escherichia coli strain that, at a low hit rate, yielded thousands of hits. We trained a deep learning model, GNEprop, to predict antibacterial activity, ensuring robustness through out-of-distribution generalization techniques. Virtual screening of over 1.4 billion compounds identified potential candidates, of which 82 exhibited antibacterial activity, illustrating a 90X improved hit rate over the high-throughput screening experiment GNEprop was trained on. Importantly, a significant portion of these newly identified compounds exhibited high dissimilarity to known antibiotics, indicating promising avenues for further exploration in antibiotic discovery.
Zero-Shot Object-Centric Representation Learning
Aniket Rajiv Didolkar
Andrii Zadaianchuk
Michael Curtis Mozer
Georg Martius
Maximilian Seitzer
The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities… (see more). Recent successes have shown that object-centric representation learning can be scaled to real-world scenes by utilizing pre-trained self-supervised features. However, so far, object-centric methods have mostly been applied in-distribution, with models trained and evaluated on the same dataset. This is in contrast to the wider trend in machine learning towards general-purpose models directly applicable to unseen data and tasks. Thus, in this work, we study current object-centric methods through the lens of zero-shot generalization by introducing a benchmark comprising eight different synthetic and real-world datasets. We analyze the factors influencing zero-shot performance and find that training on diverse real-world images improves transferability to unseen scenarios. Furthermore, inspired by the success of task-specific fine-tuning in foundation models, we introduce a novel fine-tuning strategy to adapt pre-trained vision encoders for the task of object discovery. We find that the proposed approach results in state-of-the-art performance for unsupervised object discovery, exhibiting strong zero-shot transfer to unseen datasets.
Multi-Fidelity Active Learning with GFlowNets
In the last decades, the capacity to generate large amounts of data in science and engineering applications has been growing steadily. Meanw… (see more)hile, the progress in machine learning has turned it into a suitable tool to process and utilise the available data. Nonetheless, many relevant scientific and engineering problems present challenges where current machine learning methods cannot yet efficiently leverage the available data and resources. For example, in scientific discovery, we are often faced with the problem of exploring very large, high-dimensional spaces, where querying a high fidelity, black-box objective function is very expensive. Progress in machine learning methods that can efficiently tackle such problems would help accelerate currently crucial areas such as drug and materials discovery. In this paper, we propose the use of GFlowNets for multi-fidelity active learning, where multiple approximations of the black-box function are available at lower fidelity and cost. GFlowNets are recently proposed methods for amortised probabilistic inference that have proven efficient for exploring large, high-dimensional spaces and can hence be practical in the multi-fidelity setting too. Here, we describe our algorithm for multi-fidelity active learning with GFlowNets and evaluate its performance in both well-studied synthetic tasks and practically relevant applications of molecular discovery. Our results show that multi-fidelity active learning with GFlowNets can efficiently leverage the availability of multiple oracles with different costs and fidelities to accelerate scientific discovery and engineering design.
Redesigning Information Markets in the Era of Language Models
Nasim Rahaman
Manuel Wuthrich
Li Erran Li
Bernhard Schölkopf
Christopher Pal
Improving Gradient-Guided Nested Sampling for Posterior Inference
We present a performant, general-purpose gradient-guided nested sampling algorithm, …
Memory Efficient Neural Processes via Constant Memory Attention Block
Frederick Tung
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
On Generalization for Generative Flow Networks
Generative Flow Networks (GFlowNets) have emerged as an innovative learning paradigm designed to address the challenge of sampling from an u… (see more)nnormalized probability distribution, called the reward function. This framework learns a policy on a constructed graph, which enables sampling from an approximation of the target probability distribution through successive steps of sampling from the learned policy. To achieve this, GFlowNets can be trained with various objectives, each of which can lead to the model s ultimate goal. The aspirational strength of GFlowNets lies in their potential to discern intricate patterns within the reward function and their capacity to generalize effectively to novel, unseen parts of the reward function. This paper attempts to formalize generalization in the context of GFlowNets, to link generalization with stability, and also to design experiments that assess the capacity of these models to uncover unseen parts of the reward function. The experiments will focus on length generalization meaning generalization to states that can be constructed only by longer trajectories than those seen in training.
Cell Morphology-Guided Small Molecule Generation with GFlowNets
Stephen Zhewen Lu
Ziqing Lu
Ehsan Hajiramezanali
Tommaso Biancalani
Gabriele Scalia
Estimating Expectations without Sampling: Neural Stein Estimation
Cheikh Ahmed
Awa Khouna
We propose a method for estimating the expected value of a given function …