Yoshua Bengio

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Cassidy MacNeil, Senior Assistant and Operation Lead at cassidy.macneil@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Jamal Abou Haibeh

Collaborating Alumni - McGill University

Berkes Anaïs

Collaborating researcher - Cambridge University

Principal supervisor :

Rim Assouel

PhD - Université de Montréal

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor :

PhD - Université de Montréal

Sanghyeok Choi

Collaborating researcher - KAIST

PhD - Université de Montréal

Independent visiting researcher

Principal supervisor :

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Jean-Pierre Falet

PhD - Université de Montréal

PhD

PhD - Université de Montréal

Moksh Jain

PhD - Université de Montréal

PhD - Université de Montréal

Principal supervisor :

Collaborating Alumni - Université de Montréal

Hyeonah Kim

Postdoctorate - Université de Montréal

Principal supervisor :

Alex Hernández-García

Minsu Kim

Collaborating researcher - Université de Montréal

Postdoctorate - Université de Montréal

Principal supervisor :

Collaborating Alumni

Song LIU

Collaborating researcher - s.o.

Cristian Dragos Manta

PhD - Université de Montréal

Co-supervisor :

Dhanya Sridhar

Sarthak Mittal

PhD - Université de Montréal

Principal supervisor :

Independent visiting researcher - Université de Montréal

Padideh Nouri

PhD - Université de Montréal

Principal supervisor :

Ali Parviz

Collaborating researcher - Ying Wu Coll of Computing

Lena Podina

Collaborating researcher - University of Waterloo

Principal supervisor :

David Rolnick

Jarrid Rector-Brooks

PhD - Université de Montréal

Danyal REHMAN

Postdoctorate - Université de Montréal

Oli RICHARDSON

Postdoctorate - Université de Montréal

Camille Rochefort-Boulanger

PhD - Université de Montréal

Principal supervisor :

Julie Hussin

Divya Sharma

Postdoctorate

Co-supervisor :

Alex Hernández-García

Mélisande Astrid Crystal Teng

Collaborating Alumni - Université de Montréal

Co-supervisor :

Hugo Larochelle

Ivan Titov

Collaborating researcher

Principal supervisor :

Siva Reddy

Alex Tong

Collaborating Alumni - Université de Montréal

Collaborating Alumni - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher - Université de Montréal

Collaborating researcher

Collaborating researcher - Université de Montréal

Nicole Zhang

PhD - McGill University

Principal supervisor :

PhD - Université de Montréal

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Harry Zhao

Collaborating Alumni - McGill University

Principal supervisor :

Blog Posts

Generic thumbnail for Mila Blog articles.

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the Service of Reasoning & Model-Based ML

April 4, 2023

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

March 15, 2022

Generative Flow Networks

Yoshua Bengio

Publications

Language Models Can Reduce Asymmetry in Information Markets

Nasim Rahaman

Martin Weiss

Manuel Wuthrich

Erran L. Li

Christopher Pal

Bernhard Schölkopf

2024-03-20

ArXiv (preprint)

arxiv.org

Improving and Generalizing Flow-Based Generative Models with Minibatch Optimal Transport

Nikolay Malkin

Yanlei Zhang

Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (see more)mulation-based maximum likelihood training. We introduce the generalized \textit{conditional flow matching} (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, OT-CFM is the first method to compute dynamic OT in a simulation-free way. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schrödinger bridge inference.

2024-03-10

TMLR (accepted)

Integrating Generative and Experimental Platforms or Biomolecular Design

Cheng-Hao Liu

Jarrid Rector-Brooks

Jason Yim

Soojung Yang

Sidney Lisanza

Francesca-Zhoufan Li

Pranam Chatterjee

Tommi Jaakkola

Regina Barzilay

David Baker

Frances H. Arnold

2024-03-07

ICLR.cc/2024/Workshop_Proposals (published)

Tackling Climate Change with Machine Learning: Fostering the Maturity of ML Applications for Climate Change

Shiva Madadkhani

Olivia Mendivil Ramos

Millie Chapman

Jesse Dunietz

Arthur Ouaknine

David Rolnick

2024-03-07

ICLR.cc/2024/Workshop_Proposals (published)

Efficient Causal Graph Discovery Using Large Language Models

Yash More

2024-03-04

AGI @ International Conference on Learning Representations (poster)

Towards DNA-Encoded Library Generation with GFlowNets

Michał Koziarski

Mohammed Abukalam

Vedant Shah

Louis Vaillancourt

Doris Alexandra Schuetz

Moksh Jain

Almer Van Der Sloot

Mathieu Bourgey

Anne Marinier

2024-03-03

GEM @ International Conference on Learning Representations (poster)

Distributional GFlowNets with Quantile Flows

Ricky T. Q. Chen

Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers where an agent learns a stochastic policy for generating com… (see more)plex combinatorial structure through a series of decision-making steps. Despite being inspired from reinforcement learning, the current GFlowNet framework is relatively limited in its applicability and cannot handle stochasticity in the reward function. In this work, we adopt a distributional paradigm for GFlowNets, turning each flow function into a distribution, thus providing more informative learning signals during training. By parameterizing each edge flow through their quantile functions, our proposed \textit{quantile matching} GFlowNet learning algorithm is able to learn a risk-sensitive policy, an essential component for handling scenarios with risk uncertainty. Moreover, we find that the distributional approach can achieve substantial improvement on existing benchmarks compared to prior methods due to our enhanced training algorithm, even in settings with deterministic rewards.

2024-02-15

TMLR (accepted)

Computing Power and the Governance of Artificial Intelligence

Girish Sastry

Lennart Heim

Haydn Belfield

Markus Anderljung

Miles Brundage

Julian Hazell

Cullen C. O'keefe

Gillian K. Hadfield

Richard Ngo

Konstantin Pilz

George Gor

Emma Bluemke

Sarah Shoker

Janet Egan

Robert Trager

Shahar Avin

Adrian Weller

Diane Coyle

Computing power, or"compute,"is crucial for the development and deployment of artificial intelligence (AI) capabilities. As a result, govern… (see more)ments and companies have started to leverage compute as a means to govern AI. For example, governments are investing in domestic compute capacity, controlling the flow of compute to competing countries, and subsidizing compute access to certain sectors. However, these efforts only scratch the surface of how compute can be used to govern AI development and deployment. Relative to other key inputs to AI (data and algorithms), AI-relevant compute is a particularly effective point of intervention: it is detectable, excludable, and quantifiable, and is produced via an extremely concentrated supply chain. These characteristics, alongside the singular importance of compute for cutting-edge AI models, suggest that governing compute can contribute to achieving common policy objectives, such as ensuring the safety and beneficial use of AI. More precisely, policymakers could use compute to facilitate regulatory visibility of AI, allocate resources to promote beneficial outcomes, and enforce restrictions against irresponsible or malicious AI development and usage. However, while compute-based policies and technologies have the potential to assist in these areas, there is significant variation in their readiness for implementation. Some ideas are currently being piloted, while others are hindered by the need for fundamental research. Furthermore, naive or poorly scoped approaches to compute governance carry significant risks in areas like privacy, economic impacts, and centralization of power. We end by suggesting guardrails to minimize these risks from compute governance.

2024-02-12

ArXiv (preprint)

arxiv.org

Amortizing Intractable Inference in Large Language Models

Edward J. Hu

Nikolay Malkin

Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions. This l… (see more)imits tractable querying of this knowledge to start-to-end autoregressive sampling. However, many tasks of interest -- including sequence continuation, infilling, and other forms of constrained generation -- involve sampling from intractable posterior distributions. We address this limitation by using amortized Bayesian inference to sample from these intractable posteriors. Such amortization is algorithmically achieved by fine-tuning LLMs via diversity-seeking reinforcement learning algorithms: generative flow networks (GFlowNets). We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training and reward-maximizing policy optimization. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem and demonstrate that our approach enables data-efficient adaptation of LLMs to tasks that require multi-step rationalization and tool use.

2024-01-15

ICLR.cc/2024/Conference (oral)

Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning

Mingde Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstracti… (see more)ons to generalize better in novel situations. It automatically decomposes the given task into smaller, more manageable subtasks, and thus enables sparse decision-making and focused computation on the relevant parts of the environment. The decomposition relies on the extraction of an abstracted proxy problem represented as a directed graph, in which vertices and edges are learned end-to-end from hindsight. Our theoretical analyses provide performance guarantees under appropriate assumptions and establish where our approach is expected to be helpful. Generalization-focused experiments validate Skipper's significant advantage in zero-shot generalization, compared to some existing state-of-the-art hierarchical planning methods.

2024-01-15

International Conference on Learning Representations (poster)

Cycle Consistency Driven Object Discovery

Aniket Didolkar

Anirudh Goyal

Developing deep learning models that effectively learn object-centric representations, akin to human cognition, remains a challenging task. … (see more)Existing approaches facilitate object discovery by representing objects as fixed-size vectors, called ``slots'' or ``object files''. While these approaches have shown promise in certain scenarios, they still exhibit certain limitations. First, they rely on architectural priors which can be unreliable and usually require meticulous engineering to identify the correct objects. Second, there has been a notable gap in investigating the practical utility of these representations in downstream tasks. To address the first limitation, we introduce a method that explicitly optimizes the constraint that each object in a scene should be associated with a distinct slot. We formalize this constraint by introducing consistency objectives which are cyclic in nature. By integrating these consistency objectives into various existing slot-based object-centric methods, we showcase substantial improvements in object-discovery performance. These enhancements consistently hold true across both synthetic and real-world scenes, underscoring the effectiveness and adaptability of the proposed approach. To tackle the second limitation, we apply the learned object-centric representations from the proposed method to two downstream reinforcement learning tasks, demonstrating considerable performance enhancements compared to conventional slot-based and monolithic representation learning methods. Our results suggest that the proposed approach not only improves object discovery, but also provides richer features for downstream tasks.

2024-01-15

ICLR.cc/2024/Conference (poster)