Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Scientific Director, Leadership Team
Observer, Board of Directors, Mila

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Julie Mongeau, executive assistant at julie.mongeau@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific director of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize, and in 2022 he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Professional Master's - Université de Montréal
Co-supervisor :
Professional Master's - Université de Montréal
PhD - Université de Montréal
Postdoctorate - Université de Montréal
Co-supervisor :
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Collaborating researcher - Université Paris-Saclay
Principal supervisor :
Professional Master's - Université de Montréal
Independent visiting researcher - MIT
PhD - École Polytechnique Montréal Fédérale de Lausanne
Research Intern - Université du Québec à Rimouski
Collaborating researcher
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
Postdoctorate - Université de Montréal
Co-supervisor :
Professional Master's - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Postdoctorate - Université de Montréal
Co-supervisor :
Master's Research - Université de Montréal
PhD - Université de Montréal
Research Intern - Université de Montréal
Collaborating Alumni
Independent visiting researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Independent visiting researcher - Université de Montréal
Professional Master's - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
PhD - Massachusetts Institute of Technology
PhD - Université de Montréal
PhD - Université de Montréal
Independent visiting researcher - Technical University Munich (TUM)
Independent visiting researcher - Hong Kong University of Science and Technology (HKUST)
DESS - Université de Montréal
Independent visiting researcher - UQAR
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Research Intern - Université de Montréal
Independent visiting researcher - Technical University of Munich
Research Intern - Imperial College London
PhD - Université de Montréal
Co-supervisor :
Postdoctorate - Université de Montréal
PhD - McGill University
Principal supervisor :
Professional Master's - Université de Montréal
Collaborating researcher - Université de Montréal
Research Intern - Université de Montréal
Research Intern - Université de Montréal
PhD - Max-Planck-Institute for Intelligent Systems
PhD - McGill University
Principal supervisor :
Collaborating Alumni - Université de Montréal
Professional Master's - Université de Montréal
PhD - Université de Montréal
Independent visiting researcher - Université de Montréal
Collaborating Alumni - Université de Montréal
Collaborating researcher
Professional Master's - Université de Montréal
Collaborating researcher - Valence
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Research Intern - Université de Montréal
Collaborating researcher - Université de Montréal
Independent visiting researcher
Co-supervisor :
Postdoctorate - Université de Montréal
Research Intern - McGill University
Professional Master's - Université de Montréal
Collaborating researcher
Principal supervisor :
Master's Research - Université de Montréal
Co-supervisor :
Master's Research - Université de Montréal
Collaborating researcher - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)
Principal supervisor :
Undergraduate - Université de Montréal
PhD - Université de Montréal
Professional Master's - Université de Montréal
Professional Master's - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Professional Master's - Université de Montréal
Postdoctorate - Université de Montréal

Publications

Regeneration Learning: A Learning Paradigm for Data Generation
Xu Tan
Tao Qin
Jiang Bian
Tie-Yan Liu
Benchmarking Graph Neural Networks
Vijay Prakash Dwivedi
Chaitanya K. Joshi
Thomas Laurent
Anh Tuan Luu
Xavier Bresson
Conditional Flow Matching: Simulation-Free Dynamic Optimal Transport
Alexander Tong
Nikolay Malkin
Guillaume Huguet
Yanlei Zhang
Jarrid Rector-Brooks
Kilian FATRAS
Constant Memory Attentive Neural Processes
Leo Feng
Frederick Tung
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
DynGFN: Bayesian Dynamic Causal Discovery using Generative Flow Networks
Lazar Atanackovic
Alexander Tong
Jason Hartford
Leo Jingyu Lee
Bo Wang
Learning the causal structure of observable variables is a central focus for scientific discovery. Bayesian causal discovery methods tackle this problem by learning a posterior over the set of admissible graphs given our priors and observations. Existing methods primarily consider observations from static systems and assume the underlying causal structure takes the form of a directed acyclic graph (DAG). In settings with dynamic feedback mechanisms that regulate the trajectories of individual variables, this acyclicity assumption fails unless we account for time. We focus on learning Bayesian posteriors over cyclic graphs and treat causal discovery as a problem of sparse identification of a dynamical system. This imposes a natural temporal causal order between variables and captures cyclic feedback loops through time. Under this lens, we propose a new framework for Bayesian causal discovery for dynamical systems and present a novel generative flow network architecture (DynGFN) tailored for this task. Our results indicate that DynGFN learns posteriors that better encapsulate the distributions over admissible cyclic causal structures compared to counterpart state-of-the-art approaches.
GFlowNets for AI-Driven Scientific Discovery
Moksh J. Jain
Tristan Deleu
Jason Hartford
Cheng-Hao Liu
Alex Hernandez-Garcia
Tackling the most pressing problems for humanity, such as the climate crisis and the threat of global pandemics, requires accelerating the pace of scientific discovery. While science has traditionally relied...
GFlowOut: Dropout with Generative Flow Networks
Dianbo Liu
Moksh J. Jain
Bonaventure F. P. Dossou
Qianli Shen
Salem Lahlou
Anirudh Goyal
Nikolay Malkin
Chris Emezue
Dinghuai Zhang
Nadhir Hassen
Xu Ji
Kenji Kawaguchi
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
Eric Nguyen
Michael Poli
Marjan Faizi
Armin W Thomas
Callum Birch-Sykes
Michael Wornow
Aman Patel
Clayton M. Rabideau
Stefano Massaroli
Stefano Ermon
Stephen Baccus
Christopher Re
Learning GFlowNets from partial episodes for improved convergence and stability
Kanika Madan
Jarrid Rector-Brooks
Maksym Korablyov
Emmanuel Bengio
Moksh J. Jain
Andrei Cristian Nica
Tom Bosc
Nikolay Malkin
Generative flow networks (GFlowNets) are a family of algorithms for training a sequential sampler of discrete objects under an unnormalized target density and have been successfully used for various probabilistic modeling tasks. Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory. We argue that these alternatives represent opposite ends of a gradient bias-variance tradeoff and propose a way to exploit this tradeoff to mitigate its harmful effects. Inspired by the TD(
MixupE: Understanding and improving Mixup from directional derivative perspective
Yingtian Zou
Vikas Verma
Sarthak Mittal
Wai Hoh Tang
Hieu Pham
Juho Kannala
Arno Solin
Kenji Kawaguchi
MixupE: Understanding and Improving Mixup from Directional Derivative Perspective
Vikas Verma
Yingtian Zou
Sarthak Mittal
Wai Hoh Tang
Hieu Pham
Juho Kannala
Arno Solin
Kenji Kawaguchi
Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
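
For readers unfamiliar with the baseline, the linear interpolation described in the abstract above can be sketched as follows. This is a minimal illustration of vanilla Mixup only, not the improved MixupE method the paper proposes. The function name `mixup_pair`, the default `alpha=0.2`, and the use of plain Python lists with one-hot labels are assumptions made for this example and are not taken from the paper.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, alpha=0.2, rng=None):
    """Blend one pair of examples (vanilla Mixup, illustrative sketch).

    x_i, x_j: input feature vectors of equal length (lists of floats).
    y_i, y_j: label vectors of equal length (e.g. one-hot lists).
    alpha:    Beta(alpha, alpha) concentration; 0.2 is an assumed default.
    """
    rng = rng or random.Random()
    # Mixing coefficient lam in [0, 1], drawn from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    # Inputs and labels are interpolated with the same coefficient.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y, lam


# Usage: mix two toy examples from different classes.
x, y, lam = mixup_pair(
    [1.0, 0.0], [1.0, 0.0],  # example i, class 0
    [0.0, 1.0], [0.0, 1.0],  # example j, class 1
    rng=random.Random(0),
)
```

Because the same coefficient is applied to inputs and labels, mixing two one-hot labels yields a soft label whose entries still sum to one.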