Portrait of Yoshua Bengio

Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research Department
Founder and Scientific Advisor, Leadership Team
Research Topics
Causality
Computational Neuroscience
Deep Learning
Generative Models
Graph Neural Networks
Machine Learning Theory
Medical Machine Learning
Molecular Modeling
Natural Language Processing
Probabilistic Models
Reasoning
Recurrent Neural Networks
Reinforcement Learning
Representation Learning

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Marie-Josée Beauchamp, Administrative Assistant at marie-josee.beauchamp@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Collaborating Alumni - McGill University
Collaborating Alumni - Université de Montréal
Collaborating researcher - Cambridge University
Principal supervisor :
PhD - Université de Montréal
Independent visiting researcher
Co-supervisor :
PhD - Université de Montréal
Collaborating researcher - N/A
Principal supervisor :
PhD - Université de Montréal
Collaborating researcher - KAIST
PhD - Université de Montréal
PhD - Université de Montréal
Research Intern - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Principal supervisor :
Research Intern - Université de Montréal
Collaborating researcher - Université de Montréal
Collaborating Alumni - Université de Montréal
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Principal supervisor :
Collaborating Alumni
Collaborating Alumni - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Collaborating Alumni - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
Postdoctorate - Université de Montréal
Principal supervisor :
Independent visiting researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - Ying Wu Coll of Computing
PhD - University of Waterloo
Principal supervisor :
Collaborating Alumni - Max-Planck-Institute for Intelligent Systems
PhD - Université de Montréal
Postdoctorate - Université de Montréal
Independent visiting researcher - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Master's Research - Université de Montréal
Collaborating Alumni - Université de Montréal
Master's Research - Université de Montréal
Postdoctorate
Independent visiting researcher - Technical University of Munich
PhD - Université de Montréal
Co-supervisor :
Postdoctorate - Université de Montréal
Postdoctorate - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - Université de Montréal
Collaborating researcher
Collaborating researcher - KAIST
PhD - Université de Montréal
PhD - McGill University
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - McGill University
Principal supervisor :

Publications

Model evaluation for extreme risks
Toby Shevlane
Sebastian Farquhar
Ben Garfinkel
Mary Phuong
Jess Whittlestone
Jade Leung
Daniel Kokotajlo
Nahema A. Marchal
Markus Anderljung
Noam Kolt
Lewis Ho
Divya Siddarth
Shahar Avin
W. Hawkins
Been Kim
Iason Gabriel
Vijay Bolina
Jack Clark
Paul F. Christiano … (see 1 more)
Allan Dafoe
Model evaluation for extreme risks
Toby Shevlane
Sebastian Farquhar
Ben Garfinkel
Mary Phuong
Jess Whittlestone
Jade Leung
Daniel Kokotajlo
Nahema A. Marchal
Markus Anderljung
Noam Kolt
Lewis Ho
Divya Siddarth
Shahar Avin
W. Hawkins
Been Kim
Iason Gabriel
Vijay Bolina
Jack Clark
Paul F. Christiano … (see 1 more)
Allan Dafoe
Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further pro… (see more)gress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through"dangerous capability evaluations") and the propensity of models to apply their capabilities for harm (through"alignment evaluations"). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.
Model evaluation for extreme risks
Toby Shevlane
Sebastian Farquhar
Ben Garfinkel
Mary Phuong
Jess Whittlestone
Jade Leung
Daniel Kokotajlo
Nahema A. Marchal
Markus Anderljung
Noam Kolt
Lewis Ho
Divya Siddarth
Shahar Avin
W. Hawkins
Been Kim
Iason Gabriel
Vijay Bolina
Jack Clark
Paul F. Christiano … (see 1 more)
Allan Dafoe
Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further pro… (see more)gress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through"dangerous capability evaluations") and the propensity of models to apply their capabilities for harm (through"alignment evaluations"). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.
Responses of pyramidal cell somata and apical dendrites in mouse visual cortex over multiple days
Colleen J Gillon
Jérôme A. Lecoq
Jason E. Pina
Ruweida Ahmed
Yazan N. Billeh
Shiella Caldejon
Peter Groblewski
Timothy M. Henley
India Kato
Eric Lee
Jennifer Luviano
Kyla Mace
Chelsea Nayan
Thuyanh V. Nguyen
Kat North
Jed Perkins
Sam Seid
Matthew T. Valley
Ali Williford
Timothy P. Lillicrap
Joel Zylberberg
Responses of pyramidal cell somata and apical dendrites in mouse visual cortex over multiple days
Colleen J Gillon
Jérôme A. Lecoq
Jason E. Pina
Ruweida Ahmed
Yazan N. Billeh
Shiella Caldejon
Peter Groblewski
Timothy M. Henley
India Kato
Eric Lee
Jennifer Luviano
Kyla Mace
Chelsea Nayan
Thuyanh V. Nguyen
Kat North
Jed Perkins
Sam Seid
Matthew T. Valley
Ali Williford
Timothy P. Lillicrap
Joel Zylberberg
Automated Detection of Anatomical Landmarks During Colonoscopy Using a Deep Learning Model
Mahsa Taghiakbari
Sina Hamidi Ghalehjegh
Emmanuel Jehanno
Tess Berthier
Lisa Di Jorio
Saber Ghadakzadeh
Alan Barkun
Mark Takla
Mickael Bouin
Eric Deslandres
Simon Bouchard
Sacha Sidani
Daniel von Renteln
Abstract Background and aims Identification and photo-documentation of the ileocecal valve (ICV) and appendiceal orifice (AO) confirm comple… (see more)teness of colonoscopy examinations. We aimed to develop and test a deep convolutional neural network (DCNN) model that can automatically identify ICV and AO, and differentiate these landmarks from normal mucosa and colorectal polyps. Methods We prospectively collected annotated full-length colonoscopy videos of 318 patients undergoing outpatient colonoscopies. We created three nonoverlapping training, validation, and test data sets with 25,444 unaltered frames extracted from the colonoscopy videos showing four landmarks/image classes (AO, ICV, normal mucosa, and polyps). A DCNN classification model was developed, validated, and tested in separate data sets of images containing the four different landmarks. Results After training and validation, the DCNN model could identify both AO and ICV in 18 out of 21 patients (85.7%). The accuracy of the model for differentiating AO from normal mucosa, and ICV from normal mucosa were 86.4% (95% CI 84.1% to 88.5%), and 86.4% (95% CI 84.1% to 88.6%), respectively. Furthermore, the accuracy of the model for differentiating polyps from normal mucosa was 88.6% (95% CI 86.6% to 90.3%). Conclusion This model offers a novel tool to assist endoscopists with automated identification of AO and ICV during colonoscopy. The model can reliably distinguish these anatomical landmarks from normal mucosa and colorectal polyps. It can be implemented into automated colonoscopy report generation, photo-documentation, and quality auditing solutions to improve colonoscopy reporting quality.
Automated Detection of Anatomical Landmarks During Colonoscopy Using a Deep Learning Model
Mahsa Taghiakbari
Sina Hamidi Ghalehjegh
Emmanuel Jehanno
Tess Berthier
Lisa Di Jorio
Saber Ghadakzadeh
Alan Barkun
Mark Takla
Mickael Bouin
Eric Deslandres
Simon Bouchard
Sacha Sidani
Daniel von Renteln
Abstract Background and aims Identification and photo-documentation of the ileocecal valve (ICV) and appendiceal orifice (AO) confirm comple… (see more)teness of colonoscopy examinations. We aimed to develop and test a deep convolutional neural network (DCNN) model that can automatically identify ICV and AO, and differentiate these landmarks from normal mucosa and colorectal polyps. Methods We prospectively collected annotated full-length colonoscopy videos of 318 patients undergoing outpatient colonoscopies. We created three nonoverlapping training, validation, and test data sets with 25,444 unaltered frames extracted from the colonoscopy videos showing four landmarks/image classes (AO, ICV, normal mucosa, and polyps). A DCNN classification model was developed, validated, and tested in separate data sets of images containing the four different landmarks. Results After training and validation, the DCNN model could identify both AO and ICV in 18 out of 21 patients (85.7%). The accuracy of the model for differentiating AO from normal mucosa, and ICV from normal mucosa were 86.4% (95% CI 84.1% to 88.5%), and 86.4% (95% CI 84.1% to 88.6%), respectively. Furthermore, the accuracy of the model for differentiating polyps from normal mucosa was 88.6% (95% CI 86.6% to 90.3%). Conclusion This model offers a novel tool to assist endoscopists with automated identification of AO and ICV during colonoscopy. The model can reliably distinguish these anatomical landmarks from normal mucosa and colorectal polyps. It can be implemented into automated colonoscopy report generation, photo-documentation, and quality auditing solutions to improve colonoscopy reporting quality.
Combining Parameter-efficient Modules for Task-level Generalisation
FAENet: Frame Averaging Equivariant GNN for Materials Modeling
Alexandre AGM Duval
Victor Schmidt
Alex Hernandez-Garcia
Santiago Miret
Fragkiskos D. Malliaros
Applications of machine learning techniques for materials modeling typically involve functions known to be equivariant or invariant to speci… (see more)fic symmetries. While graph neural networks (GNNs) have proven successful in such tasks, they enforce symmetries via the model architecture, which often reduces their expressivity, scalability and comprehensibility. In this paper, we introduce (1) a flexible framework relying on stochastic frame-averaging (SFA) to make any model E(3)-equivariant or invariant through data transformations. (2) FAENet: a simple, fast and expressive GNN, optimized for SFA, that processes geometric information without any symmetrypreserving design constraints. We prove the validity of our method theoretically and empirically demonstrate its superior accuracy and computational scalability in materials modeling on the OC20 dataset (S2EF, IS2RE) as well as common molecular modeling tasks (QM9, QM7-X). A package implementation is available at https://faenet.readthedocs.io.
FAENet: Frame Averaging Equivariant GNN for Materials Modeling
Alexandre AGM Duval
Victor Schmidt
Alex Hernandez-Garcia
Santiago Miret
Fragkiskos D. Malliaros
Applications of machine learning techniques for materials modeling typically involve functions known to be equivariant or invariant to speci… (see more)fic symmetries. While graph neural networks (GNNs) have proven successful in such tasks, they enforce symmetries via the model architecture, which often reduces their expressivity, scalability and comprehensibility. In this paper, we introduce (1) a flexible framework relying on stochastic frame-averaging (SFA) to make any model E(3)-equivariant or invariant through data transformations. (2) FAENet: a simple, fast and expressive GNN, optimized for SFA, that processes geometric information without any symmetrypreserving design constraints. We prove the validity of our method theoretically and empirically demonstrate its superior accuracy and computational scalability in materials modeling on the OC20 dataset (S2EF, IS2RE) as well as common molecular modeling tasks (QM9, QM7-X). A package implementation is available at https://faenet.readthedocs.io.
FAENet: Frame Averaging Equivariant GNN for Materials Modeling
Alexandre AGM Duval
Victor Schmidt
Alex Hernandez-Garcia
Santiago Miret
Fragkiskos D. Malliaros
Applications of machine learning techniques for materials modeling typically involve functions known to be equivariant or invariant to speci… (see more)fic symmetries. While graph neural networks (GNNs) have proven successful in such tasks, they enforce symmetries via the model architecture, which often reduces their expressivity, scalability and comprehensibility. In this paper, we introduce (1) a flexible framework relying on stochastic frame-averaging (SFA) to make any model E(3)-equivariant or invariant through data transformations. (2) FAENet: a simple, fast and expressive GNN, optimized for SFA, that processes geometric information without any symmetrypreserving design constraints. We prove the validity of our method theoretically and empirically demonstrate its superior accuracy and computational scalability in materials modeling on the OC20 dataset (S2EF, IS2RE) as well as common molecular modeling tasks (QM9, QM7-X). A package implementation is available at https://faenet.readthedocs.io.
Better Training of GFlowNets with Local Credit and Incomplete Trajectories
Ling Pan
Nikolay Malkin
Dinghuai Zhang