Portrait of Yoshua Bengio

Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research Department
Founder and Scientific Advisor, Leadership Team
Research Topics
Causality
Computational Neuroscience
Deep Learning
Generative Models
Graph Neural Networks
Machine Learning Theory
Medical Machine Learning
Molecular Modeling
Natural Language Processing
Probabilistic Models
Reasoning
Recurrent Neural Networks
Reinforcement Learning
Representation Learning

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Marie-Josée Beauchamp, Administrative Assistant at marie-josee.beauchamp@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Collaborating Alumni - McGill University
Collaborating Alumni - Université de Montréal
Collaborating researcher - Cambridge University
Principal supervisor :
PhD - Université de Montréal
Independent visiting researcher
Co-supervisor :
PhD - Université de Montréal
Collaborating researcher - N/A
Principal supervisor :
PhD - Université de Montréal
Collaborating researcher - KAIST
PhD - Université de Montréal
PhD - Université de Montréal
Research Intern - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Principal supervisor :
Research Intern - Université de Montréal
Collaborating researcher - Université de Montréal
Collaborating Alumni - Université de Montréal
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Principal supervisor :
Collaborating Alumni
PhD - Université de Montréal
Collaborating Alumni - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
Postdoctorate - Université de Montréal
Principal supervisor :
Independent visiting researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - Ying Wu Coll of Computing
PhD - University of Waterloo
Principal supervisor :
Collaborating Alumni - Max-Planck-Institute for Intelligent Systems
PhD - Université de Montréal
Postdoctorate - Université de Montréal
Independent visiting researcher - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating Alumni - Université de Montréal
Postdoctorate - Université de Montréal
Master's Research - Université de Montréal
Collaborating Alumni - Université de Montréal
Master's Research - Université de Montréal
Postdoctorate
Independent visiting researcher - Technical University of Munich
PhD - Université de Montréal
Co-supervisor :
Postdoctorate - Université de Montréal
Postdoctorate - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - Université de Montréal
Collaborating researcher
Collaborating researcher - KAIST
PhD - McGill University
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - McGill University
Principal supervisor :

Publications

Extending the Framework of Equilibrium Propagation to General Dynamics
Benjamin Scellier
Anirudh Goyal
Jonathan Binas
Thomas Mesnard
A Deep Reinforcement Learning Chatbot (Short Version)
Iulian V. Serban
Chinnadhurai Sankar
Mathieu Germain
Saizheng Zhang
Zhouhan Lin
Sandeep Subramanian
Taesup Kim
Michael Pieper
Nan Rosemary Ke
Sai Rajeswar
Alexandre De Brébisson
Jose Sotelo
Dendi Suhubdy
Vincent Michalski
Alexandre Nguyen
We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon … (see more)Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including neural network and template-based models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than other systems. The results highlight the potential of coupling ensemble systems with deep reinforcement learning as a fruitful path for developing real-world, open-domain conversational agents.
A3T: Adversarially Augmented Adversarial Training
Akram Erraqabi
Aristide Baratin
Recent research showed that deep neural networks are highly sensitive to so-called adversarial perturbations, which are tiny perturbations o… (see more)f the input data purposely designed to fool a machine learning classifier. Most classification models, including deep learning models, are highly vulnerable to adversarial attacks. In this work, we investigate a procedure to improve adversarial robustness of deep neural networks through enforcing representation invariance. The idea is to train the classifier jointly with a discriminator attached to one of its hidden layer and trained to filter the adversarial noise. We perform preliminary experiments to test the viability of the approach and to compare it to other standard adversarial training methods.
Bayesian Model-Agnostic Meta-Learning
Taesup Kim
Jaesik Yoon
Ousmane Dia
Sungwoong Kim
Sungjin Ahn
Learning to infer Bayesian posterior from a few-shot dataset is an important step towards robust meta-learning due to the model uncertainty … (see more)inherent in the problem. In this paper, we propose a novel Bayesian model-agnostic meta-learning method. The proposed method combines scalable gradient-based meta-learning with nonparametric variational inference in a principled probabilistic framework. During fast adaptation, the method is capable of learning complex uncertainty structure beyond a point estimate or a simple Gaussian approximation. In addition, a robust Bayesian meta-update mechanism with a new meta-loss prevents overfitting during meta-update. Remaining an efficient gradient-based meta-learner, the method is also model-agnostic and simple to implement. Experiment results show the accuracy and robustness of the proposed method in various tasks: sinusoidal regression, image classification, active learning, and reinforcement learning.
BigBrain: 1D convolutional neural networks for automated sementation of cortical layers
Konrad Wagstyl
Claude Lepage
Karl Zilles
Sebastian Bludau
G. Cucurul
Alan C. Evans
Paul C Fletcher
Joseph Paul Cohen
Stéphanie Larocque
Thomas Funck
Katrin Amunts
Boundary Seeking GANs
Athul Jacob
Adam Trischler
Gerry Che
Kyunghyun Cho
Generative adversarial networks are a learning framework that rely on training a discriminator to estimate a measure of difference between a… (see more) target and generated distributions. GANs, as normally formulated, rely on the generated samples being completely differentiable w.r.t. the generative parameters, and thus do not work for discrete data. We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator. The importance weights have a strong connection to the decision boundary of the discriminator, and we call our method boundary-seeking GANs (BGANs). We demonstrate the effectiveness of the proposed algorithm with discrete image and character-based natural language generation. In addition, the boundary-seeking objective extends to continuous data, which can be used to improve stability of training, and we demonstrate this on Celeba, Large-scale Scene Understanding (LSUN) bedrooms, and Imagenet without conditioning.
ChatPainter: Improving Text to Image Generation using Dialogue
Shikhar Sharma
Dendi Suhubdy
Vincent Michalski
Synthesizing realistic images from text descriptions on a dataset like Microsoft Common Objects in Context (MS COCO), where each image can c… (see more)ontain several objects, is a challenging task. Prior work has used text captions to generate images. However, captions might not be informative enough to capture the entire image and insufficient for the model to be able to understand which objects in the images correspond to which words in the captions. We show that adding a dialogue that further describes the scene leads to significant improvement in the inception score and in the quality of generated images on the MS COCO dataset.
Convolutional neural networks for mesh-based parcellation of the cerebral cortex
Guillem Cucurull
Konrad Wagstyl
Arantxa Casanova
Petar Veličković
Estrid Jakobsen
Michal Drozdzal
Alan C. Evans
In order to understand the organization of the cerebral cortex, it is necessary to create a map or parcellation of cortical areas. Reconstru… (see more)ctions of the cortical surface created from structural MRI scans, are frequently used in neuroimaging as a common coordinate space for representing multimodal neuroimaging data. These meshes are used to investigate healthy brain organization as well as abnormalities in neurological and psychiatric conditions. We frame cerebral cortex parcellation as a mesh segmentation task, and address it by taking advantage of recent advances in generalizing convolutions to the graph domain. In particular, we propose to assess graph convolutional networks and graph attention networks, which, in contrast to previous mesh parcellation models, exploit the underlying structure of the data to make predictions. We show experimentally on the Human Connectome Project dataset that the proposed graph convolutional models outperform current state-of-the-art and baselines, highlighting the potential and applicability of these methods to tackle neuroimaging challenges, paving the road towards a better characterization of brain diseases.
A Dataset of Topic-Oriented Human-to-Chatbot Dialogues
Varvara Logacheva
Mikhail Burtsev
Valentin Malykh
Vadim Poluliakh
Alexander Rudnicky
Iulian V. Serban
Ryan Thomas Lowe
Shrimai Prabhumoye
Alan W. Black
This document contains the description of dataset collected during the first round of Conversational Intelligence Challenge (ConvAI) which t… (see more)ook place in July 2017. During this evaluation round we collected over 2,500 dialogues from 10 chatbots and 500 volunteers. Here we provide the analysis of dataset statistics and outline some possible improvements for future data collection experiments.
Deep Complex Networks
Chiheb Trabelsi
Olexa Bilaniuk
Ying Zhang
Dmitriy Serdyuk
Sandeep Subramanian
Joao Felipe Santos
Soroush Mehri
At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and re… (see more)presentations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggests that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks. More precisely, we rely on complex convolutions and present algorithms for complex batch-normalization, complex weight initialization strategies for complex-valued neural nets and we use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are competitive with their real-valued counterparts. We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset and on Speech spectrum prediction using TIMIT. We achieve state-of-the-art performance on these audio-related tasks.
Dendritic cortical microcircuits approximate the backpropagation algorithm
João Sacramento
Rui Ponte Costa
Walter Senn
Deep learning has seen remarkable developments over the last years, many of them inspired by neuroscience. However, the main learning mechan… (see more)ism behind these advances - error backpropagation - appears to be at odds with neurobiology. Here, we introduce a multilayer neuronal network model with simplified dendritic compartments in which error-driven synaptic plasticity adapts the network towards a global desired output. In contrast to previous work our model does not require separate phases and synaptic learning is driven by local dendritic prediction errors continuously in time. Such errors originate at apical dendrites and occur due to a mismatch between predictive input from lateral interneurons and activity from actual top-down feedback. Through the use of simple dendritic compartments and different cell-types our model can represent both error and normal activity within a pyramidal neuron. We demonstrate the learning capabilities of the model in regression and classification tasks, and show analytically that it approximates the error backpropagation algorithm. Moreover, our framework is consistent with recent observations of learning between brain areas and the architecture of cortical microcircuits. Overall, we introduce a novel view of learning on dendritic cortical circuits and on how the brain may solve the long-standing synaptic credit assignment problem.
FigureQA: An Annotated Figure Dataset for Visual Reasoning
Adam Atkinson
Vincent Michalski
Ákos Kádár
Adam Trischler
We introduce FigureQA, a visual reasoning corpus of over one million question-answer pairs grounded in over 100,000 images. The images are s… (see more)ynthetic, scientific-style figures from five classes: line plots, dot-line plots, vertical and horizontal bar graphs, and pie charts. We formulate our reasoning task by generating questions from 15 templates; questions concern various relationships between plot elements and examine characteristics like the maximum, the minimum, area-under-the-curve, smoothness, and intersection. To resolve, such questions often require reference to multiple plot elements and synthesis of information distributed spatially throughout a figure. To facilitate the training of machine learning systems, the corpus also includes side data that can be used to formulate auxiliary objectives. In particular, we provide the numerical data used to generate each figure as well as bounding-box annotations for all plot elements. We study the proposed visual reasoning task by training several models, including the recently proposed Relation Network as a strong baseline. Preliminary results indicate that the task poses a significant machine learning challenge. We envision FigureQA as a first step towards developing models that can intuitively recognize patterns from visual representations of data.