Yoshua Bengio

Biographie

*Pour toute demande média, veuillez écrire à medias@mila.quebec.

Pour plus d’information, contactez Marie-Josée Beauchamp, adjointe administrative à marie-josee.beauchamp@mila.quebec.

Reconnu comme une sommité mondiale en intelligence artificielle, Yoshua Bengio s’est surtout distingué par son rôle de pionnier en apprentissage profond, ce qui lui a valu le prix A. M. Turing 2018, le « prix Nobel de l’informatique », avec Geoffrey Hinton et Yann LeCun. Il est professeur titulaire à l’Université de Montréal, fondateur et conseiller scientifique de Mila – Institut québécois d’intelligence artificielle, et codirige en tant que senior fellow le programme Apprentissage automatique, apprentissage biologique de l'Institut canadien de recherches avancées (CIFAR). Il occupe également la fonction de conseiller spécial et directeur scientifique fondateur d’IVADO.

En 2018, il a été l’informaticien qui a recueilli le plus grand nombre de nouvelles citations au monde. En 2019, il s’est vu décerner le prestigieux prix Killam. Depuis 2022, il détient le plus grand facteur d’impact (h-index) en informatique à l’échelle mondiale. Il est fellow de la Royal Society de Londres et de la Société royale du Canada, et officier de l’Ordre du Canada.

Soucieux des répercussions sociales de l’IA et de l’objectif que l’IA bénéficie à tous, il a contribué activement à la Déclaration de Montréal pour un développement responsable de l’intelligence artificielle.

Étudiants actuels

Jamal Abou Haibeh

Collaborateur·rice alumni - McGill

Mohammed Abukalam

Collaborateur·rice alumni - UdeM

Berkes Anaïs

Collaborateur·rice de recherche - Cambridge University

Superviseur⋅e principal⋅e :

Rim Assouel

Doctorat - UdeM

Stefan Bauer

Visiteur de recherche indépendant

Co-superviseur⋅e :

Guillaume Lajoie

Paul Bertin

Doctorat - UdeM

Shahana Chatterjee

Collaborateur·rice de recherche - N/A

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Collaborateur·rice de recherche - KAIST

Doctorat - UdeM

Doctorat - UdeM

Stagiaire de recherche - UdeM

Co-superviseur⋅e :

Loubna Benabbou

Eric Elmoznino

Doctorat - UdeM

Co-superviseur⋅e :

Doctorat - UdeM

Doctorat - UdeM

Co-superviseur⋅e :

Leo Feng

Doctorat - UdeM

leo.feng@mila.quebec

Ivan Grega

Stagiaire de recherche - UdeM

Doctorat

Doctorat - UdeM

mohsin.hasan@mila.quebec

Edward Hu

Doctorat - UdeM

Moksh Jain

Doctorat - UdeM

moksh.jain@mila.quebec

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Hyeonah Kim

Postdoctorat - UdeM

Superviseur⋅e principal⋅e :

Alex Hernandez

Minsu Kim

Stagiaire de recherche - UdeM

Collaborateur·rice de recherche - UdeM

Salem Lahlou

Collaborateur·rice alumni - UdeM

Seanie Lee

Collaborateur·rice alumni - UdeM

Postdoctorat - UdeM

Superviseur⋅e principal⋅e :

Zhen Liu

Collaborateur·rice alumni - UdeM

Superviseur⋅e principal⋅e :

Liam Paull

Chenghao Liu

Collaborateur·rice alumni

Doctorat - UdeM

Collaborateur·rice alumni - UdeM

Cristian Dragos Manta

Doctorat - UdeM

Co-superviseur⋅e :

Dhanya Sridhar

Sören Mindermann

Collaborateur·rice de recherche - UdeM

Sarthak Mittal

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Postdoctorat - UdeM

Superviseur⋅e principal⋅e :

Visiteur de recherche indépendant - UdeM

Padideh Nouri

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Ali Parviz

Collaborateur·rice de recherche - Ying Wu Coll of Computing

Camille Rochefort-Boulanger

Lena Podina

Doctorat - University of Waterloo

Superviseur⋅e principal⋅e :

Collaborateur·rice alumni - Max-Planck-Institute for Intelligent Systems

Doctorat - UdeM

Postdoctorat - UdeM

Visiteur de recherche indépendant - UdeM

Postdoctorat - UdeM

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Julie Hussin

Victor Schmidt

Collaborateur·rice alumni - UdeM

Postdoctorat - UdeM

Maîtrise recherche - UdeM

Marcin Sendera

Collaborateur·rice alumni - UdeM

Vedant Shah

Maîtrise recherche - UdeM

Postdoctorat

Marco Stock

Visiteur de recherche indépendant - Technical University of Munich

marco.stock@tum.de

Mélisande Astrid Crystal Teng

Doctorat - UdeM

Co-superviseur⋅e :

Hugo Larochelle

alexander.tong@mila.quebec

Alex Tong

Postdoctorat - UdeM

Postdoctorat - UdeM

Co-superviseur⋅e :

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Collaborateur·rice de recherche - UdeM

Omar G. Younis

Collaborateur·rice de recherche

Collaborateur·rice de recherche - KAIST

Doctorat - UdeM

Doctorat - McGill

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Aaron Courville

Skipper : combiner l’abstraction spatiale et temporelle afin d’améliorer la généralisation

Harry Zhao

Doctorat - McGill

Superviseur⋅e principal⋅e :

Billets de blogue

Generic thumbnail for Mila Blog articles.

22 février 2024

par

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Mise à l’échelle au service du raisonnement et de l’apprentissage automatique basé sur un modèle

Scaling in the service of reasoning & model-based ML

4 avril 2023

par

Yoshua Bengio

Edward J. Hu

Une collaboration entre Mila et Relation Therapeutics pour découvrir in vitro de nouvelles associations médicamenteuses synergiques

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

23 mars 2022

par

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

Les réseaux de flot génératifs

15 mars 2022

par

Yoshua Bengio

Publications

Representation Mixing for TTS Synthesis

Kyle Kastner

Joao Felipe Santos

Aaron Courville

Recent character and phoneme-based parametric TTS systems using deep learning have shown strong performance in natural speech generation. Ho… (voir plus)wever, the choice between character or phoneme input can create serious limitations for practical deployment, as direct control of pronunciation is crucial in certain cases. We demonstrate a simple method for combining multiple types of linguistic information in a single encoder, named representation mixing, enabling flexible choice between character, phoneme, or mixed representations during inference. Experiments and user studies on a public audiobook corpus show the efficacy of our approach.

2019-05-12

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (publié)

The Pytorch-kaldi Speech Recognition Toolkit

Mirco Ravanelli

Titouan Parcollet

The availability of open-source software is playing a remarkable role in the popularization of speech recognition and deep learning. Kaldi, … (voir plus)for instance, is nowadays an established framework used to develop state-of-the-art speech recognizers. PyTorch is used to build neural networks with the Python language and has recently spawn tremendous interest within the machine learning community thanks to its simplicity and flexibility.The PyTorch-Kaldi project aims to bridge the gap between these popular toolkits, trying to inherit the efficiency of Kaldi and the flexibility of PyTorch. PyTorch-Kaldi is not only a simple interface between these software, but it embeds several useful features for developing modern speech recognizers. For instance, the code is specifically designed to naturally plug-in user-defined acoustic models. As an alternative, users can exploit several pre-implemented neural networks that can be customized using intuitive configuration files. PyTorch-Kaldi supports multiple feature and label streams as well as combinations of neural networks, enabling the use of complex neural architectures. The toolkit is publicly-released along with a rich documentation and is designed to properly work locally or on HPC clusters.Experiments, that are conducted on several datasets and tasks, show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.

2019-05-12

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (publié)

Visualizing the Consequences of Climate Change Using Cycle-Consistent Adversarial Networks

Victor Schmidt

Alexandra Luccioni

S. Karthik Mukkavilli

Narmada Balasooriya

Kris Sankaran

Jennifer T Chayes

We present a project that aims to generate images that depict accurate, vivid, and personalized outcomes of climate change using Cycle-Consi… (voir plus)stent Adversarial Networks (CycleGANs). By training our CycleGAN model on street-view images of houses before and after extreme weather events (e.g. floods, forest fires, etc.), we learn a mapping that can then be applied to images of locations that have not yet experienced these events. This visual transformation is paired with climate model predictions to assess likelihood and type of climate-related events in the long term (50 years) in order to bring the future closer in the viewers mind. The eventual goal of our project is to enable individuals to make more informed choices about their climate future by creating a more visceral understanding of the effects of climate change, while maintaining scientific credibility by drawing on climate model projections.

2019-05-02

ArXiv (prépublication)

Compositional generalization in a deep seq2seq model by separating syntax and semantics

Jacob Russin

Jason Jo

R. O’Reilly

Standard methods in deep learning for natural language processing fail to capture the compositional structure of human language that allows … (voir plus)for systematic generalization outside of the training distribution. However, human learners readily generalize in this way, e.g. by applying known grammatical rules to novel words. Inspired by work in neuroscience suggesting separate brain systems for syntactic and semantic processing, we implement a modification to standard approaches in neural machine translation, imposing an analogous separation. The novel model, which we call Syntactic Attention, substantially outperforms standard methods in deep learning on the SCAN dataset, a compositional generalization task, without any hand-engineered features or additional supervision. Our work suggests that separating syntactic from semantic learning may be a useful heuristic for capturing compositional structure.

2019-04-22

ArXiv (prépublication)

GradMask: Reduce Overfitting by Regularizing Saliency

Becks Simpson

Francis Dutil

Joseph Paul Cohen

With too few samples or too many model parameters, overfitting can inhibit the ability to generalise predictions to new data. Within medical… (voir plus) imaging, this can occur when features are incorrectly assigned importance such as distinct hospital specific artifacts, leading to poor performance on a new dataset from a different institution without those features, which is undesirable. Most regularization methods do not explicitly penalize the incorrect association of these features to the target class and hence fail to address this issue. We propose a regularization method, GradMask, which penalizes saliency maps inferred from the classifier gradients when they are not consistent with the lesion segmentation. This prevents non-tumor related features to contribute to the classification of unhealthy samples. We demonstrate that this method can improve test accuracy between 1-3% compared to the baseline without GradMask, showing that it has an impact on reducing overfitting.

2019-04-15

MIDL.io/2019/Conference/Abstract (inconnu)

openreview.net

Reinforced Imitation in Heterogeneous Action Space

Konrad Żołna

Negar Rostamzadeh

Sungjin Ahn

Pedro O. Pinheiro

Imitation learning is an effective alternative approach to learn a policy when the reward function is sparse. In this paper, we consider a c… (voir plus)hallenging setting where an agent and an expert use different actions from each other. We assume that the agent has access to a sparse reward function and state-only expert observations. We propose a method which gradually balances between the imitation learning cost and the reinforcement learning objective. In addition, this method adapts the agent's policy based on either mimicking expert behavior or maximizing sparse reward. We show, through navigation scenarios, that (i) an agent is able to efficiently leverage sparse rewards to outperform standard state-only imitation learning, (ii) it can learn a policy even when its actions are different from the expert, and (iii) the performance of the agent is not bounded by that of the expert, due to the optimized usage of sparse rewards.

2019-04-06

ArXiv (prépublication)

Gated Orthogonal Recurrent Units: On Learning to Forget

Li Jing

Caglar Gulcehre

John Peurifoy

Yichen Shen

Max Tegmark

Marin Soljacic

We present a novel recurrent neural network (RNN)–based model that combines the remembering ability of unitary evolution RNNs with the abi… (voir plus)lity of gated RNNs to effectively forget redundant or irrelevant information in its memory. We achieve this by extending restricted orthogonal evolution RNNs with a gating mechanism similar to gated recurrent unit RNNs with a reset gate and an update gate. Our model is able to outperform long short-term memory, gated recurrent units, and vanilla unitary or orthogonal RNNs on several long-term-dependency benchmark tasks. We empirically show that both orthogonal and unitary RNNs lack the ability to forget. This ability plays an important role in RNNs. We provide competitive results along with an analysis of our model on many natural sequential tasks, including question answering, speech spectrum prediction, character-level language modeling, and synthetic tasks that involve long-term dependencies such as algorithmic, denoising, and copying tasks.

2019-04-01

Neural Computation (publié)

Towards Standardization of Data Licenses: The Montreal Data License

Misha Benjamin

P. Gagnon

Negar Rostamzadeh

Chris Pal

Alex Shee

This paper provides a taxonomy for the licensing of data in the fields of artificial intelligence and machine learning. The paper's goal is … (voir plus)to build towards a common framework for data licensing akin to the licensing of open source software. Increased transparency and resolving conceptual ambiguities in existing licensing language are two noted benefits of the approach proposed in the paper. In parallel, such benefits may help foster fairer and more efficient markets for data through bringing about clearer tools and concepts that better define how data can be used in the fields of AI and ML. The paper's approach is summarized in a new family of data license language - \textit{the Montreal Data License (MDL)}. Alongside this new license, the authors and their collaborators have developed a web-based tool to generate license language espousing the taxonomies articulated in this paper.

2019-03-21

ArXiv (prépublication)

Online continual learning with no task boundaries

Rahaf Aljundi

Min Lin

Baptiste Goujaud

Continual learning is the ability of an agent to learn online with a non-stationary and never-ending stream of data. A key component for suc… (voir plus)h never-ending learning process is to overcome the catastrophic forgetting of previously seen data, a problem that neural networks are well known to suffer from. The solutions developed so far often relax the problem of continual learning to the easier task-incremental setting, where the stream of data is divided into tasks with clear boundaries. In this paper, we break the limits and move to the more challenging online setting where we assume no information of tasks in the data stream. We start from the idea that each learning step should not increase the losses of the previously learned examples through constraining the optimization process. This means that the number of constraints grows linearly with the number of examples, which is a serious limitation. We develop a solution to select a ﬁxed number of constraints that we use to approximate the feasible region deﬁned by the original constraints. We compare our approach against the methods that rely on task boundaries to select a ﬁxed set of examples, and show comparable or even better results, especially when the boundaries are blurry or when the data distributions are imbalanced.

2019-03-20

arXiv.org (prépublication)

dblp.uni-trier.de

Automated segmentation of cortical layers in BigBrain reveals divergent cortical and laminar thickness gradients in sensory and motor cortices.

Konrad Wagstyl

Stéphanie Larocque

Guillem Cucurull

Claude Lepage

Joseph Paul Cohen

Sebastian Bludau

Nicola Palomero-Gallagher

L. Lewis

Thomas Funck

Hannah Spitzer

Timo Dicksheid

Paul C Fletcher

Adriana Romero Soriano

Karl Zilles

Katrin Amunts

Alan C. Evans

Abstract Large-scale in vivo neuroimaging datasets offer new possibilities for reliable, well-powered measures of interregional structural d… (voir plus)ifferences and biomarkers of pathological changes in a wide variety of neurological and psychiatric diseases. However, so far studies have been structurally and functionally imprecise, being unable to relate pathological changes to specific cortical layers or neurobiological processes. We developed artificial neural networks to segment cortical and laminar surfaces in the BigBrain, a 3D histological model of the human brain. We sought to test whether previously-reported thickness gradients, as measured by MRI, in sensory and motor processing cortices, were present in a histological atlas of cortical thickness, and which cortical layers were contributing to these gradients. Identifying common gradients of cortical organisation enables us to meaningfully relate microstructural, macrostructural and functional cortical parameters. Analysis of thickness gradients across sensory cortices, using our fully segmented six-layered model, was consistent with MRI findings, showing increasing thickness moving up the processing hierarchy. In contrast, fronto-motor cortices showed the opposite pattern with changes in thickness of layers III, V and VI being the primary drivers of these gradients. As well as identifying key differences between sensory and motor gradients, our findings show how the use of this laminar atlas offers insights that will be key to linking single-neuron morphological changes, mesoscale cortical layers and macroscale cortical thickness.

2019-03-17

bioRxiv (prépublication)

BigBrain 3D atlas of cortical layers: Cortical and laminar thickness gradients diverge in sensory and motor cortices

Konrad Wagstyl

Stéphanie Larocque

Guillem Cucurull

Claude Lepage

Joseph Paul Cohen

Sebastian Bludau

Nicola Palomero-Gallagher

L. Lewis

Thomas Funck

Hannah Spitzer

Timo Dicksheid

Paul C Fletcher

Adriana Romero Soriano

Karl Zilles

Katrin Amunts

Alan C. Evans

Histological atlases of the cerebral cortex, such as those made famous by Brodmann and von Economo, are invaluable for understanding human b… (voir plus)rain microstructure and its relationship with functional organization in the brain. However, these existing atlases are limited to small numbers of manually annotated samples from a single cerebral hemisphere, measured from 2D histological sections. We present the first whole-brain quantitative 3D laminar atlas of the human cerebral cortex. This atlas was derived from a 3D histological model of the human brain at 20 micron isotropic resolution (BigBrain), using a convolutional neural network to segment, automatically, the cortical layers in both hemispheres. Our approach overcomes many of the historical challenges with measurement of histological thickness in 2D and the resultant laminar atlas provides an unprecedented level of precision and detail. We utilized this BigBrain cortical atlas to test whether previously reported thickness gradients, as measured by MRI in sensory and motor processing cortices, were present in a histological atlas of cortical thickness, and which cortical layers were contributing to these gradients. Cortical thickness increased across sensory processing hierarchies, primarily driven by layers III, V and VI. In contrast, fronto-motor cortices showed the opposite pattern, with decreases in total and pyramidal layer thickness. These findings illustrate how this laminar atlas will provide a link between single-neuron morphology, mesoscale cortical layering, macroscopic cortical thickness and, ultimately, functional neuroanatomy.

2019-03-17

bioRxiv (prépublication)

BigBrain 3D atlas of cortical layers: Cortical and laminar thickness gradients diverge in sensory and motor cortices

Konrad Wagstyl

Stéphanie Larocque

Guillem Cucurull

Claude Lepage

Joseph Paul Cohen

Sebastian Bludau

Nicola Palomero-Gallagher

L. Lewis

Thomas Funck

Hannah Spitzer

Timo Dicksheid

Paul C Fletcher

Adriana Romero Soriano

Karl Zilles

Katrin Amunts

Alan C. Evans

2019-03-17

bioRxiv (prépublication)