Yoshua Bengio

Core Academic Member

Canada CIFAR AI Chair

Full Professor, Université de Montréal, Department of Computer Science and Operations Research

Founder and Scientific Advisor, Leadership Team

Research Topics

Medical Machine Learning

Representation Learning

Reinforcement Learning

Deep Learning

Causality

Generative Models

Probabilistic Models

Molecular Modeling

Computational Neuroscience

Reasoning

Graph Neural Networks

Recurrent Neural Networks

Machine Learning Theory

Natural Language Processing

Biography

*For media requests, please write to medias@mila.quebec.

For more information, please contact Cassidy MacNeil, Senior Assistant and Operations Lead, at cassidy.macneil@mila.quebec.

Recognized worldwide as a leading expert in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, often called the "Nobel Prize of computing," with Geoffrey Hinton and Yann LeCun. He is a Full Professor at Université de Montréal, the Founder and Scientific Advisor of Mila (Quebec Artificial Intelligence Institute), and, as a Senior Fellow, co-directs the Learning in Machines & Brains program of the Canadian Institute for Advanced Research (CIFAR). He also serves as Special Advisor and Founding Scientific Director of IVADO.

In 2018, he was the computer scientist who collected the largest number of new citations worldwide. In 2019, he was awarded the prestigious Killam Prize. Since 2022, he has had the highest h-index of any computer scientist in the world. He is a Fellow of the Royal Society of London and of the Royal Society of Canada, and an Officer of the Order of Canada.

Concerned about the social impact of AI and committed to the goal that AI benefit everyone, he actively contributed to the Montreal Declaration for the Responsible Development of Artificial Intelligence.

Blog posts

February 22, 2024

Skipper: Combining Spatial and Temporal Abstraction to Improve Generalization

by

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

April 4, 2023

Scaling in the service of reasoning & model-based ML

March 23, 2022

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

by

Jake P. Taylor-King

March 15, 2022

Generative Flow Networks

Publications

GibbsNet: Iterative Adversarial Inference for Deep Graphical Models

R Devon Hjelm

Joseph Paul Cohen

Aaron Courville

Directed latent variable models that formulate the joint distribution as …

2016-12-31

Advances in Neural Information Processing Systems 30 (NIPS 2017) (published)

Independently Controllable Factors

Valentin Thomas

Emmanuel Bengio

Philippe Beaudoin

Marie-Jean Meurs

It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to interact with its environment. The agent can experiment with different actions and observe their effects. More specifically, we hypothesize that some of these factors correspond to aspects of the environment which are independently controllable, i.e., that there exists a policy and a learnable feature for each such aspect of the environment, such that this policy can yield changes in that feature with minimal changes to other features that explain the statistical variations in the observed data. We propose a specific objective function to find such factors and verify experimentally that it can indeed disentangle independently controllable aspects of the environment without any extrinsic reward signal.

2016-12-31

arXiv (preprint)

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Ishaan Gulrajani

Jose Sotelo

Aaron Courville

In this paper we propose a novel model for the unconditional audio generation task that generates one audio sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure, is de facto powerful enough to capture the underlying sources of variation in the temporal domain over very long time spans on three datasets of different nature. Human evaluation of the generated samples indicates that our model is preferred over competing models. We also show how each component of the model contributes to the exhibited performance.

2016-12-31

ICLR.cc/2017/conference (poster)

Z-Forcing: Training Stochastic Recurrent Networks

Alessandro Sordoni

Marc-Alexandre Côté

Nan Rosemary Ke

Many efforts have been devoted to training generative latent variable models with autoregressive decoders, such as recurrent neural networks (RNN). Stochastic recurrent models have been successful in capturing the variability observed in natural sequential data such as speech. We unify successful ideas from recently proposed architectures into a stochastic recurrent model: each step in the sequence is associated with a latent variable that is used to condition the recurrent dynamics for future steps. Training is performed with amortized variational inference where the approximate posterior is augmented with a RNN that runs backward through the sequence. In addition to maximizing the variational lower bound, we ease training of the latent variables by adding an auxiliary cost which forces them to reconstruct the state of the backward recurrent network. This provides the latent variables with a task-independent objective that enhances the performance of the overall model. We found this strategy to perform better than alternative approaches such as KL annealing. Although being conceptually simple, our model achieves state-of-the-art results on standard speech benchmarks such as TIMIT and Blizzard and competitive performance on sequential MNIST. Finally, we apply our model to language modeling on the IMDB dataset where the auxiliary cost helps in learning interpretable latent variables. Source Code: https://github.com/anirudh9119/zforcing_nips17

2016-12-31

Advances in Neural Information Processing Systems 30 (NIPS 2017) (published)

Deep Learning

Aaron Courville

Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

2016-11-17

MIT Press eBooks (unknown)

HeMIS: Hetero-Modal Image Segmentation

Mohammad Havaei

Nicolas Guizard

Nicolas Chapados

2016-10-01

Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016 (published)

Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

Mohammad Pezeshki

Philemon Brakel

Aaron Courville

Convolutional Neural Networks (CNNs) are effective models for reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR). Hybrid speech recognition systems incorporating CNNs with Hidden Markov Models/Gaussian Mixture Models (HMMs/GMMs) have achieved the state-of-the-art in various benchmarks. Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural Networks (RNNs), which is proposed for labeling unsegmented sequences, makes it feasible to train an end-to-end speech recognition system instead of hybrid settings. However, RNNs are computationally expensive and sometimes difficult to train. In this paper, inspired by the advantages of both CNNs and the CTC approach, we propose an end-to-end speech framework for sequence labeling, by combining hierarchical CNNs with CTC directly without recurrent connections. By evaluating the approach on the TIMIT phoneme recognition task, we show that the proposed model is not only computationally efficient, but also competitive with the existing baseline systems. Moreover, we argue that CNNs have the capability to model temporal correlations with appropriate context information.

2016-09-07

Interspeech 2016 (published)

Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus

Iulian V. Serban

Alberto García-Durán

Caglar Gulçehre

A. Chandar

Aaron Courville

Over the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances. However, to this date, there are no large-scale question-answer corpora available. In this paper we present the 30M Factoid Question-Answer Corpus, an enormous question answer pair corpus produced by applying a novel neural network architecture on the knowledge base Freebase to transduce facts into natural language questions. The produced question answer pairs are evaluated both by human evaluators and using automatic evaluation metrics, including well-established machine translation and sentence similarity metrics. Across all evaluation criteria the question-generation model outperforms the competing template-based baseline. Furthermore, when presented to human evaluators, the generated questions appear comparable in quality to real human-generated questions.

2016-07-31

Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

ReSeg: A Recurrent Neural Network-based Model for Semantic Segmentation

Francesco Visin

Adriana Romero

Matteo Matteucci

Marco Ciccone

Aaron Courville

We propose a structured prediction architecture, which exploits the local generic features extracted by Convolutional Neural Networks and the capacity of Recurrent Neural Networks (RNN) to retrieve distant dependencies. The proposed architecture, called ReSeg, is based on the recently introduced ReNet model for image classification. We modify and extend it to perform the more challenging task of semantic segmentation. Each ReNet layer is composed of four RNN that sweep the image horizontally and vertically in both directions, encoding patches or activations, and providing relevant global information. Moreover, ReNet layers are stacked on top of pre-trained convolutional layers, benefiting from generic local features. Upsampling layers follow ReNet layers to recover the original image resolution in the final predictions. The proposed ReSeg architecture is efficient, flexible and suitable for a variety of semantic segmentation tasks. We evaluate ReSeg on several widely-used semantic segmentation datasets: Weizmann Horse, Oxford Flower, and CamVid; achieving state-of-the-art performance. Results show that ReSeg can act as a suitable architecture for semantic segmentation tasks, and may have further applications in other structured prediction problems. The source code and model hyperparameters are available on https://github.com/fvisin/reseg.

2016-06-30

2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (published)

Deconstructing the Ladder Network Architecture

Mohammad Pezeshki

Linxi Fan

Philemon Brakel

Aaron Courville

The manual labeling of data is and will remain a costly endeavor. For this reason, semi-supervised learning remains a topic of practical importance. The recently proposed Ladder Network is one such approach that has proven to be very successful. In addition to the supervised objective, the Ladder Network also adds an unsupervised objective corresponding to the reconstruction costs of a stack of denoising autoencoders. Although the empirical results are impressive, the Ladder Network has many components intertwined, whose contributions are not obvious in such a complex architecture. In order to help elucidate and disentangle the different ingredients in the Ladder Network recipe, this paper presents an extensive experimental investigation of variants of the Ladder Network in which we replace or remove individual components to gain more insight into their relative importance. We find that all of the components are necessary for achieving optimal performance, but they do not contribute equally. For semi-supervised tasks, we conclude that the most important contribution is made by the lateral connection, followed by the application of noise, and finally the choice of what we refer to as the 'combinator function' in the decoder path. We also find that as the number of labeled training examples increases, the lateral connections and reconstruction criterion become less important, with most of the improvement in generalization being due to the injection of noise in each layer. Furthermore, we present a new type of combinator function that outperforms the original design in both fully- and semi-supervised tasks, reducing record test error rates on Permutation-Invariant MNIST to 0.57% for the supervised setting, and to 0.97% and 1.0% for semi-supervised settings with 1000 and 100 labeled examples respectively.

2016-06-10

Proceedings of The 33rd International Conference on Machine Learning (published)

proceedings.mlr.press

Brain Tumor Segmentation with Deep Neural Networks

Mohammad Havaei

David Warde-Farley

Aaron Courville

Pierre-Marc Jodoin

Hugo Larochelle

2016-05-20

Medical Image Analysis (unknown)

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

Iulian V. Serban

Alessandro Sordoni

Aaron Courville

We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models. Generative models produce system responses that are autonomously generated word-by-word, opening up the possibility for realistic, flexible interactions. In support of this goal, we extend the recently proposed hierarchical recurrent encoder-decoder neural network to the dialogue domain, and demonstrate that this model is competitive with state-of-the-art neural language models and back-off n-gram models. We investigate the limitations of this and similar approaches, and show how its performance can be improved by bootstrapping the learning from a larger question-answer pair corpus and from pretrained word embeddings.

2016-03-04

Proceedings of the AAAI Conference on Artificial Intelligence (published)