Portrait of Pascal Vincent

Pascal Vincent

Core Industry Member
Associate Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Scientist, Facebook AI Research (FAIR) Montréal
Research Topics
Representation Learning
Deep Learning

Biography

Pascal Vincent is a research scientist at Meta (FAIR, Fundamental AI Research), an associate professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal, a founding member of Mila – Quebec Artificial Intelligence Institute, and an associate fellow of the Canadian Institute for Advanced Research (CIFAR, Learning in Machines & Brains program).

His research on the principles and algorithms of representation learning has led him to develop several foundational ideas that have become key ingredients in the success of deep learning methods. Among his most influential work, he is a co-author of the seminal paper on neural language models, "A Neural Probabilistic Language Model" (Bengio et al., 2003), which laid the groundwork for all language models based on artificial neural networks. His work on denoising autoencoders (Vincent et al., 2008, 2010) was the first to propose the pretext task of filling in artificially introduced blanks in order to learn useful representations in any modality, a precursor of what is now called self-supervised learning. In 2011 he developed the denoising score matching principle (P. Vincent, "A connection between score matching and denoising autoencoders", Neural Computation, 2011), which is now routinely used to train diffusion-based generative models. His current research focuses on new theories and algorithms for representation learning to enable robust out-of-distribution generalization.
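For readers unfamiliar with the technique, a standard statement of the denoising score matching objective (the usual textbook form, not a formula taken from this page) trains a score model \(\psi(\cdot;\theta)\) to match the score of a noising distribution \(q_\sigma(\tilde{x} \mid x)\):
\[
J_{\mathrm{DSM}}(\theta) \;=\; \mathbb{E}_{x \sim p_{\mathrm{data}},\; \tilde{x} \sim q_\sigma(\tilde{x} \mid x)} \left[ \tfrac{1}{2} \left\lVert \psi(\tilde{x}; \theta) - \nabla_{\tilde{x}} \log q_\sigma(\tilde{x} \mid x) \right\rVert^2 \right].
\]
For Gaussian corruption \(q_\sigma(\tilde{x} \mid x) = \mathcal{N}(\tilde{x};\, x,\, \sigma^2 I)\) the target score is \((x - \tilde{x})/\sigma^2\), so minimizing this objective amounts to learning to denoise, which is the training signal behind diffusion-based generative models.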

Current Students

PhD - UdeM
Principal supervisor:
Collaborating Alumni

Publications

Enhanced Biomedical Knowledge Discovery From Unstructured Text Using Contextual Embeddings
Extracting knowledge from large, unstructured text corpora presents a challenge. Recently, authors have utilized unsupervised, static word embeddings to uncover "latent knowledge" contained within domain-specific scientific corpora. Here semantic-similarity measures between representations of concepts, objects or entities were used to predict relationships, which were later verified using physical methods. Static language models have recently been surpassed at most downstream tasks by massively pre-trained, contextual language models like BERT. Some have postulated that contextualized embeddings potentially yield word representations superior to static ones for knowledge-discovery purposes. In an effort to address this question, two biomedically-trained BERT models (BioBERT, SciBERT) were used to encode n = 500, 1000 or 5000 sentences containing words of interest extracted from a biomedical corpus (Coronavirus Open Research Dataset). The n representations for the words of interest were subsequently extracted and then aggregated to yield static-equivalent word representations. These words belonged to the vocabularies of intrinsic benchmarking tools for the biomedical domain (Bio-SimVerb and Bio-SimLex), which assess quality of word representations using semantic-similarity and relatedness measures. Using intrinsic benchmarking tasks, the feasibility of using contextualized word representations for knowledge discovery tasks can be assessed: word representations that better encode described reality are expected to perform better (i.e. closer to domain experts). As postulated, BERT embeddings outperform static counterparts.
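A minimal sketch of the aggregation step described in this abstract: contextual token embeddings of a target word are averaged over the n sentences that contain it to yield a static-equivalent vector. It assumes the Hugging Face transformers library and a generic BERT checkpoint; the checkpoint and function names are illustrative, not taken from the paper.
    import torch
    from transformers import AutoTokenizer, AutoModel

    # Illustrative checkpoint; the paper uses biomedical variants (BioBERT, SciBERT).
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.eval()

    def static_equivalent_embedding(word, sentences):
        """Average the contextual embeddings of `word` over sentences that contain it."""
        vectors = []
        word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
        for sent in sentences:
            enc = tokenizer(sent, return_tensors="pt", truncation=True)
            with torch.no_grad():
                hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
            ids = enc["input_ids"][0].tolist()
            # Locate the subword span of `word` and average its token embeddings.
            for i in range(len(ids) - len(word_ids) + 1):
                if ids[i:i + len(word_ids)] == word_ids:
                    vectors.append(hidden[i:i + len(word_ids)].mean(dim=0))
                    break
        return torch.stack(vectors).mean(dim=0) if vectors else None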
Accounting for Variance in Machine Learning Benchmarks
Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameter choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization and hyperparameter choice markedly impacts the results. We analyze the predominant comparison methods used today in the light of this variance. We show a counter-intuitive result that adding more sources of variation to an imperfect estimator better approaches the ideal estimator at a 51 times reduction in compute cost. Building on these results, we study the error rate of detecting improvements, on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
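As a toy illustration of why averaging over several sources of variation matters when comparing two pipelines, the sketch below runs hypothetical pipelines over many seeds and reports the performance gap together with its standard error; train_and_eval is a synthetic stand-in, not a real benchmark or the paper's estimator.
    import numpy as np

    # Hypothetical evaluation hook: trains a pipeline with the given random seed
    # (standing in for data split, initialization and augmentation randomness)
    # and returns a test metric.
    def train_and_eval(algo, seed):
        rng = np.random.default_rng(seed)
        base = {"A": 0.80, "B": 0.79}[algo]        # fake "true" performance
        return base + rng.normal(scale=0.02)        # fake run-to-run variance

    def compare(algo_a, algo_b, n_seeds=20):
        """Compare two pipelines while averaging over sources of variation."""
        diffs = np.array([train_and_eval(algo_a, s) - train_and_eval(algo_b, s)
                          for s in range(n_seeds)])
        mean, sem = diffs.mean(), diffs.std(ddof=1) / np.sqrt(n_seeds)
        return mean, sem  # report the gap with its uncertainty, not a single run

    print(compare("A", "B"))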
Cooperative Semi-Supervised Transfer Learning of Machine Reading Comprehension
Pretrained language models have significantly improved the performance of downstream language understanding tasks, including extractive question answering, by providing high-quality contextualized word embeddings. However, training question answering models still requires large amounts of annotated data for specific domains. In this work, we propose a cooperative, self-play learning framework, REGEX, for automatically generating more non-trivial question-answer pairs to improve model performance. REGEX is built upon a masked answer extraction task with an interactive learning environment containing an answer entity REcognizer, a question Generator, and an answer EXtractor. Given a passage with a masked entity, the generator generates a question around the entity, and the extractor is trained to extract the masked entity with the generated question and raw texts. The framework allows the training of question generation and answering models on any text corpora without annotation. We further leverage a reinforcement learning technique to reward generating high-quality questions and to improve the answer extraction model's performance. Experiment results show that REGEX outperforms the state-of-the-art (SOTA) pretrained language models and transfer learning approaches on standard question-answering benchmarks, and yields the new SOTA performance under given model size and transfer learning settings.
Tackling Situated Multi-Modal Task-Oriented Dialogs with a Single Transformer Model
The Situated Interactive Multi-Modal Conversations (SIMMC) 2.0 challenge aims to create virtual shopping assistants that can accept complex multi-modal inputs, i.e. visual appearances of objects and user utterances. It consists of four subtasks: multi-modal disambiguation (MM-Disamb), multi-modal coreference resolution (MM-Coref), multi-modal dialog state tracking (MM-DST), and response retrieval and generation. While many task-oriented dialog systems usually tackle each subtask separately, we propose a jointly learned encoder-decoder that performs all four subtasks at once for efficiency. Moreover, we handle the multi-modality of the challenge by representing visual objects as special tokens whose joint embedding is learned via auxiliary tasks. This approach won the MM-Coref and response retrieval subtasks and was nominated runner-up for the remaining subtasks using a single unified model. In particular, our model achieved 81.5% MRR, 71.2% R@1, 95.0% R@5, 98.2% R@10, and 1.9 mean rank in the response retrieval task, setting a high bar for the state-of-the-art result in the SIMMC 2.0 track of the Dialog Systems Technology Challenge 10 (DSTC10).
Implicit Regularization in Deep Learning: A View from Function Space
We approach the problem of implicit regularization in deep learning from a geometrical viewpoint. We highlight a possible regularization effect induced by a dynamical alignment of the neural tangent features introduced by Jacot et al., along a small number of task-relevant directions. By extrapolating a new analysis of Rademacher complexity bounds in linear models, we propose and study a new heuristic complexity measure for neural networks which captures this phenomenon, in terms of sequences of tangent kernel classes along the learning trajectories.
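For context, the neural tangent features and tangent kernel mentioned above are usually defined as follows (standard definitions from Jacot et al., stated here for convenience):
\[
\phi_\theta(x) \;=\; \nabla_\theta f_\theta(x), \qquad k_\theta(x, x') \;=\; \big\langle \phi_\theta(x),\, \phi_\theta(x') \big\rangle,
\]
so that, to first order around the parameters \(\theta\), the network behaves like a linear model with feature map \(\phi_\theta\), and it is the alignment of these features with task-relevant directions that the proposed complexity measure tracks.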
Stochastic Neural Network with Kronecker Flow
Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to scale to the high-dimensional setting of stochastic neural networks. This limitation motivates a need for scalable parameterizations of the noise generation process, in a manner that adequately captures the dependencies among the various parameters. In this work, we address this need and present the Kronecker Flow, a generalization of the Kronecker product to invertible mappings designed for stochastic neural networks. We apply our method to variational Bayesian neural networks on predictive tasks, PAC-Bayes generalization bound estimation, and approximate Thompson sampling in contextual bandits. In all setups, our methods prove to be competitive with existing methods and better than the baselines.
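A small numerical sketch (background material, not code from the paper) of the Kronecker-structured linear map such a flow builds on: applying A ⊗ B to vec(X) is the same as computing B X Aᵀ, and the log-determinant factorizes over the two small matrices, which is what keeps the parameterization cheap in high dimensions.
    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 3, 4
    A = rng.normal(size=(n, n))          # small invertible factor
    B = rng.normal(size=(m, m))          # small invertible factor
    X = rng.normal(size=(m, n))          # noise reshaped as a matrix

    # (A kron B) vec(X) == vec(B X A^T), with vec = column stacking
    lhs = np.kron(A, B) @ X.flatten(order="F")
    rhs = (B @ X @ A.T).flatten(order="F")
    assert np.allclose(lhs, rhs)

    # log|det(A kron B)| = m*log|det A| + n*log|det B|: the flow's log-det is cheap
    logdet_full = np.linalg.slogdet(np.kron(A, B))[1]
    logdet_fact = m * np.linalg.slogdet(A)[1] + n * np.linalg.slogdet(B)[1]
    assert np.allclose(logdet_full, logdet_fact)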
Stochastic Hamiltonian Gradient Methods for Smooth Games
The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the class of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using tools from the optimization literature we show that SHGD converges linearly to the neighbourhood of a stationary point. To guarantee convergence to the exact solution, we analyze SHGD with a decreasing step-size and we also present the first stochastic variance reduced Hamiltonian method. Our results provide the first global non-asymptotic last-iterate convergence guarantees for the class of stochastic unconstrained bilinear games and for the more general class of stochastic games that satisfy a "sufficiently bilinear" condition, notably including some non-convex non-concave problems. We supplement our analysis with experiments on stochastic bilinear and sufficiently bilinear games, where our theory is shown to be tight, and on simple adversarial machine learning formulations.
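As background, a toy deterministic sketch of Hamiltonian gradient descent on an unconstrained bilinear game (the simplest setting covered by the guarantees above; this is not the paper's stochastic estimator): the update descends on H(x, y) = ½‖v(x, y)‖², where v is the game's gradient field.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 5
    A = rng.normal(size=(d, d))
    x, y = rng.normal(size=d), rng.normal(size=d)

    # Bilinear game min_x max_y x^T A y. Its gradient field is
    #   v(x, y) = (A y, -A^T x), and the Hamiltonian is H = 0.5 * ||v||^2.
    # Plain simultaneous gradient descent-ascent cycles or diverges here;
    # descending on H converges to the equilibrium (0, 0).
    eta = 0.01
    for _ in range(2000):
        grad_H_x = A @ A.T @ x      # d/dx of 0.5*||A^T x||^2
        grad_H_y = A.T @ A @ y      # d/dy of 0.5*||A y||^2
        x -= eta * grad_H_x
        y -= eta * grad_H_y

    print(np.linalg.norm(x), np.linalg.norm(y))  # both shrink toward 0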
An Empirical Study of Batch Normalization and Group Normalization in Conditional Computation
Batch normalization has been widely used to improve optimization in deep neural networks. While the uncertainty in batch statistics can act as a regularizer, using these dataset statistics specific to the training set impairs generalization in certain tasks. Recently, alternative methods for normalizing feature activations in neural networks have been proposed. Among them, group normalization has been shown to yield similar, in some domains even superior performance to batch normalization. All these methods utilize a learned affine transformation after the normalization operation to increase representational power. Methods used in conditional computation define the parameters of these transformations as learnable functions of conditioning information. In this work, we study whether and where the conditional formulation of group normalization can improve generalization compared to conditional batch normalization. We evaluate performances on the tasks of visual question answering, few-shot learning, and conditional image generation.
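A minimal sketch of a conditional group normalization layer of the kind studied here, assuming PyTorch; the per-channel scale and shift are predicted from a conditioning vector rather than learned as fixed parameters (module and layer names are illustrative, not the paper's implementation).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConditionalGroupNorm(nn.Module):
        """Group normalization whose affine parameters depend on conditioning information."""
        def __init__(self, num_groups, num_channels, cond_dim):
            super().__init__()
            self.num_groups = num_groups
            # Predict per-channel gamma and beta from the conditioning vector.
            self.to_affine = nn.Linear(cond_dim, 2 * num_channels)

        def forward(self, x, cond):
            # Normalize without any affine transform, then apply the conditional one.
            h = F.group_norm(x, self.num_groups)
            gamma, beta = self.to_affine(cond).chunk(2, dim=-1)
            gamma = gamma[:, :, None, None]  # broadcast over spatial dimensions
            beta = beta[:, :, None, None]
            return (1.0 + gamma) * h + beta  # start close to the identity transform

    # Example: 8 groups over 32 channels, conditioned on a 16-dimensional embedding.
    layer = ConditionalGroupNorm(8, 32, 16)
    out = layer(torch.randn(4, 32, 7, 7), torch.randn(4, 16))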
Iteratively unveiling new regions of interest in Deep Learning models
Tess Berthier
Lisa Di Jorio
Recent advances in deep learning have been transforming the landscape in many domains. However, understanding the predictions of a deep network remains a challenge, which is especially sensitive in health care domains where interpretability is key. Techniques that rely on saliency maps, highlighting the regions of an image that influence the classifier's decision the most, are often used for that purpose. However, gradient fluctuations make saliency maps noisy and thus difficult to interpret at a human level. Moreover, models tend to focus on one particular influential region of interest (ROI) in the image, even though other regions might be relevant for the decision. We propose a new framework that refines those saliency maps to generate segmentation masks over the ROI on the initial image. In a second contribution, we propose to apply those masks over the original inputs, then evaluate our classifier on the masked inputs to identify previously overlooked ROI. This iterative procedure allows us to emphasize new regions of interest by extracting meaningful information from the saliency maps.
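A rough sketch of the kind of iterative masking loop the abstract describes, with classify and saliency as placeholders for any classifier and any saliency-map method; the quantile thresholding used here is a crude stand-in for the paper's mask-refinement step, not its actual procedure.
    import numpy as np

    def iterative_roi_discovery(image, classify, saliency, n_rounds=3, threshold=0.8):
        """Repeatedly mask out the most salient region and re-run the classifier
        to surface regions of interest the model would otherwise overlook."""
        current = image.copy()
        masks = []
        for _ in range(n_rounds):
            smap = saliency(current)                     # per-pixel importance scores
            mask = smap >= np.quantile(smap, threshold)  # crude segmentation of the top ROI
            masks.append(mask)
            current = np.where(mask, 0.0, current)       # hide this ROI from the model
            _ = classify(current)                        # decision now driven by other regions
        return masks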
Theano: A Python framework for fast computation of mathematical expressions
Rami Al-Rfou
Amjad Almahairi
Christof Angermüller
Frédéric Bastien
Justin S. Bayer
A. Belikov
A. Belopolsky
J. Bergstra
Josh Bleecher Snyder
Paul F. Christiano
Marc-Alexandre Côté
Myriam Côté
Julien Demouth
Sander Dieleman
Mélanie Ducoffe
Ziye Fan
Mathieu Germain
Ian J. Goodfellow
Matthew Graham
Balázs Hidasi
Arjun Jain
Sébastien Jean
Kai Jia
Mikhail V. Korobov
Vivek Kulkarni
Pascal Lamblin
Eric P. Larsen
S. Lee
Simon-mark Lefrancois
J. Livezey
Cory R. Lorenz
Jeremiah L. Lowin
Qianli M. Ma
R. McGibbon
Mehdi Mirza
Alberto Orlandi
Colin Raffel
Daniel Renshaw
Matthew David Rocklin
Markus Dr. Roth
Peter Sadowski
John Salvatier
Jan Schlüter
John D. Schulman
Gabriel Schwartz
Iulian V. Serban
Samira Shabanian
Sigurd Spieckermann
S. Subramanyam
Gijs van Tulder
Joseph P. Turian
Sebastian Urban
Dustin J. Webb
M. Willson
Lijun Xue
Theano is a Python library that allows one to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano has been actively and continuously developed since 2008; multiple frameworks have been built on top of it and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.
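To illustrate the define-optimize-evaluate workflow the abstract refers to, here is a minimal example of classic Theano usage (a basic sketch, not taken from the article).
    import numpy as np
    import theano
    import theano.tensor as T

    # Define a symbolic expression over multi-dimensional arrays...
    x = T.dmatrix("x")
    w = T.dvector("w")
    y = T.nnet.sigmoid(T.dot(x, w))   # element-wise logistic of a matrix-vector product
    g = T.grad(y.sum(), w)            # symbolic gradient with respect to w

    # ...let Theano optimize and compile it (for CPU or GPU)...
    f = theano.function([x, w], [y, g])

    # ...then evaluate it on ordinary NumPy arrays.
    out, grad = f(np.random.randn(3, 4), np.random.randn(4))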