Guillaume Rabusseau

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Deep Learning
Learning on Graphs
Tensor Factorization
Probabilistic Models
Graph Neural Networks
Recurrent Neural Networks
Recommender Systems
Machine Learning Theory
Quantum Information Theory

Biography

Since September 2018, I have been an assistant professor at Mila - Quebec Artificial Intelligence Institute and in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal (UdeM). I have held a Canada CIFAR AI Chair since March 2019. Before joining UdeM, I was a postdoctoral researcher in the Reasoning and Learning Lab at McGill University, where I worked with Prakash Panangaden, Joelle Pineau and Doina Precup.

I obtained my PhD in 2016 from Aix-Marseille University (AMU), where I worked in the Qarma team (machine learning and multimedia) under the supervision of François Denis and Hachem Kadri. Before that, I obtained a master's degree in fundamental computer science from AMU and a bachelor's degree in computer science from the same university through distance learning.

I am interested in tensor methods for machine learning and in designing learning algorithms for structured data using linear and multilinear algebra (for example, spectral methods).

Current Students

Postdoctorate - UdeM
Research Master's - UdeM
PhD - UdeM
Co-supervisor:
Collaborating Alumni - McGill
Principal supervisor:
Collaborating researcher - UdeM
PhD - UdeM
Postdoctorate - McGill
Co-supervisor:
Research Master's - UdeM
Collaborating Alumni - McGill
Principal supervisor:
PhD - UdeM
Co-supervisor:
PhD - UdeM
Co-supervisor:
Collaborating researcher - UdeM
Co-supervisor:

Publications

Few Shot Image Generation via Implicit Autoencoding of Support Sets
Andy Huang
Kuan-Chieh Wang
Alireza Makhzani
Recent generative models such as generative adversarial networks have achieved remarkable success in generating realistic images, but they require large training datasets and computational resources. The goal of few-shot image generation is to learn the distribution of a new dataset from only a handful of examples by transferring knowledge learned from structurally similar datasets. Towards achieving this goal, we propose the “Implicit Support Set Autoencoder” (ISSA) that adversarially learns the relationship across datasets using an unsupervised dataset representation, while the distribution of each individual dataset is learned using implicit distributions. Given a few examples from a new dataset, ISSA can generate new samples by inferring the representation of the underlying distribution using a single forward pass. We showcase significant gains from our method on generating high quality and diverse images for unseen classes in the Omniglot and CelebA datasets in few-shot image generation settings.
Lower and Upper Bounds on the Pseudo-Dimension of Tensor Network Models
Behnoush Khavari
Tensor network (TN) methods have been a key ingredient of advances in condensed matter physics and have recently sparked interest in the machine learning community for their ability to compactly represent very high-dimensional objects. TN methods can for example be used to efficiently learn linear models in exponentially large feature spaces [56]. In this work, we derive upper and lower bounds on the VC-dimension and pseudo-dimension of a large class of TN models for classification, regression and completion. Our upper bounds hold for linear models parameterized by arbitrary TN structures, and we derive lower bounds for common tensor decomposition models (CP, Tensor Train, Tensor Ring and Tucker) showing the tightness of our general upper bound. These results are used to derive a generalization bound which can be applied to classification with low-rank matrices as well as linear classifiers based on any of the commonly used tensor decomposition models. As a corollary of our results, we obtain a bound on the VC-dimension of the matrix product state classifier introduced in [56] as a function of the so-called bond dimension (i.e. tensor train rank), which answers an open problem listed by Cirac, Garre-Rubio and Pérez-García in [13].
Rademacher Random Projections with Tensor Networks
Beheshteh T. Rakhshan
Random projections (RPs) have recently emerged as popular techniques in the machine learning community for their ability to reduce the dimension of very high-dimensional tensors. Following the work in [30], we consider a tensorized random projection relying on Tensor Train (TT) decomposition where each element of the core tensors is drawn from a Rademacher distribution. Our theoretical results reveal that the Gaussian low-rank tensor represented in compressed form in TT format in [30] can be replaced by a TT tensor with core elements drawn from a Rademacher distribution with the same embedding size. Experiments on synthetic data demonstrate that tensorized Rademacher RP can outperform the tensorized Gaussian RP studied in [30]. In addition, we show, both theoretically and experimentally, that the tensorized RP in the Matrix Product Operator (MPO) format is not a Johnson-Lindenstrauss transform (JLT) and therefore not a well-suited random projection map.
Extracting Weighted Automata for Approximate Minimization in Language Modelling
Understanding Capacity Saturation in Incremental Learning
Vincent Francois-Lavet
Quantum Tensor Networks, Stochastic Processes, and Weighted Automata
Siddarth Srinivasan
Sandesh M. Adhikary
Jacob Miller
Byron Boots
Modeling joint probability distributions over sequences has been studied from many perspectives. The physics community developed matrix product states, a tensor-train decomposition for probabilistic modeling, motivated by the need to tractably model many-body systems. But similar models have also been studied in the stochastic processes and weighted automata literature, with little work on how these bodies of work relate to each other. We address this gap by showing how stationary or uniform versions of popular quantum tensor network models have equivalent representations in the stochastic processes and weighted automata literature, in the limit of infinitely long sequences. We demonstrate several equivalence results between models used in these three communities: (i) uniform variants of matrix product states, Born machines and locally purified states from the quantum tensor networks literature, (ii) predictive state representations, hidden Markov models, norm-observable operator models and hidden quantum Markov models from the stochastic process literature, and (iii) stochastic weighted automata, probabilistic automata and quadratic automata from the formal languages literature. Such connections may open the door for results and methods developed in one area to be applied in another.
Optimal Spectral-Norm Approximate Minimization of Weighted Finite Automata
We address the approximate minimization problem for weighted finite automata (WFAs) with weights in …
Assessing the Impact: Does an Improvement to a Revenue Management System Lead to an Improved Revenue?
Greta Laage
Andrea Lodi
Estimating the Impact of an Improvement to a Revenue Management System: An Airline Application
Greta Laage
William L. Hamilton
Andrea Lodi
Airlines have been making use of highly complex Revenue Management Systems to maximize revenue for decades. Estimating the impact of changing one component of those systems on an important outcome such as revenue is crucial, yet very challenging. It is indeed the difference between the generated value and the value that would have been generated keeping business as usual, which is not observable. We provide a comprehensive overview of counterfactual prediction models and use them in an extensive computational study based on data from Air Canada to estimate such impact. We focus on predicting the counterfactual revenue and compare it to the observed revenue subject to the impact. Our microeconomic application and small expected treatment impact stand out from the usual synthetic control applications. We present accurate linear and deep-learning counterfactual prediction models which achieve errors of 1.1% and 1%, respectively, and make it possible to estimate a simulated effect quite accurately.
Scalable Change Point Detection for Dynamic Graphs
Real world networks often evolve in complex ways over time. Understanding anomalies in dynamic networks is crucial for applications such as traffic accident detection, intrusion identification and detection of ecosystem disturbances. In this work, we focus on the problem of change point detection in dynamic graphs. The goal is to identify time steps where the graph structure deviates significantly from the norm. Despite empirical success of recent methods, building a change point detection method for real world dynamic graphs, which often scale to millions of nodes, remains an open question. To fill this gap, we propose LADdos, a scalable method for change point detection in dynamic graphs. LADdos brings together ideas from two recent works: an accurate change point detection method for graphs called LAD [10] which detects the changes in the full Laplacian spectrum of the graph in each timestamp, and the general framework of network density of states (DOS) [5] which models the distribution of the singular values through efficient approximation methods. In experiments with two common graph models, the Stochastic Block Model (SBM) and the Barabási-Albert (BA) model, we show that LADdos has equal performance to LAD, which is the current state-of-the-art, while being orders of magnitude faster. For instance, on a dynamic graph with a total of 21 million edges over 150 timestamps, LADdos achieves a 100x speedup when compared to LAD.
A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix
Thang Doan
Mehdi Abbana Bennani
Pierre Alquier
Towards a Trace-Preserving Tensor Network Representation of Quantum Channels
Siddarth Srinivasan
Sandesh M. Adhikary
Jacob Miller
Bibek Pokharel
Byron Boots
The problem of characterizing quantum channels arises in a number of contexts such as quantum process tomography and quantum error correction. However, direct approaches to parameterizing and optimizing the Choi matrix representation of quantum channels face a curse of dimensionality: the number of parameters scales exponentially in the number of qubits. Recently, Torlai et al. [2020] proposed using locally purified density operators (LPDOs), a tensor network representation of Choi matrices, to overcome the unfavourable scaling in parameters. While the LPDO structure allows it to satisfy a ‘complete positivity’ (CP) constraint required of physically valid quantum channels, it makes no guarantees about a similarly required ‘trace preservation’ (TP) constraint. In practice, the TP constraint is violated, and the learned quantum channel may even be trace-increasing, which is non-physical. In this work, we present the problem of optimizing over TP LPDOs, discuss two approaches to characterizing the TP constraints on LPDOs, and outline the next steps for developing an optimization scheme.