Guillaume Rabusseau

Biographie

Depuis septembre 2018, je suis professeur adjoint à Mila – Institut québécois d’intelligence artificielle et au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal (UdeM). Je suis titulaire d’une chaire de recherche en IA Canada-CIFAR depuis mars 2019. Avant de me joindre à l’UdeM, j’ai été chercheur postdoctoral au laboratoire de raisonnement et d'apprentissage de l'Université McGill, où j'ai travaillé avec Prakash Panangaden, Joelle Pineau et Doina Precup.

J'ai obtenu mon doctorat en 2016 à l’Université d’Aix-Marseille (AMU), où j'ai travaillé dans l'équipe Qarma (apprentissage automatique et multimédia), sous la supervision de François Denis et Hachem Kadri. Auparavant, j'ai obtenu une maîtrise en informatique fondamentale de l'AMU et une licence en informatique de la même université en formation à distance.

Je m'intéresse aux méthodes de tenseurs pour l'apprentissage automatique et à la conception d'algorithmes d'apprentissage pour les données structurées par l’utilisation de l'algèbre linéaire et multilinéaire (par exemple, les méthodes spectrales).

Étudiants actuels

Jun Dai

Postdoctorat - UdeM

Alireza Dizaji

Collaborateur·rice alumni - UdeM

Doctorat - UdeM

Doctorat - UdeM

Co-superviseur⋅e :

Collaborateur·rice alumni - McGill

Superviseur⋅e principal⋅e :

Collaborateur·rice de recherche - UdeM

Maude Lizaire

Doctorat - UdeM

Sitao Luan

Postdoctorat - McGill

Co-superviseur⋅e :

Reihaneh Rabbany

Ewan Murphy

Collaborateur·rice de recherche - INRIA

Charlotte Noxon

Maîtrise recherche - UdeM

Soroush Omranpour

Collaborateur·rice alumni - McGill

Superviseur⋅e principal⋅e :

Reihaneh Rabbany

Michael Rizvi-Martel

Doctorat - UdeM

Co-superviseur⋅e :

Pascal Tikeng Notsawo

Doctorat - UdeM

Co-superviseur⋅e :

Collaborateur·rice de recherche - UdeM

Co-superviseur⋅e :

Reihaneh Rabbany

Site web

Beheshteh Toloueirakhshan

Doctorat - UdeM

Site web

Publications

A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix

Thang Doan

Mehdi Abbana Bennani

Bogdan Mazoure

Pierre Alquier

Continual learning (CL) is a setting in which an agent has to learn from an incoming stream of data during its entire lifetime. Although maj… (voir plus)or advances have been made in the field, one recurring problem which remains unsolved is that of Catastrophic Forgetting (CF). While the issue has been extensively studied empirically, little attention has been paid from a theoretical angle. In this paper, we show that the impact of CF increases as two tasks increasingly align. We introduce a measure of task similarity called the NTK overlap matrix which is at the core of CF. We analyze common projected gradient algorithms and demonstrate how they mitigate forgetting. Then, we propose a variant of Orthogonal Gradient Descent (OGD) which leverages structure of the data through Principal Component Analysis (PCA). Experiments support our theoretical findings and show how our method reduces CF on classical CL datasets.

2020-10-07

ArXiv (preprint)

Laplacian Change Point Detection for Dynamic Graphs

2020-08-20

Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (publié)

doi.org

Adaptive Learning of Tensor Network Structures

Tensor Networks (TN) offer a powerful framework to efficiently represent very high-dimensional objects. TN have recently shown their potenti… (voir plus)al for machine learning applications and offer a unifying view of common tensor decomposition models such as Tucker, tensor train (TT) and tensor ring (TR). However, identifying the best tensor network structure from data for a given task is challenging. In this work, we leverage the TN formalism to develop a generic and efficient adaptive algorithm to jointly learn the structure and the parameters of a TN from data. Our method is based on a simple greedy approach starting from a rank one tensor and successively identifying the most promising tensor network edges for small rank increments. Our algorithm can adaptively identify TN structures with small number of parameters that effectively optimize any differentiable objective function. Experiments on tensor decomposition, tensor completion and model compression tasks demonstrate the effectiveness of the proposed algorithm. In particular, our method outperforms the state-of-the-art evolutionary topology search [Li and Sun, 2020] for tensor decomposition of images (while being orders of magnitude faster) and finds efficient tensor network structures to compress neural networks outperforming popular TT based approaches [Novikov et al., 2015].

2020-08-12

ArXiv (prépublication)

openreview.net

On Overfitting and Asymptotic Bias in Batch Reinforcement Learning with Partial Observability (Extended Abstract)

Vincent François-Lavet

Joelle Pineau

Damien Ernst

Raphael Fonteneau

When an agent has limited information on its environment, the suboptimality of an RL algorithm can be decomposed into the sum of two terms: … (voir plus)a term related to an asymptotic bias (suboptimality with unlimited data) and a term due to overfitting (additional suboptimality due to limited data). In the context of reinforcement learning with partial observability, this paper provides an analysis of the tradeoff between these two error sources. In particular, our theoretical analysis formally characterizes how a smaller state representation increases the asymptotic bias while decreasing the risk of overfitting.

2020-07-01

International Joint Conference on Artificial Intelligence (publié)

doi.org

Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning

2020-06-03

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (publié)

proceedings.mlr.press

Tensorized Random Projections

Beheshteh T. Rakhshan

We introduce a novel random projection technique for efficiently reducing the dimension of very high-dimensional tensors. Building upon clas… (voir plus)sical results on Gaussian random projections and Johnson-Lindenstrauss transforms~(JLT), we propose two tensorized random projection maps relying on the tensor train~(TT) and CP decomposition format, respectively. The two maps offer very low memory requirements and can be applied efficiently when the inputs are low rank tensors given in the CP or TT format. Our theoretical analysis shows that the dense Gaussian matrix in JLT can be replaced by a low-rank tensor implicitly represented in compressed form with random factors, while still approximately preserving the Euclidean distance of the projected inputs. In addition, our results reveal that the TT format is substantially superior to CP in terms of the size of the random projection needed to achieve the same distortion ratio. Experiments on synthetic data validate our theoretical analysis and demonstrate the superiority of the TT decomposition.

2020-03-11

ArXiv (preprint)

RandomNet: Towards Fully Automatic Neural Architecture Design for Multimodal Learning

Stefano Alletto

Shenyang Huang

Vincent François-Lavet

Yohei Nakata

Almost all neural architecture search methods are evaluated in terms of performance (i.e. test accuracy) of the model structures that it fin… (voir plus)ds. Should it be the only metric for a good autoML approach? To examine aspects beyond performance, we propose a set of criteria aimed at evaluating the core of autoML problem: the amount of human intervention required to deploy these methods into real world scenarios. Based on our proposed evaluation checklist, we study the effectiveness of a random search strategy for fully automated multimodal neural architecture search. Compared to traditional methods that rely on manually crafted feature extractors, our method selects each modality from a large search space with minimal human supervision. We show that our proposed random search strategy performs close to the state of the art on the AV-MNIST dataset while meeting the desirable characteristics for a fully automated design process.

2020-03-02

ArXiv (prépublication)

Tensor Networks for Language Modeling

Jacob Miller

John Terilla

The tensor network formalism has enjoyed over two decades of success in modeling the behavior of complex quantum-mechanical systems, but has… (voir plus) only recently and sporadically been leveraged in machine learning. Here we introduce a uniform matrix product state (u-MPS) model for probabilistic modeling of sequence data. We identify several distinctive features of this recurrent generative model, notably the ability to condition or marginalize sampling on characters at arbitrary locations within a sequence, with no need for approximate sampling methods. Despite the sequential architecture of u-MPS, we show that a recursive evaluation algorithm can be used to parallelize its inference and training, with a string of length n only requiring parallel time

2020-03-02

ArXiv (prépublication)

Tensor Networks for Probabilistic Sequence Modeling

Jacob Miller

John Anthony Terilla

Tensor networks are a powerful modeling framework developed for computational many-body physics, which have only recently been applied withi… (voir plus)n machine learning. In this work we utilize a uniform matrix product state (u-MPS) model for probabilistic modeling of sequence data. We first show that u-MPS enable sequence-level parallelism, with length-n sequences able to be evaluated in depth O(log n). We then introduce a novel generative algorithm giving trained u-MPS the ability to efficiently sample from a wide variety of conditional distributions, each one defined by a regular expression. Special cases of this algorithm correspond to autoregressive and fill-in-the-blank sampling, but more complex regular expressions permit the generation of richly structured text in a manner that has no direct analogue in current generative models. Experiments on synthetic text data find u-MPS outperforming LSTM baselines in several sampling tasks, and demonstrate strong generalization in the presence of limited data.

2020-03-02

International Conference on Artificial Intelligence and Statistics (published)

dblp.uni-trier.de

Tensor Networks for Probabilistic Sequence Modeling

Jacob Miller

John Terilla

2020-03-02

International Conference on Artificial Intelligence and Statistics (publié)

proceedings.mlr.press

Provably efficient reconstruction of policy networks

Thang Doan

Recent research has shown that learning poli-cies parametrized by large neural networks can achieve significant success on challenging reinf… (voir plus)orcement learning problems. However, when memory is limited, it is not always possible to store such models exactly for inference, and com-pressing the policy into a compact representation might be necessary. We propose a general framework for policy representation, which reduces this problem to finding a low-dimensional embedding of a given density function in a separable inner product space. Our framework allows us to de-rive strong theoretical guarantees, controlling the error of the reconstructed policies. Such guaran-tees are typically lacking in black-box models, but are very desirable in risk-sensitive tasks. Our experimental results suggest that the reconstructed policies can use less than 10%of the number of parameters in the original networks, while incurring almost no decrease in rewards.

2020-02-07

ArXiv (prépublication)