Portrait of Guillaume Rabusseau

Guillaume Rabusseau

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research

Biography

I have been an assistant professor at Mila – Quebec Artificial Intelligence Institute and in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal (UdeM) since September 2018. I was awarded a Canada CIFAR AI Chair in March 2019. Before joining UdeM, I was a postdoctoral research fellow in the Reasoning and Learning Lab at McGill University, where I worked with Prakash Panangaden, Joelle Pineau and Doina Precup.

I obtained my PhD in 2016 from Aix-Marseille University (AMU) in France, where I worked in the Qarma team (Machine Learning and Multimedia) under the supervision of François Denis and Hachem Kadri. I also obtained my MSc in fundamental computer science and my BSc in computer science from AMU. I am interested in tensor methods for machine learning and in designing learning algorithms for structured data by leveraging linear and multilinear algebra (e.g., spectral methods).

Current Students

PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Collaborating Alumni - University of Mannheim
Co-supervisor :
Postdoctorate - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Master's Research - Université de Montréal
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Principal supervisor :

Publications

Quantum Tensor Networks, Stochastic Processes, and Weighted Automata
Siddarth Srinivasan
Sandesh M. Adhikary
Jacob Miller
Byron Boots
Modeling joint probability distributions over sequences has been studied from many perspectives. The physics community developed matrix prod… (see more)uct states, a tensor-train decomposition for probabilistic modeling, motivated by the need to tractably model many-body systems. But similar models have also been studied in the stochastic processes and weighted automata literature, with little work on how these bodies of work relate to each other. We address this gap by showing how stationary or uniform versions of popular quantum tensor network models have equivalent representations in the stochastic processes and weighted automata literature, in the limit of infinitely long sequences. We demonstrate several equivalence results between models used in these three communities: (i) uniform variants of matrix product states, Born machines and locally purified states from the quantum tensor networks literature, (ii) predictive state representations, hidden Markov models, norm-observable operator models and hidden quantum Markov models from the stochastic process literature,and (iii) stochastic weighted automata, probabilistic automata and quadratic automata from the formal languages literature. Such connections may open the door for results and methods developed in one area to be applied in another.
A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix
Thang Doan
Mehdi Abbana Bennani
Bogdan Mazoure
Pierre Alquier
Continual learning (CL) is a setting in which an agent has to learn from an incoming stream of data during its entire lifetime. Although maj… (see more)or advances have been made in the field, one recurring problem which remains unsolved is that of Catastrophic Forgetting (CF). While the issue has been extensively studied empirically, little attention has been paid from a theoretical angle. In this paper, we show that the impact of CF increases as two tasks increasingly align. We introduce a measure of task similarity called the NTK overlap matrix which is at the core of CF. We analyze common projected gradient algorithms and demonstrate how they mitigate forgetting. Then, we propose a variant of Orthogonal Gradient Descent (OGD) which leverages structure of the data through Principal Component Analysis (PCA). Experiments support our theoretical findings and show how our method reduces CF on classical CL datasets.
Laplacian Change Point Detection for Dynamic Graphs
Shenyang Huang
Yasmeen Hitti
Adaptive Learning of Tensor Network Structures
Meraj Hashemizadeh
Michelle Liu
Jacob Miller
Tensor Networks (TN) offer a powerful framework to efficiently represent very high-dimensional objects. TN have recently shown their potenti… (see more)al for machine learning applications and offer a unifying view of common tensor decomposition models such as Tucker, tensor train (TT) and tensor ring (TR). However, identifying the best tensor network structure from data for a given task is challenging. In this work, we leverage the TN formalism to develop a generic and efficient adaptive algorithm to jointly learn the structure and the parameters of a TN from data. Our method is based on a simple greedy approach starting from a rank one tensor and successively identifying the most promising tensor network edges for small rank increments. Our algorithm can adaptively identify TN structures with small number of parameters that effectively optimize any differentiable objective function. Experiments on tensor decomposition, tensor completion and model compression tasks demonstrate the effectiveness of the proposed algorithm. In particular, our method outperforms the state-of-the-art evolutionary topology search [Li and Sun, 2020] for tensor decomposition of images (while being orders of magnitude faster) and finds efficient tensor network structures to compress neural networks outperforming popular TT based approaches [Novikov et al., 2015].
On Overfitting and Asymptotic Bias in Batch Reinforcement Learning with Partial Observability (Extended Abstract)
Vincent Francois-Lavet
Damien Ernst
Raphael Fonteneau
When an agent has limited information on its environment, the suboptimality of an RL algorithm can be decomposed into the sum of two terms: … (see more)a term related to an asymptotic bias (suboptimality with unlimited data) and a term due to overfitting (additional suboptimality due to limited data). In the context of reinforcement learning with partial observability, this paper provides an analysis of the tradeoff between these two error sources. In particular, our theoretical analysis formally characterizes how a smaller state representation increases the asymptotic bias while decreasing the risk of overfitting.
Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning
Tianyu Li
Bogdan Mazoure
Tensorized Random Projections
Beheshteh T. Rakhshan
We introduce a novel random projection technique for efficiently reducing the dimension of very high-dimensional tensors. Building upon clas… (see more)sical results on Gaussian random projections and Johnson-Lindenstrauss transforms~(JLT), we propose two tensorized random projection maps relying on the tensor train~(TT) and CP decomposition format, respectively. The two maps offer very low memory requirements and can be applied efficiently when the inputs are low rank tensors given in the CP or TT format. Our theoretical analysis shows that the dense Gaussian matrix in JLT can be replaced by a low-rank tensor implicitly represented in compressed form with random factors, while still approximately preserving the Euclidean distance of the projected inputs. In addition, our results reveal that the TT format is substantially superior to CP in terms of the size of the random projection needed to achieve the same distortion ratio. Experiments on synthetic data validate our theoretical analysis and demonstrate the superiority of the TT decomposition.
RandomNet: Towards Fully Automatic Neural Architecture Design for Multimodal Learning
Stefano Alletto
Shenyang Huang
Vincent Francois-Lavet
Yohei Nakata
Almost all neural architecture search methods are evaluated in terms of performance (i.e. test accuracy) of the model structures that it fin… (see more)ds. Should it be the only metric for a good autoML approach? To examine aspects beyond performance, we propose a set of criteria aimed at evaluating the core of autoML problem: the amount of human intervention required to deploy these methods into real world scenarios. Based on our proposed evaluation checklist, we study the effectiveness of a random search strategy for fully automated multimodal neural architecture search. Compared to traditional methods that rely on manually crafted feature extractors, our method selects each modality from a large search space with minimal human supervision. We show that our proposed random search strategy performs close to the state of the art on the AV-MNIST dataset while meeting the desirable characteristics for a fully automated design process.
Tensor Networks for Language Modeling
Jacob Miller
John Anthony Terilla
The tensor network formalism has enjoyed over two decades of success in modeling the behavior of complex quantum-mechanical systems, but has… (see more) only recently and sporadically been leveraged in machine learning. Here we introduce a uniform matrix product state (u-MPS) model for probabilistic modeling of sequence data. We identify several distinctive features of this recurrent generative model, notably the ability to condition or marginalize sampling on characters at arbitrary locations within a sequence, with no need for approximate sampling methods. Despite the sequential architecture of u-MPS, we show that a recursive evaluation algorithm can be used to parallelize its inference and training, with a string of length n only requiring parallel time
Tensor Networks for Probabilistic Sequence Modeling
Jacob Miller
John Anthony Terilla
Tensor networks are a powerful modeling framework developed for computational many-body physics, which have only recently been applied withi… (see more)n machine learning. In this work we utilize a uniform matrix product state (u-MPS) model for probabilistic modeling of sequence data. We first show that u-MPS enable sequence-level parallelism, with length-n sequences able to be evaluated in depth O(log n). We then introduce a novel generative algorithm giving trained u-MPS the ability to efficiently sample from a wide variety of conditional distributions, each one defined by a regular expression. Special cases of this algorithm correspond to autoregressive and fill-in-the-blank sampling, but more complex regular expressions permit the generation of richly structured text in a manner that has no direct analogue in current generative models. Experiments on synthetic text data find u-MPS outperforming LSTM baselines in several sampling tasks, and demonstrate strong generalization in the presence of limited data.
Tensor Networks for Probabilistic Sequence Modeling
Jacob Miller
John Anthony Terilla
Tensor networks are a powerful modeling framework developed for computational many-body physics, which have only recently been applied withi… (see more)n machine learning. In this work we utilize a uniform matrix product state (u-MPS) model for probabilistic modeling of sequence data. We first show that u-MPS enable sequence-level parallelism, with length-n sequences able to be evaluated in depth O(log n). We then introduce a novel generative algorithm giving trained u-MPS the ability to efficiently sample from a wide variety of conditional distributions, each one defined by a regular expression. Special cases of this algorithm correspond to autoregressive and fill-in-the-blank sampling, but more complex regular expressions permit the generation of richly structured text in a manner that has no direct analogue in current generative models. Experiments on synthetic text data find u-MPS outperforming LSTM baselines in several sampling tasks, and demonstrate strong generalization in the presence of limited data.
Provably efficient reconstruction of policy networks
Recent research has shown that learning poli-cies parametrized by large neural networks can achieve significant success on challenging reinf… (see more)orcement learning problems. However, when memory is limited, it is not always possible to store such models exactly for inference, and com-pressing the policy into a compact representation might be necessary. We propose a general framework for policy representation, which reduces this problem to finding a low-dimensional embedding of a given density function in a separable inner product space. Our framework allows us to de-rive strong theoretical guarantees, controlling the error of the reconstructed policies. Such guaran-tees are typically lacking in black-box models, but are very desirable in risk-sensitive tasks. Our experimental results suggest that the reconstructed policies can use less than 10%of the number of parameters in the original networks, while incurring almost no decrease in rewards.