Portrait of Guillaume Rabusseau

Guillaume Rabusseau

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Deep Learning
Graph Neural Networks
Learning on Graphs
Machine Learning Theory
Probabilistic Models
Quantum Information Theory
Recommender Systems
Recurrent Neural Networks
Tensor Factorization

Biography

I have been an assistant professor at Mila – Quebec Artificial Intelligence Institute and in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal (UdeM) since September 2018. I was awarded a Canada CIFAR AI Chair in March 2019. Before joining UdeM, I was a postdoctoral research fellow in the Reasoning and Learning Lab at McGill University, where I worked with Prakash Panangaden, Joelle Pineau and Doina Precup.

I obtained my PhD in 2016 from Aix-Marseille University (AMU) in France, where I worked in the Qarma team (Machine Learning and Multimedia) under the supervision of François Denis and Hachem Kadri. I also obtained my MSc in fundamental computer science and my BSc in computer science from AMU. I am interested in tensor methods for machine learning and in designing learning algorithms for structured data by leveraging linear and multilinear algebra (e.g., spectral methods).

Current Students

Postdoctorate - Université de Montréal
Master's Research - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Collaborating Alumni - McGill University
Principal supervisor :
Independent visiting researcher - Technical University of Hambrug, Germany
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Postdoctorate - McGill University
Co-supervisor :
Master's Research - Université de Montréal
Collaborating researcher - McGill University
Principal supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher - Université de Montréal
Co-supervisor :
PhD - Université de Montréal

Publications

ClustRecNet: A Novel End-to-End Deep Learning Framework for Clustering Algorithm Recommendation
Mohammadreza Bakhtyari
Renato Cordeiro De Amorim
We introduce ClustRecNet - a novel deep learning (DL)-based recommendation framework for determining the most suitable clustering algorithms… (see more) for a given dataset, addressing the long-standing challenge of clustering algorithm selection in unsupervised learning. To enable supervised learning in this context, we construct a comprehensive data repository comprising 34,000 synthetic datasets with diverse structural properties. Each of them was processed using 10 popular clustering algorithms. The resulting clusterings were assessed via the Adjusted Rand Index (ARI) to establish ground truth labels, used for training and evaluation of our DL model. The proposed network architecture integrates convolutional, residual, and attention mechanisms to capture both local and global structural patterns from the input data. This design supports end-to-end training to learn compact representations of datasets and enables direct recommendation of the most suitable clustering algorithm, reducing reliance on handcrafted meta-features and traditional Cluster Validity Indices (CVIs). Comprehensive experiments across synthetic and real-world benchmarks demonstrate that our DL model consistently outperforms conventional CVIs (e.g. Silhouette, Calinski-Harabasz, Davies-Bouldin, and Dunn) as well as state-of-the-art AutoML clustering recommendation approaches (e.g. ML2DAC, AutoCluster, and AutoML4Clust). Notably, the proposed model achieves a 0.497 ARI improvement over the Calinski-Harabasz index on synthetic data and a 15.3% ARI gain over the best-performing AutoML approach on real-world data.
RL Fine-Tuning Heals OOD Forgetting in SFT
Hangzhan Jin
Sicheng Lyu
Mohammad Hamdaqa
The two-stage fine-tuning paradigm of Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has empirically shown better reas… (see more)oning performance than one-stage SFT for the post-training of Large Language Models (LLMs). However, the evolution and mechanism behind the synergy of SFT and RL are still under-explored and inconclusive. In our study, we find the well-known claim"SFT memorizes, RL generalizes"is over-simplified, and discover that: (1) OOD performance peaks at the early stage of SFT and then declines (OOD forgetting), the best SFT checkpoint cannot be captured by training/test loss; (2) the subsequent RL stage does not generate fundamentally better OOD capability, instead it plays an \textbf{OOD restoration} role, recovering the lost reasoning ability during SFT; (3) The recovery ability has boundaries, \ie{} \textbf{if SFT trains for too short or too long, RL cannot recover the lost OOD ability;} (4) To uncover the underlying mechanisms behind the forgetting and restoration process, we employ SVD analysis on parameter matrices, manually edit them, and observe their impacts on model performance. Unlike the common belief that the shift of model capacity mainly results from the changes of singular values, we find that they are actually quite stable throughout fine-tuning. Instead, the OOD behavior strongly correlates with the \textbf{rotation of singular vectors}. Our findings re-identify the roles of SFT and RL in the two-stage fine-tuning and discover the rotation of singular vectors as the key mechanism. %reversing the rotations induced by SFT, which shows recovery from forgetting, whereas imposing the SFT parameter directions onto a RL-tuned model results in performance degradation. Code is available at https://github.com/xiaodanguoguo/RL_Heals_SFT
RL Fine-Tuning Heals OOD Forgetting in SFT
Hangzhan Jin
Sicheng Lyu
Mohammad Hamdaqa
T-GRAB: A Synthetic Diagnostic Benchmark for Learning on Temporal Graphs
Dynamic graph learning methods have recently emerged as powerful tools for modelling relational data evolving through time. However, despite… (see more) extensive benchmarking efforts, it remains unclear whether current Temporal Graph Neural Networks (TGNNs) effectively capture core temporal patterns such as periodicity, cause-and-effect, and long-range dependencies. In this work, we introduce the Temporal Graph Reasoning Benchmark (T-GRAB), a comprehensive set of synthetic tasks designed to systematically probe the capabilities of TGNNs to reason across time. T-GRAB provides controlled, interpretable tasks that isolate key temporal skills: counting/memorizing periodic repetitions, inferring delayed causal effects, and capturing long-range dependencies over both spatial and temporal dimensions. We evaluate 11 temporal graph learning methods on these tasks, revealing fundamental shortcomings in their ability to generalize temporal patterns. Our findings offer actionable insights into the limitations of current models, highlight challenges hidden by traditional real-world benchmarks, and motivate the development of architectures with stronger temporal reasoning abilities. The code for T-GRAB can be found at: https://github.com/alirezadizaji/T-GRAB.
TGM: A Modular Framework for Machine Learning on Temporal Graphs
While deep learning on static graphs has been revolutionized by standardized libraries like PyTorch Geometric and DGL, machine learning on T… (see more)emporal Graphs (TG), networks that evolve over time, lacks comparable software infrastructure. Existing TG libraries are limited in scope, focusing on a single method category or specific algorithms. We introduce Temporal Graph Modelling (TGM), a comprehensive framework for machine learning on temporal graphs to address this gap. Through a modular architecture, TGM is the first library to support both discrete and continuous-time TG methods and implements a wide range of TG methods. The TGM framework combines an intuitive front-end API with an optimized backend storage, enabling reproducible research and efficient experimentation at scale. Key features include graph-level optimizations for offline training and built-in performance profiling capabilities. Through extensive benchmarking on five real-world networks, TGM is up to 6 times faster than the widely used DyGLib library on TGN and TGAT models and up to 8 times faster than the UTG framework for converting edges into coarse-grained snapshots.
Are Large Language Models Good Temporal Graph Learners?
Large Language Models (LLMs) have recently driven significant advancements in Natural Language Processing and various other applications. Wh… (see more)ile a broad range of literature has explored the graph-reasoning capabilities of LLMs, including their use of predictors on graphs, the application of LLMs to dynamic graphs -- real world evolving networks -- remains relatively unexplored. Recent work studies synthetic temporal graphs generated by random graph models, but applying LLMs to real-world temporal graphs remains an open question. To address this gap, we introduce Temporal Graph Talker (TGTalker), a novel temporal graph learning framework designed for LLMs. TGTalker utilizes the recency bias in temporal graphs to extract relevant structural information, converted to natural language for LLMs, while leveraging temporal neighbors as additional information for prediction. TGTalker demonstrates competitive link prediction capabilities compared to existing Temporal Graph Neural Network (TGNN) models. Across five real-world networks, TGTalker performs competitively with state-of-the-art temporal graph methods while consistently outperforming popular models such as TGN and HTGN. Furthermore, TGTalker generates textual explanations for each prediction, thus opening up exciting new directions in explainability and interpretability for temporal link prediction. The code is publicly available at https://github.com/shenyangHuang/TGTalker.
Are Large Language Models Good Temporal Graph Learners?
Large Language Models (LLMs) have recently driven significant advancements in Natural Language Processing and various other applications. Wh… (see more)ile a broad range of literature has explored the graph-reasoning capabilities of LLMs, including their use of predictors on graphs, the application of LLMs to dynamic graphs -- real world evolving networks -- remains relatively unexplored. Recent work studies synthetic temporal graphs generated by random graph models, but applying LLMs to real-world temporal graphs remains an open question. To address this gap, we introduce Temporal Graph Talker (TGTalker), a novel temporal graph learning framework designed for LLMs. TGTalker utilizes the recency bias in temporal graphs to extract relevant structural information, converted to natural language for LLMs, while leveraging temporal neighbors as additional information for prediction. TGTalker demonstrates competitive link prediction capabilities compared to existing Temporal Graph Neural Network (TGNN) models. Across five real-world networks, TGTalker performs competitively with state-of-the-art temporal graph methods while consistently outperforming popular models such as TGN and HTGN. Furthermore, TGTalker generates textual explanations for each prediction, thus opening up exciting new directions in explainability and interpretability for temporal link prediction. The code is publicly available at https://github.com/shenyangHuang/TGTalker.
Grokking Beyond the Euclidean Norm of Model Parameters
Tikeng Notsawo Pascal Junior
Pascal Notsawo
Grokking refers to a delayed generalization following overfitting when optimizing artificial neural networks with gradient-based methods. I… (see more)n this work, we demonstrate that grokking can be induced by regularization, either explicit or implicit. More precisely, we show that when there exists a model with a property
Grokking Beyond the Euclidean Norm of Model Parameters
Tikeng Notsawo Pascal Junior
Grokking refers to a delayed generalization following overfitting when optimizing artificial neural networks with gradient-based methods. I… (see more)n this work, we demonstrate that grokking can be induced by regularization, either explicit or implicit. More precisely, we show that when there exists a model with a property
Higher Order Transformers: Enhancing Stock Movement Prediction On Multimodal Time-Series Data
In this paper, we tackle the challenge of predicting stock movements in financial markets by introducing Higher Order Transformers, a novel … (see more)architecture designed for processing multivariate time-series data. We extend the self-attention mechanism and the transformer architecture to a higher order, effectively capturing complex market dynamics across time and variables. To manage computational complexity, we propose a low-rank approximation of the potentially large attention tensor using tensor decomposition and employ kernel attention, reducing complexity to linear with respect to the data size. Additionally, we present an encoder-decoder model that integrates technical and fundamental analysis, utilizing multimodal signals from historical prices and related tweets. Our experiments on the Stocknet dataset demonstrate the effectiveness of our method, highlighting its potential for enhancing stock movement prediction in financial markets.
Higher Order Transformers: Enhancing Stock Movement Prediction On Multimodal Time-Series Data
In this paper, we tackle the challenge of predicting stock movements in financial markets by introducing Higher Order Transformers, a novel … (see more)architecture designed for processing multivariate time-series data. We extend the self-attention mechanism and the transformer architecture to a higher order, effectively capturing complex market dynamics across time and variables. To manage computational complexity, we propose a low-rank approximation of the potentially large attention tensor using tensor decomposition and employ kernel attention, reducing complexity to linear with respect to the data size. Additionally, we present an encoder-decoder model that integrates technical and fundamental analysis, utilizing multimodal signals from historical prices and related tweets. Our experiments on the Stocknet dataset demonstrate the effectiveness of our method, highlighting its potential for enhancing stock movement prediction in financial markets.
Higher Order Transformers: Enhancing Stock Movement Prediction On Multimodal Time-Series Data
In this paper, we tackle the challenge of predicting stock movements in financial markets by introducing Higher Order Transformers, a novel … (see more)architecture designed for processing multivariate time-series data. We extend the self-attention mechanism and the transformer architecture to a higher order, effectively capturing complex market dynamics across time and variables. To manage computational complexity, we propose a low-rank approximation of the potentially large attention tensor using tensor decomposition and employ kernel attention, reducing complexity to linear with respect to the data size. Additionally, we present an encoder-decoder model that integrates technical and fundamental analysis, utilizing multimodal signals from historical prices and related tweets. Our experiments on the Stocknet dataset demonstrate the effectiveness of our method, highlighting its potential for enhancing stock movement prediction in financial markets.