Guillaume Rabusseau

farzaneh.heidari@mila.quebec

Google Scholar

Beheshteh Toloueirakhshan

PhD - Université de Montréal

rakhshab@mila.quebec

Farzaneh Heidari

PhD - Université de Montréal

Co-supervisor:

Jian Tang

Julia Gastinger

Research intern - University of Mannheim

Co-supervisor:

julia.gastinger@mila.quebec

Google Scholar

Jun Dai

Postdoc - Université de Montréal

jun.dai@mila.quebec

Google Scholar

Marawan Gamal

PhD - Université de Montréal

marawan.gamal@mila.quebec

michael.rizvi-martel@mila.quebec

Maude Lizaire

PhD - Université de Montréal

lizairem@mila.quebec

Michael Rizvi-Martel

Research Master's - Université de Montréal

Website

Razieh Shirzadkhani

Omar Chikar

Research Master's - Université de Montréal

omar.chikhar@mila.quebec

Research collaborator

Co-supervisor:

razieh.shirzadkhani@mila.quebec

Shenyang Huang

PhD - McGill University

Principal supervisor:

Research Master's - McGill University

Principal supervisor:

soroush.omranpour@mila.quebec

Publications

Simulating Weighted Automata over Sequences and Trees with Transformers

Michael Rizvi

Maude Lizaire

Clara Lacroce

2024-03-12

arXiv (preprint)

Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets

Dominique Beaini

Shenyang Huang

Joao Alex Cunha

Zhiyi Li

Gabriela Moisescu-Pareja

Oleksandr Dymov

Samuel Maddrell-Mander

Callum McLean

Frederik Wenkel

Luis Müller

Jama Hussein Mohamud

Ali Parviz

Michael Craig

Michał Koziarski

Jiarui Lu

Zhaocheng Zhu

Cristian Gabellini

Kerstin Klaser

Josef Dean

Cas Wognum

Maciej Sypetkowski

Jian Tang

Christopher Morris

Ioannis Koutis

Mirco Ravanelli

Guy Wolf

Prudencio Tossou

Hadrien Mary

Therence Bois

Andrew William Fitzgibbon

Blazej Banaszewski

Chad Martin

Dominic Masters

Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point of multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets shows improvement by also training on large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it to resource-constrained downstream tasks. The Graphium library is publicly available on GitHub and the dataset links are available in Part 1 and Part 2.

2024-01-16

ICLR.cc/2024/Conference (poster)

Laplacian Change Point Detection for Single and Multi-view Dynamic Graphs

Shenyang Huang

Samy Coulombe

Yasmeen Hitti

Dynamic graphs are rich data structures that are used to model complex relationships between entities over time. In particular, anomaly detection in temporal graphs is crucial for many real-world applications such as intrusion identification in network systems, detection of ecosystem disturbances, and detection of epidemic outbreaks. In this article, we focus on change point detection in dynamic graphs and address three main challenges associated with this problem: (i) how to compare graph snapshots across time, (ii) how to capture temporal dependencies, and (iii) how to combine different views of a temporal graph. To solve the above challenges, we first propose Laplacian Anomaly Detection (LAD) which uses the spectrum of graph Laplacian as the low dimensional embedding of the graph structure at each snapshot. LAD explicitly models short-term and long-term dependencies by applying two sliding windows. Next, we propose MultiLAD, a simple and effective generalization of LAD to multi-view graphs. MultiLAD provides the first change point detection method for multi-view dynamic graphs. It aggregates the singular values of the normalized graph Laplacian from different views through the scalar power mean operation. Through extensive synthetic experiments, we show that (i) LAD and MultiLAD are accurate and outperform state-of-the-art baselines and their multi-view extensions by a large margin, (ii) MultiLAD’s advantage over contenders significantly increases when additional views are available, and (iii) MultiLAD is highly robust to noise from individual views. In five real-world dynamic graphs, we demonstrate that LAD and MultiLAD identify significant events as top anomalies such as the implementation of government COVID-19 interventions which impacted the population mobility in multi-view traffic networks.
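
The single-view idea in the abstract above (a Laplacian spectrum per snapshot, compared against a short and a long sliding window) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the unnormalized Laplacian, the window summary (principal singular vector of the stacked context spectra), the cosine-based score, and the window lengths are all assumptions chosen for illustration.

```python
import numpy as np

def laplacian_spectrum(adj):
    """Singular values of L = D - A, scaled to unit L2 norm (assumed embedding)."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    sv = np.linalg.svd(lap, compute_uv=False)
    norm = np.linalg.norm(sv)
    return sv / norm if norm > 0 else sv

def window_score(context, current):
    """1 - |cosine| between the current spectrum and a summary of the window."""
    # Assumed summary: principal left singular vector of the context spectra.
    u, _, _ = np.linalg.svd(np.column_stack(context), full_matrices=False)
    return 1.0 - abs(float(u[:, 0] @ current))

def anomaly_scores(snapshots, short=2, long=4):
    """Score each snapshot against a short and a long window; keep the larger."""
    spectra = [laplacian_spectrum(a) for a in snapshots]
    scores = np.zeros(len(spectra))
    for t in range(long, len(spectra)):
        s_short = window_score(spectra[t - short:t], spectra[t])
        s_long = window_score(spectra[t - long:t], spectra[t])
        scores[t] = max(s_short, s_long)
    return scores

# Toy sequence: a 6-node path graph at every step, except a complete graph at t = 5.
n = 6
path = np.zeros((n, n))
for i in range(n - 1):
    path[i, i + 1] = path[i + 1, i] = 1.0
complete = np.ones((n, n)) - np.eye(n)
snapshots = [path] * 5 + [complete] + [path] * 2
scores = anomaly_scores(snapshots)
print(int(np.argmax(scores)))  # index of the highest-scoring snapshot
```

In this toy sequence the snapshot where the graph becomes complete receives the highest score, and the step just after it receives a smaller nonzero score because the short window still contains the changed snapshot.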

2024-01-12

ACM Transactions on Knowledge Discovery from Data (published)

Generative Learning of Continuous Data by Tensor Networks

Alex Meiburg

Jing Chen

Jacob Miller

Raphaelle Tihon

Alejandro Perdomo-ortiz

2023-10-31

arXiv (preprint)

Temporal Graph Benchmark for Machine Learning on Temporal Graphs

Shenyang Huang

Farimah Poursafaei

Jacob Danovitch

Matthias Fey

Weihua Hu

Emanuele Rossi

Jure Leskovec

Michael M. Bronstein

We present the Temporal Graph Benchmark (TGB), a collection of challenging and diverse benchmark datasets for realistic, reproducible, and robust evaluation of machine learning models on temporal graphs. TGB datasets are of large scale, spanning years in duration, incorporate both node and edge-level prediction tasks and cover a diverse set of domains including social, trade, transaction, and transportation networks. For both tasks, we design evaluation protocols based on realistic use-cases. We extensively benchmark each dataset and find that the performance of common models can vary drastically across datasets. In addition, on dynamic node property prediction tasks, we show that simple methods often achieve superior performance compared to existing temporal graph models. We believe that these findings open up opportunities for future research on temporal graphs. Finally, TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research, including data loading, experiment setup and performance evaluation. TGB will be maintained and updated on a regular basis and welcomes community feedback. TGB datasets, data loaders, example codes, evaluation setup, and leaderboards are publicly available at https://tgb.complexdatalab.com/.

2023-09-25

NeurIPS.cc/2023/Track/Datasets_and_Benchmarks (poster)

ROSA: Random Orthogonal Subspace Adaptation

Marawan Gamal

Aristides Milios

Siva Reddy

2023-06-20

ICML.cc/2023/Workshop/ES-FoMO (poster)

Optimal Approximate Minimization of One-Letter Weighted Finite Automata

Clara Lacroce

Borja Balle

Prakash Panangaden

2023-05-31

arXiv (preprint)

Fast and Attributed Change Detection on Dynamic Graphs with Density of States

Shenyang Huang

Jacob Danovitch

2023-05-15

arXiv (preprint)

Recurrent Real-valued Neural Autoregressive Density Estimator for Online Density Estimation and Classification of Streaming Data

Tianyu Li

Bogdan Mazoure

In contrast with traditional offline learning, where complete data accessibility is assumed, many modern applications involve processing data in a streaming fashion. This online learning setting raises various challenges, including concept drift, hardware memory constraints, etc. In this paper, we propose the Recurrent Real-valued Neural Autoregressive Density Estimator (RRNADE), a flexible density-based model for online classification and density estimation. RRNADE combines a neural Gaussian mixture density module with a recurrent module. This combination allows RRNADE to exploit possible sequential correlations in the streaming task, which are often ignored in the classical streaming setting where each input is assumed to be independent from the previous ones. We showcase the ability of RRNADE to adapt to concept drifts on synthetic density estimation tasks. We also apply RRNADE to online classification tasks on both real-world and synthetic datasets and compare it with multiple density-based as well as non-density-based online classification methods. In almost all of these tasks, RRNADE outperforms the other methods. Lastly, we conduct an ablation study demonstrating the complementary benefits of the density and the recurrent modules.

2023-02-01

ICLR.cc/2023/Conference (rejected)

Benchmarking State-Merging Algorithms for Learning Regular Languages.

Adil Soubki

Jeffrey Heinz

François Coste

Faissal Ouardi

2023-01-01

International Conference on Graphics and Interaction (published)

dblp.uni-trier.de

Explaining Graph Neural Networks Using Interpretable Local Surrogates

Farzaneh Heidari

Perouz Taslakian

We propose an interpretable local surrogate (ILS) method for understanding the predictions of black-box graph models. Explainability methods are commonly employed to gain insights into black-box models and, given the widespread adoption of GNNs in diverse applications, understanding the underlying reasoning behind their decision-making processes becomes crucial. Our ILS method approximates the behavior of a black-box graph model by fitting a simple surrogate model in the local neighborhood of a given input example. Leveraging the interpretability of the surrogate, ILS is able to identify the most relevant nodes contributing to a specific prediction. To efficiently identify these nodes, we utilize group sparse linear models as local surrogates. Through empirical evaluations on explainability benchmarks, our method consistently outperforms state-of-the-art graph explainability methods. This demonstrates the effectiveness of our approach in providing enhanced interpretability for GNN predictions.

2023-01-01

TAG-ML (published)

dblp.uni-trier.de

Formal and Empirical Studies of Counting Behaviour in ReLU RNNs.

Nadine El-Naggar

Andrew Ryzhikov

Laure Daviaud

Pranava Madhyastha

Tillman Weyde

François Coste

Faissal Ouardi