Reihaneh Rabbany

Biography

Reihaneh Rabbany is an assistant professor at the School of Computer Science at McGill University. She is a faculty member at Mila – Quebec Artificial Intelligence Institute and holds a Canada CIFAR AI Chair. She is also a faculty member at McGill's Centre for the Study of Democratic Citizenship. Before joining McGill University, she was a postdoctoral fellow at the School of Computer Science at Carnegie Mellon University. She earned her PhD in the Department of Computing Science at the University of Alberta. She leads the Complex Data Lab, whose research lies at the intersection of network science, data mining, and machine learning, and focuses on analyzing real-world interconnected data and on social applications.

Current Students

Jacob Chmura

Research Master's - McGill

Principal supervisor:

PhD - McGill

Co-supervisor:

Adriana Romero Soriano

Research collaborator - McGill

Julia Gastinger

Research collaborator - University of Mannheim

Principal supervisor:

Jean-François Godbout

Shenyang Huang

PhD - McGill

Co-supervisor:

Research Master's - McGill

Florence Laflamme

Research intern - UdeM

PhD - McGill

Shahrad Mohammadzadeh

Research Master's - McGill

Co-supervisor:

PhD - McGill

Research Master's - McGill

Soroush Omranpour

Research Master's - McGill

Co-supervisor:

kellin.pelrine@mila.quebec

Kellin Pelrine

PhD - McGill

Farimah Poursafaei

Postdoctorate - McGill

Maximilian Puelma Touzel

Research collaborator

Razieh Shirzadkhani

Research collaborator

Principal supervisor:

Alexandre St-Aubin

Research intern - McGill

Vidya Sujaya

Research Master's - McGill

Camille Thibault

Research intern - Université de Montréal

jacob.mila.handle@tianshome.com

Jacob-Junqi Tian

Research collaborator - McGill

PhD - McGill

Sveta Zhuk

Research intern - UdeM

Blog Posts

Flight-SEIR: Incorporating Flight Data to Improve Epidemiological Modelling and Disease Outbreak Prevention

August 3, 2021

by

Shenyang Huang

Reihaneh Rabbany


Publications

ToxiSight: Insights Towards Detected Chat Toxicity

Zachary Yang

Domenico Tullo

We present a comprehensive explainability dashboard designed for in-game chat toxicity. This dashboard integrates various existing explainable AI (XAI) techniques, including token importance analysis, model output visualization, and attribution to the training dataset. It also provides insights through the closest positive and negative examples, facilitating a deeper understanding and potential correction of the training data. Additionally, the dashboard includes word sense analysis (particularly useful for new moderators) and offers free-text explanations for both positive and negative predictions. This multi-faceted approach enhances the interpretability and transparency of toxicity detection models.

2024-09-21

EMNLP/2024/Workshop/BlackBoxNLP (accepted)

openreview.net

UTG: Towards a Unified View of Snapshot and Event Based Models for Temporal Graphs

Shenyang Huang

Farimah Poursafaei

Emanuele Rossi

2024-07-17

ArXiv (preprint)

Web Retrieval Agents for Evidence-Based Misinformation Detection

Jacob-Junqi Tian

Hao Yu

Yury Orlovskiy

Tyler Vergho

Mauricio Rivera

Mayank Goel

Zachary Yang

Jean-François Godbout

Kellin Pelrine

2024-07-10

colmweb.org/COLM/2024/Conference (accepted)

openreview.net

Game On, Hate Off: A Study of Toxicity in Online Multiplayer Environments

Zachary Yang

Nicolas Grenon-Godbout

2024-06-28

Games: Research and Practice (published)

TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Heterogeneous Graphs

Julia Gastinger

Shenyang Huang

Mikhail Galkin

Erfan Loghmani

Ali Parviz

Farimah Poursafaei

Jacob Danovitch

Emanuele Rossi

Ioannis Koutis

Heiner Stuckenschmidt

2024-06-14

ArXiv (preprint)

Towards Neural Scaling Laws for Foundation Models on Temporal Graphs

Razieh Shirzadkhani

Tran Gia Bao Ngo

Kiarash Shamsi

Shenyang Huang

Farimah Poursafaei

Poupak Azad

Baris Coskunuzer

Cuneyt Gurcan Akcora

The field of temporal graph learning aims to learn from evolving network data to forecast future interactions. Given a collection of observed temporal graphs, is it possible to predict the evolution of an unseen network from the same domain? To answer this question, we first present the Temporal Graph Scaling (TGS) dataset, a large collection of temporal graphs consisting of eighty-four ERC20 token transaction networks collected from 2017 to 2023. Next, we evaluate the transferability of Temporal Graph Neural Networks (TGNNs) for the temporal graph property prediction task by pre-training on a collection of up to sixty-four token transaction networks and then evaluating the downstream performance on twenty unseen token networks. We find that the neural scaling law observed in NLP and Computer Vision also applies in temporal graph learning, where pre-training on a greater number of networks leads to improved downstream performance. To the best of our knowledge, this is the first empirical demonstration of the transferability of temporal graph learning. On downstream token networks, the largest pre-trained model outperforms single model TGNNs on thirteen unseen test networks. Therefore, we believe that this is a promising first step towards building foundation models for temporal graphs.

2024-06-14

ArXiv (preprint)

Static graph approximations of dynamic contact networks for epidemic forecasting

Razieh Shirzadkhani

Shenyang Huang

Abby Leung

2024-05-22

Scientific Reports (published)

T-NET: Weakly Supervised Graph Learning for Combatting Human Trafficking

Pratheeksha Nair

Javin Liu

Catalina Vajiac

Andreas Olligschlaeger

Duen Horng Chau

Mirela T. Cazzolato

Cara Jones

Christos Faloutsos

Human trafficking (HT) for forced sexual exploitation, often described as modern-day slavery, is a pervasive problem that affects millions of people worldwide. Perpetrators of this crime post advertisements (ads) on behalf of their victims on adult service websites (ASW). These websites typically contain hundreds of thousands of ads, including those posted by independent escorts, massage parlor agencies and spammers (fake ads). Detecting suspicious activity in these ads is difficult, and developing data-driven methods is challenging due to the hard-to-label, complex and sensitive nature of the data. In this paper, we propose T-Net, which, unlike previous solutions, formulates this problem as weakly supervised classification. Since it takes several months to years to investigate a case and obtain a single definitive label, we design domain-specific signals or indicators that provide weak labels. T-Net also looks into connections between ads and models the problem as a graph learning task instead of classifying ads independently. We show that T-Net outperforms all baselines on a real-world dataset of ads by 7% average weighted F1 score. Given that this data contains personally identifiable information, we also present a realistic data generator and provide the first publicly available dataset in this domain, which may be leveraged by the wider research community.

2024-03-24

Proceedings of the AAAI Conference on Artificial Intelligence (published)

Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets

Dominique Beaini

Shenyang Huang

Joao Alex Cunha

Zhiyi Li

Gabriela Moisescu-Pareja

Oleksandr Dymov

Samuel Maddrell-Mander

Callum McLean

Frederik Wenkel

Luis Müller

Jama Hussein Mohamud

Ali Parviz

Michael Craig

Michał Koziarski

Jiarui Lu

Zhaocheng Zhu

Cristian Gabellini

Kerstin Klaser

Josef Dean

Cas Wognum

Maciej Sypetkowski

Jian Tang

Christopher Morris

Ioannis Koutis

Mirco Ravanelli

Guy Wolf

Prudencio Tossou

Hadrien Mary

Therence Bois

Andrew William Fitzgibbon

Blazej Banaszewski

Chad Martin

Dominic Masters

Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library, which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point of multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets shows improvement by also training on large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it to resource-constrained downstream tasks. The Graphium library is publicly available on GitHub and the dataset links are available in Part 1 and Part 2.

2024-01-16

ICLR.cc/2024/Conference (poster)

openreview.net

Combining Confidence Elicitation and Sample-based Methods for Uncertainty Quantification in Misinformation Mitigation

Mauricio Rivera

Jean-François Godbout

Kellin Pelrine

2024-01-13

ArXiv (preprint)

Comparing GPT-4 and Open-Source Language Models in Misinformation Mitigation

Tyler Vergho

Jean-François Godbout

Kellin Pelrine

Recent large language models (LLMs) have been shown to be effective for misinformation detection. However, the choice of LLMs for experiments varies widely, leading to uncertain conclusions. In particular, GPT-4 is known to be strong in this domain, but it is closed source, potentially expensive, and can show instability between different versions. Meanwhile, alternative LLMs have given mixed results. In this work, we show that Zephyr-7b presents a consistently viable alternative, overcoming key limitations of commonly used approaches like Llama-2 and GPT-3.5. This provides the research community with a solid open-source option and shows open-source models are gradually catching up on this task. We then highlight how GPT-3.5 exhibits unstable performance, such that this very widely used model could provide misleading results in misinformation detection. Finally, we validate new tools including approaches to structured output and the latest version of GPT-4 (Turbo), showing they do not compromise performance, thus unlocking them for future research and potentially enabling more complex pipelines for misinformation mitigation.

2024-01-12

ArXiv (preprint)

Laplacian Change Point Detection for Single and Multi-view Dynamic Graphs

Shenyang Huang

Samy Coulombe

Yasmeen Hitti

Dynamic graphs are rich data structures that are used to model complex relationships between entities over time. In particular, anomaly detection in temporal graphs is crucial for many real-world applications such as intrusion identification in network systems, detection of ecosystem disturbances, and detection of epidemic outbreaks. In this article, we focus on change point detection in dynamic graphs and address three main challenges associated with this problem: (i) how to compare graph snapshots across time, (ii) how to capture temporal dependencies, and (iii) how to combine different views of a temporal graph. To solve the above challenges, we first propose Laplacian Anomaly Detection (LAD), which uses the spectrum of the graph Laplacian as the low-dimensional embedding of the graph structure at each snapshot. LAD explicitly models short-term and long-term dependencies by applying two sliding windows. Next, we propose MultiLAD, a simple and effective generalization of LAD to multi-view graphs. MultiLAD provides the first change point detection method for multi-view dynamic graphs. It aggregates the singular values of the normalized graph Laplacian from different views through the scalar power mean operation. Through extensive synthetic experiments, we show that (i) LAD and MultiLAD are accurate and outperform state-of-the-art baselines and their multi-view extensions by a large margin, (ii) MultiLAD's advantage over contenders significantly increases when additional views are available, and (iii) MultiLAD is highly robust to noise from individual views. In five real-world dynamic graphs, we demonstrate that LAD and MultiLAD identify significant events as top anomalies, such as the implementation of government COVID-19 interventions, which impacted the population mobility in multi-view traffic networks.

2024-01-12

ACM Transactions on Knowledge Discovery from Data (published)
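
The abstract above names two mechanisms: comparing each snapshot's Laplacian spectrum against two sliding windows, and combining per-view singular values with a scalar power mean. The sketch below illustrates both ideas in plain Python. It is not the authors' implementation. It assumes the per-snapshot spectra have already been computed elsewhere, and it uses the mean of the window as the "typical" spectrum and the maximum of the two window scores as the final score, which are simplifying assumptions made here. All function names are hypothetical.

```python
import math


def power_mean(values, p):
    """Scalar power mean; p=1 is the arithmetic mean, p=0 the geometric mean.

    Assumes strictly positive inputs whenever p <= 0.
    """
    values = list(values)
    if p == 0:
        return math.exp(sum(math.log(v) for v in values) / len(values))
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


def aggregate_views(spectra_per_view, p):
    """Combine equal-length spectra from several views, entry by entry."""
    return [power_mean(vals, p) for vals in zip(*spectra_per_view)]


def _unit(vec):
    """L2-normalize a vector; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def _window_score(spectra, t, window):
    """1 - cosine similarity between snapshot t and the mean of the
    previous `window` normalized spectra (a stand-in for 'typical' behavior)."""
    past = [_unit(s) for s in spectra[t - window:t]]
    typical = _unit([sum(col) / window for col in zip(*past)])
    current = _unit(spectra[t])
    return 1.0 - sum(a * b for a, b in zip(typical, current))


def anomaly_scores(spectra, short, long):
    """Score every snapshot that has a full long window behind it.

    Assumes short <= long. Returns {snapshot index: score}.
    """
    return {
        t: max(_window_score(spectra, t, short), _window_score(spectra, t, long))
        for t in range(long, len(spectra))
    }


# Toy input: six identical spectra, then an abrupt change at index 6.
spectra = [[1.0, 0.5, 0.1]] * 6 + [[0.1, 0.5, 1.0]] * 4
scores = anomaly_scores(spectra, short=2, long=4)
change_point = max(scores, key=scores.get)
```

In this toy input the highest score falls on the snapshot where the spectrum changes, and the long window keeps the score elevated for a few more snapshots while the short window recovers quickly. For multi-view data, `aggregate_views` would be applied per snapshot before scoring.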