Reihaneh Rabbany

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science
Research Topics
Data Mining
Graph Neural Networks
Learning on Graphs
Natural Language Processing
Representation Learning

Biography

Reihaneh Rabbany is an assistant professor at the School of Computer Science, McGill University, and a core academic member of Mila – Quebec Artificial Intelligence Institute. She is also a Canada CIFAR AI Chair and on the faculty of McGill’s Centre for the Study of Democratic Citizenship.

Before joining McGill, Rabbany was a postdoctoral fellow at the School of Computer Science, Carnegie Mellon University. She completed her PhD in the Department of Computing Science at the University of Alberta.

Rabbany heads McGill’s Complex Data Lab, where she conducts research at the intersection of network science, data mining and machine learning, with a focus on analyzing real-world interconnected data and social good applications.

Publications

CrediBench: Building Web-Scale Network Datasets for Information Integrity
Online misinformation poses an escalating threat, amplified by the Internet's open nature and increasingly capable LLMs that generate persuasive yet deceptive content. Existing misinformation detection methods typically focus on either textual content or network structure in isolation, failing to leverage the rich, dynamic interplay between website content and hyperlink relationships that characterizes real-world misinformation ecosystems. We introduce CrediBench: a large-scale data processing pipeline for constructing temporal web graphs that jointly model textual content and hyperlink structure for misinformation detection. Unlike prior work, our approach captures the dynamic evolution of general misinformation domains, including changes in both content and inter-site references over time. Our processed one-month snapshot, extracted from the Common Crawl archive of December 2024, contains 45 million nodes and 1 billion edges, making it the largest web graph dataset made publicly available for misinformation research to date. In experiments on this snapshot, we demonstrate the strength of both structural and webpage content signals for learning credibility scores, which measure source reliability. The pipeline and experimentation code are all available here, and the dataset is in this folder.
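As a rough illustration of the structural signal described above, the sketch below builds a toy hyperlink graph between domains and propagates credibility scores from a few labeled seeds to their neighbours. This is not the CrediBench pipeline; the domain names, seed labels, and propagation rule are hypothetical stand-ins for the web-scale version.

```python
# Illustrative sketch only: a toy version of scoring domains from hyperlink
# structure. CrediBench operates on Common Crawl at web scale; the domains,
# seeds, and update rule here are hypothetical.
import networkx as nx

# Hypothetical hyperlink edges between domains (source links to target).
edges = [
    ("news-a.com", "wire-service.org"),
    ("blog-b.net", "news-a.com"),
    ("junk-c.info", "blog-b.net"),
    ("news-a.com", "junk-c.info"),
]
G = nx.DiGraph(edges)

# Seed credibility labels for a few domains (1.0 = reliable, 0.0 = not).
seeds = {"wire-service.org": 1.0, "junk-c.info": 0.0}

# Simple iterative propagation: an unlabeled domain takes the mean score of
# the domains it links to, a crude structural credibility signal.
scores = {n: seeds.get(n, 0.5) for n in G}
for _ in range(20):
    for n in G:
        if n in seeds:
            continue  # keep seed labels fixed
        nbrs = list(G.successors(n))
        if nbrs:
            scores[n] = sum(scores[m] for m in nbrs) / len(nbrs)

for domain, s in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{domain}: {s:.2f}")
```

In the full setting, such structural scores would be combined with textual signals from the page content itself rather than used alone.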
OpenFake: An Open Dataset and Platform Toward Large-Scale Deepfake Detection
Deepfakes, synthetic media created using advanced AI techniques, have intensified the spread of misinformation, particularly in politically sensitive contexts. Existing deepfake detection datasets are often limited, relying on outdated generation methods, low realism, or single-face imagery, which restricts their effectiveness for general synthetic image detection. By analyzing social media posts, we identify multiple modalities through which deepfakes propagate misinformation. Furthermore, our human perception study demonstrates that recently developed proprietary models produce synthetic images increasingly indistinguishable from real ones, complicating accurate identification by the general public. Consequently, we present a comprehensive, politically focused dataset specifically crafted for benchmarking detection against modern generative models. The dataset contains three million real images paired with descriptive captions, which are used to generate 963k corresponding high-quality synthetic images from a mix of proprietary and open-source models. Recognizing the continual evolution of generative techniques, we introduce a crowdsourced adversarial platform where participants are incentivized to generate and submit challenging synthetic images. This ongoing community-driven initiative ensures that deepfake detection methods remain robust and adaptive, proactively safeguarding public discourse from sophisticated misinformation threats.
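For context on what a detection baseline over such a dataset might look like, here is a minimal sketch that fine-tunes a standard image classifier to separate real from synthetic images. It is not the OpenFake benchmark code; the directory layout, model choice, and hyperparameters are assumptions for illustration.

```python
# Minimal baseline sketch for binary real-vs-synthetic image classification.
# Not the OpenFake code: paths, model, and hyperparameters are hypothetical.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Assumes an ImageFolder layout with "real/" and "synthetic/" subdirectories.
train_set = datasets.ImageFolder("data/openfake_train", transform=tfm)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # real vs. synthetic head

opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one illustrative epoch
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```

The point of an adversarial, continually updated benchmark is precisely that static baselines like this one degrade as generators improve, motivating periodic re-evaluation.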
RL Fine-Tuning Heals OOD Forgetting in SFT
Hangzhan Jin
Sicheng Lyu
Mohammad Hamdaqa
The two-stage fine-tuning paradigm of Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has empirically shown better reasoning performance than one-stage SFT for the post-training of Large Language Models (LLMs). However, the evolution and mechanism behind the synergy of SFT and RL are still under-explored and inconclusive. In our study, we find the well-known claim "SFT memorizes, RL generalizes" is over-simplified, and discover that: (1) OOD performance peaks at the early stage of SFT and then declines (OOD forgetting), and the best SFT checkpoint cannot be identified from training/test loss; (2) the subsequent RL stage does not generate fundamentally better OOD capability; instead, it plays an OOD restoration role, recovering the reasoning ability lost during SFT; (3) this recovery ability has boundaries, i.e., if SFT trains for too short or too long, RL cannot recover the lost OOD ability; (4) to uncover the underlying mechanisms behind the forgetting and restoration process, we employ SVD analysis on parameter matrices, manually edit them, and observe their impacts on model performance. Contrary to the common belief that shifts in model capacity mainly result from changes in singular values, we find that these are actually quite stable throughout fine-tuning. Instead, OOD behavior strongly correlates with the rotation of singular vectors. Our findings re-identify the roles of SFT and RL in two-stage fine-tuning and identify the rotation of singular vectors as the key mechanism. Code is available at https://github.com/xiaodanguoguo/RL_Heals_SFT
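The SVD diagnostic described in the abstract can be approximated in a few lines: compare one weight matrix across two checkpoints, measuring how much its singular values move versus how far its top singular vectors rotate. This is a minimal sketch, assuming hypothetical checkpoint files and layer names; see the linked repository for the authors' actual code.

```python
# Sketch of an SVD drift diagnostic between two fine-tuning checkpoints:
# singular-value shift vs. singular-vector rotation. The checkpoint paths
# and layer name below are hypothetical.
import torch

w_sft = torch.load("ckpt_sft.pt")["model"]["layers.0.mlp.weight"]
w_rl = torch.load("ckpt_rl.pt")["model"]["layers.0.mlp.weight"]

U1, S1, _ = torch.linalg.svd(w_sft.float(), full_matrices=False)
U2, S2, _ = torch.linalg.svd(w_rl.float(), full_matrices=False)

k = 16  # compare the top-k singular directions
# Relative change in singular values (the paper finds these stay stable).
sv_shift = ((S1[:k] - S2[:k]).abs() / S1[:k]).mean()

# Principal angles between the top-k left singular subspaces: the cosines
# are the singular values of U1[:, :k]^T @ U2[:, :k]; 1.0 means no rotation.
cosines = torch.linalg.svdvals(U1[:, :k].T @ U2[:, :k])

print(f"mean relative singular-value shift: {sv_shift:.4f}")
print(f"subspace alignment (mean cosine):   {cosines.mean():.4f}")
```

Under the paper's finding, one would expect the first number to stay small across checkpoints while the second drops as singular vectors rotate.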
A Guide to Misinformation Detection Data and Evaluation
Gabrielle Péloquin-Skulski
James Zhou
Florence Laflamme
Yuxiang Guan
Kellin Pelrine
Misinformation is a complex societal issue, and mitigating solutions are difficult to create due to data deficiencies. To address this problem, we have curated the largest collection of (mis)information datasets in the literature, totaling 75. From these, we evaluated the quality of all 36 datasets that consist of statements or claims, as well as the 9 datasets that consist of data in purely paragraph form. We assess these datasets to identify those with solid foundations for empirical work and those with flaws that could result in misleading and non-generalizable results, such as insufficient label quality or spurious correlations. We further provide state-of-the-art baselines on all these datasets, but show that regardless of label quality, categorical labels may no longer give an accurate evaluation of detection model performance, and we discuss alternatives to mitigate this problem. Overall, this guide aims to provide a roadmap for obtaining higher-quality data and conducting more effective evaluations, ultimately improving research in misinformation detection. All datasets and other artifacts are available at [anonymized].
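One concrete way to probe the label-quality and spurious-correlation concerns raised here is to check whether a shallow surface-feature model already predicts the labels. The sketch below does this with a bag-of-words baseline; the file name and column names are hypothetical, and a high score would only suggest, not prove, leakage.

```python
# Sketch of a dataset sanity check in the spirit of the guide: if a trivial
# surface-feature model scores well on a task that should require external
# evidence, the labels may leak through wording alone. The CSV path and
# column names are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("claims.csv")  # assumed columns: "claim", "label"

X = TfidfVectorizer(max_features=5000).fit_transform(df["claim"])
y = df["label"]

# Cross-validated accuracy of a bag-of-words logistic regression.
acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
print(f"5-fold accuracy of a surface-feature baseline: {acc:.3f}")
```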
Responsible AI Day
Ebrahim Bagheri
Faezeh Ensan
Calvin Hillis
Robin Cohen
Sébastien Gambs
Temporal Graph Learning Workshop
Daniele Zambon
Andrea Cini
Julia Gastinger
Michael Bronstein
Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training