
Reihaneh Rabbany

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science
Research Topics
Data Mining
Graph Neural Networks
Learning on Graphs
Natural Language Processing
Representation Learning

Biography

Reihaneh Rabbany is an assistant professor at the School of Computer Science, McGill University, and a core academic member of Mila – Quebec Artificial Intelligence Institute. She is also a Canada CIFAR AI Chair and on the faculty of McGill’s Centre for the Study of Democratic Citizenship.

Before joining McGill, Rabbany was a postdoctoral fellow at the School of Computer Science, Carnegie Mellon University. She completed her PhD in the Department of Computing Science at the University of Alberta.

Rabbany heads McGill’s Complex Data Lab, where she conducts research at the intersection of network science, data mining and machine learning, with a focus on analyzing real-world interconnected data and social good applications.

Current Students

Master's Research - McGill University
Principal supervisor :
PhD - McGill University
Co-supervisor :
Collaborating researcher - University of Mannheim
Principal supervisor :
PhD - McGill University
Co-supervisor :
Master's Research - McGill University
PhD - McGill University
Master's Research - McGill University
Co-supervisor :
PhD - McGill University
Master's Research - McGill University
Master's Research - McGill University
Co-supervisor :
Collaborating researcher
Postdoctorate - McGill University
Collaborating researcher
Principal supervisor :
Research Intern - McGill University
Master's Research - McGill University
Master's Research - Université de Montréal
Principal supervisor :
Collaborating researcher - McGill University
PhD - McGill University
Master's Research - Université de Montréal
Principal supervisor :

Publications

Open, Closed, or Small Language Models for Text Classification?
Hao Yu
Zachary Yang
Kellin Pelrine
Recent advancements in large language models have demonstrated remarkable capabilities across various NLP tasks. But many questions remain, including whether open-source models match closed ones, why these models excel or struggle with certain tasks, and what types of practical procedures can improve performance. We address these questions in the context of classification by evaluating three classes of models using eight datasets across three distinct tasks: named entity recognition, political party prediction, and misinformation detection. While larger LLMs often lead to improved performance, open-source models can rival their closed-source counterparts by fine-tuning. Moreover, supervised smaller models, like RoBERTa, can achieve similar or even greater performance in many datasets compared to generative LLMs. On the other hand, closed models maintain an advantage in hard tasks that demand the most generalizability. This study underscores the importance of model selection based on task requirements.
ToxBuster: In-game Chat Toxicity Buster with BERT
Zachary Yang
Yasmine Maricar
M. Davari
Nicolas Grenon-Godbout
Detecting toxicity in online spaces is challenging and an ever more pressing problem given the increase in social media and gaming consumption. We introduce ToxBuster, a simple and scalable model trained on a relatively large dataset of 194k lines of game chat from Rainbow Six Siege and For Honor, carefully annotated for different kinds of toxicity. Compared to the existing state-of-the-art, ToxBuster achieves 82.95% (+7) in precision and 83.56% (+57) in recall. This improvement is obtained by leveraging past chat history and metadata. We also study the implication towards real-time and post-game moderation as well as the model transferability from one game to another.
Fast and Attributed Change Detection on Dynamic Graphs with Density of States
Shenyang Huang
Jacob Danovitch
Social Media as a Vector for Escort Ads: A Study on OnlyFans advertisements on Twitter
Maricarmen Arenas
Pratheeksha Nair
Online sex trafficking is on the rise and a majority of trafficking victims report being advertised online. The use of OnlyFans as a platform for adult content is also increasing, with Twitter as its main advertising tool. Furthermore, we know that traffickers usually work within a network and control multiple victims. Consequently, we suspect that there may be networks of traffickers promoting multiple OnlyFans accounts belonging to their victims. To this end, we present the first study of OnlyFans advertisements on Twitter in the context of finding organized activities. Preliminary analysis of this space shows that most tweets related to OnlyFans contain generic text, making text-based methods less reliable. Instead, focusing on what ties the authors of these tweets together, we propose a novel method for uncovering coordinated networks of users based on their behaviour. Our method, called Multi-Level Clustering (MLC), combines two levels of clustering that consider both the network structure and embedded node attribute information. It focuses jointly on user connections (through mentions) and content (through shared URLs). We apply MLC to real-world data of 2 million tweets pertaining to OnlyFans and analyse the detected groups. We also evaluate our method on synthetically generated data (with injected ground truth) and show its superior performance compared to competitive baselines. Finally, we discuss examples of organized clusters as case studies and provide interesting conclusions to our study.
DisKeyword: Tweet Corpora Exploration for Keyword Selection
Sacha Lévy
DeltaShield: Information Theory for Human-Trafficking Detection
Catalina Vajiac
Meng-Chieh Lee
Aayushi Kulshrestha
Sacha Lévy
Namyong Park
Andreas Olligschlaeger
Cara Jones
Christos Faloutsos
Towards Detecting Contextual Real-Time Toxicity for In-Game Chat
Zachary Yang
Nicolas Grenon-Godbout
Real-time toxicity detection in online environments poses a significant challenge, due to the increasing prevalence of social media and gaming platforms. We introduce ToxBuster, a simple and scalable model that reliably detects toxic content in real-time for a line of chat by including chat history and metadata. ToxBuster consistently outperforms conventional toxicity models across popular multiplayer games, including Rainbow Six Siege, For Honor, and DOTA 2. We conduct an ablation study to assess the importance of each model component and explore ToxBuster's transferability across the datasets. Furthermore, we showcase ToxBuster's efficacy in post-game moderation, successfully flagging 82.1% of chat-reported players at a precision level of 90.0%. Additionally, we show how an additional 6% of unreported toxic players can be proactively moderated.
TrafficVis: Visualizing Organized Activity and Spatio-Temporal Patterns for Detecting and Labeling Human Trafficking
Catalina Vajiac
Duen Horng Chau
Andreas Olligschlaeger
Rebecca Mackenzie
Pratheeksha Nair
Meng-Chieh Lee
Yifei Li
Namyong Park
Christos Faloutsos
Law enforcement and domain experts can detect human trafficking (HT) in online escort websites by analyzing suspicious clusters of connected ads. How can we explain clustering results intuitively and interactively, visualizing potential evidence for experts to analyze? We present TrafficVis, the first interface for cluster-level HT detection and labeling. Developed through months of participatory design with domain experts, TrafficVis provides coordinated views in conjunction with carefully chosen backend algorithms to effectively show spatio-temporal and text patterns to a wide variety of anti-HT stakeholders. We build upon state-of-the-art text clustering algorithms by incorporating shared metadata as a signal of connected and possibly suspicious activity, then visualize the results. Domain experts can use TrafficVis to label clusters as HT, or as other suspicious but non-HT activity such as spam and scams, quickly creating labeled datasets to enable further HT research. Through domain expert feedback and a usage scenario, we demonstrate TrafficVis's efficacy. The feedback was overwhelmingly positive, with repeated high praise for the usability and explainability of our tool, the latter being vital for indicting possible criminals.
Active Keyword Selection to Track Evolving Topics on Twitter
Sacha Lévy
Farimah Poursafaei
Kellin Pelrine
How can we study social interactions on evolving topics at a mass scale? Over the past decade, researchers from diverse fields such as economics, political science, and public health have often done this by querying Twitter's public API endpoints with hand-picked topical keywords to search or stream discussions. However, despite the API's accessibility, it remains difficult to select and update keywords to collect high-quality data relevant to topics of interest. In this paper, we propose an active learning method for rapidly refining query keywords to increase both the yielded topic relevance and dataset size. We leverage a large open-source COVID-19 Twitter dataset to illustrate the applicability of our method in tracking Tweets around the key sub-topics of Vaccine, Mask, and Lockdown. Our experiments show that our method achieves an average topic-related keyword recall 2x higher than baselines. We open-source our code along with a web interface for keyword selection to make data collection from Twitter more systematic for researchers.
Early Detection of Sexual Predators with Federated Learning
Khaoula Chehbouni
Gilles Caporossi
Martine De Cock
The rise in screen time and the isolation brought by the different containment measures implemented during the COVID-19 pandemic have led to an alarming increase in cases of online grooming. Online grooming is defined as all the strategies used by predators to lure children into sexual exploitation. Previous attempts made in industry and academia on the detection of grooming rely on accessing and monitoring users’ private conversations through the training of a model centrally or by sending personal conversations to a global server. We introduce a first, privacy-preserving, cross-device, federated learning framework for the early detection of sexual predators, which aims to ensure a safe online environment for children while respecting their privacy.
Revisiting Hotels-50K and Hotel-ID
Aarash Feizi
Arantxa Casanova
In this paper, we propose revisited versions for two recent hotel recognition datasets: Hotels-50K and Hotel-ID. The revisited versions provide evaluation setups with different levels of difficulty to better align with the intended real-world application, i.e. countering human trafficking. Real-world scenarios involve hotels and locations that are not captured in the current datasets; therefore, it is important to consider evaluation settings where classes are truly unseen. We test this setup using multiple state-of-the-art image retrieval models and show that, as expected, the models’ performances decrease as the evaluation gets closer to the real-world unseen settings. The rankings of the best performing models also change across the different evaluation settings, which further motivates using the proposed revisited datasets.
VisPaD: Visualization and Pattern Discovery for Fighting Human Trafficking
Pratheeksha Nair
Yifei Li
Catalina Vajiac
Andreas Olligschlaeger
Meng-Chieh Lee
Namyong Park
Duen Horng Chau
Christos Faloutsos
Human trafficking analysts investigate groups of related online escort advertisements (called micro-clusters) to detect suspicious activities and identify various modus operandi. This task is complex as it requires finding patterns and linked meta-data across micro-clusters such as the geographical spread of ads, cluster sizes, etc. Additionally, drawing insights from the data is challenging without visualizing these micro-clusters. To address this, in close collaboration with domain experts, we built VisPaD, a novel interactive way for characterizing and visualizing micro-clusters and their associated meta-data, all in one place. VisPaD helps discover underlying patterns in the data by projecting micro-clusters in a lower dimensional space. It also allows the user to select micro-clusters involved in suspicious patterns and interactively examine them, leading to faster detection and identification of trends in the data. A demo of VisPaD is also released.