Reihaneh Rabbany

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science

Biography

Reihaneh Rabbany is an assistant professor at the School of Computer Science at McGill University. She is a faculty member of Mila – Quebec Artificial Intelligence Institute and holds a Canada CIFAR AI Chair. She is also a faculty member of McGill's Centre for the Study of Democratic Citizenship. Before joining McGill University, she was a postdoctoral fellow at the School of Computer Science at Carnegie Mellon University. She earned her PhD in the Department of Computing Science at the University of Alberta. She heads the Complex Data Lab, whose research lies at the intersection of network science, data mining, and machine learning, with a focus on analyzing real-world interconnected data and on social applications.

Current Students

PhD - McGill University
Co-supervisor:
Research Intern - Université de Montréal
Research Master's - McGill University
Research Intern - Université de Montréal
Research Collaborator - McGill University
Research Collaborator - McGill University
Alumni Collaborator - University of Mannheim
Principal supervisor:
Research Collaborator - Graduated from McGill University
Research Collaborator
Principal supervisor:
Research Master's - McGill University
Research Collaborator - McGill University
Co-supervisor:
PhD - McGill University
Co-supervisor:
Research Collaborator - McGill University
Research Master's - McGill University
Co-supervisor:
Research Intern - Université de Montréal
Research Master's - McGill University

Publications

Temporal Graph Benchmark for Machine Learning on Temporal Graphs
Shenyang Huang
Farimah Poursafaei
Jacob Danovitch
Matthias Fey
Weihua Hu
Emanuele Rossi
Jure Leskovec
Michael M. Bronstein
We present the Temporal Graph Benchmark (TGB), a collection of challenging and diverse benchmark datasets for realistic, reproducible, and robust evaluation of machine learning models on temporal graphs. TGB datasets are of large scale, spanning years in duration, incorporate both node and edge-level prediction tasks and cover a diverse set of domains including social, trade, transaction, and transportation networks. For both tasks, we design evaluation protocols based on realistic use-cases. We extensively benchmark each dataset and find that the performance of common models can vary drastically across datasets. In addition, on dynamic node property prediction tasks, we show that simple methods often achieve superior performance compared to existing temporal graph models. We believe that these findings open up opportunities for future research on temporal graphs. Finally, TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research, including data loading, experiment setup and performance evaluation. TGB will be maintained and updated on a regular basis and welcomes community feedback. TGB datasets, data loaders, example codes, evaluation setup, and leaderboards are publicly available at https://tgb.complexdatalab.com/.
Party Prediction for Twitter
Kellin Pelrine
Anne Imouza
Zachary Yang
Jacob-Junqi Tian
Sacha Lévy
Gabrielle Desrosiers-Brisebois
Aarash Feizi
Cécile Amadoro
André Blais
Jean-François Godbout
Open, Closed, or Small Language Models for Text Classification?
Hao Yu
Zachary Yang
Kellin Pelrine
Jean-François Godbout
Recent advancements in large language models have demonstrated remarkable capabilities across various NLP tasks. But many questions remain, including whether open-source models match closed ones, why these models excel or struggle with certain tasks, and what types of practical procedures can improve performance. We address these questions in the context of classification by evaluating three classes of models using eight datasets across three distinct tasks: named entity recognition, political party prediction, and misinformation detection. While larger LLMs often lead to improved performance, open-source models can rival their closed-source counterparts by fine-tuning. Moreover, supervised smaller models, like RoBERTa, can achieve similar or even greater performance in many datasets compared to generative LLMs. On the other hand, closed models maintain an advantage in hard tasks that demand the most generalizability. This study underscores the importance of model selection based on task requirements.
ToxBuster: In-game Chat Toxicity Buster with BERT
Zachary Yang
Yasmine Maricar
M. Davari
Nicolas Grenon-Godbout
Detecting toxicity in online spaces is challenging and an ever more pressing problem given the increase in social media and gaming consumption. We introduce ToxBuster, a simple and scalable model trained on a relatively large dataset of 194k lines of game chat from Rainbow Six Siege and For Honor, carefully annotated for different kinds of toxicity. Compared to the existing state-of-the-art, ToxBuster achieves 82.95% (+7) in precision and 83.56% (+57) in recall. This improvement is obtained by leveraging past chat history and metadata. We also study the implication towards real-time and post-game moderation as well as the model transferability from one game to another.
Fast and Attributed Change Detection on Dynamic Graphs with Density of States
Shenyang Huang
Jacob Danovitch
Social Media as a Vector for Escort Ads: A Study on OnlyFans advertisements on Twitter
Maricarmen Arenas
Pratheeksha Nair
Online sex trafficking is on the rise and a majority of trafficking victims report being advertised online. The use of OnlyFans as a platform for adult content is also increasing, with Twitter as its main advertising tool. Furthermore, we know that traffickers usually work within a network and control multiple victims. Consequently, we suspect that there may be networks of traffickers promoting multiple OnlyFans accounts belonging to their victims. To this end, we present the first study of OnlyFans advertisements on Twitter in the context of finding organized activities. Preliminary analysis of this space shows that most tweets related to OnlyFans contain generic text, making text-based methods less reliable. Instead, focusing on what ties the authors of these tweets together, we propose a novel method for uncovering coordinated networks of users based on their behaviour. Our method, called Multi-Level Clustering (MLC), combines two levels of clustering that consider both the network structure and embedded node attribute information. It focuses jointly on user connections (through mentions) and content (through shared URLs). We apply MLC to real-world data of 2 million tweets pertaining to OnlyFans and analyse the detected groups. We also evaluate our method on synthetically generated data (with injected ground truth) and show its superior performance compared to competitive baselines. Finally, we discuss examples of organized clusters as case studies and provide interesting conclusions to our study.
DisKeyword: Tweet Corpora Exploration for Keyword Selection
Sacha Lévy
DeltaShield: Information Theory for Human-Trafficking Detection
Catalina Vajiac
Meng-Chieh Lee
Aayushi Kulshrestha
Sacha Lévy
Namyong Park
Andreas Olligschlaeger
Cara Jones
Christos Faloutsos
Towards Detecting Contextual Real-Time Toxicity for In-Game Chat
Zachary Yang
Nicolas Grenon-Godbout
Real-time toxicity detection in online environments poses a significant challenge, due to the increasing prevalence of social media and gaming platforms. We introduce ToxBuster, a simple and scalable model that reliably detects toxic content in real-time for a line of chat by including chat history and metadata. ToxBuster consistently outperforms conventional toxicity models across popular multiplayer games, including Rainbow Six Siege, For Honor, and DOTA 2. We conduct an ablation study to assess the importance of each model component and explore ToxBuster's transferability across the datasets. Furthermore, we showcase ToxBuster's efficacy in post-game moderation, successfully flagging 82.1% of chat-reported players at a precision level of 90.0%. Additionally, we show how an additional 6% of unreported toxic players can be proactively moderated.
TrafficVis: Visualizing Organized Activity and Spatio-Temporal Patterns for Detecting and Labeling Human Trafficking
Catalina Vajiac
Duen Horng Chau
Andreas Olligschlaeger
Rebecca Mackenzie
Pratheeksha Nair
Meng-Chieh Lee
Yifei Li
Namyong Park
Christos Faloutsos
Law enforcement and domain experts can detect human trafficking (HT) in online escort websites by analyzing suspicious clusters of connected ads. How can we explain clustering results intuitively and interactively, visualizing potential evidence for experts to analyze? We present TrafficVis, the first interface for cluster-level HT detection and labeling. Developed through months of participatory design with domain experts, TrafficVis provides coordinated views in conjunction with carefully chosen backend algorithms to effectively show spatio-temporal and text patterns to a wide variety of anti-HT stakeholders. We build upon state-of-the-art text clustering algorithms by incorporating shared metadata as a signal of connected and possibly suspicious activity, then visualize the results. Domain experts can use TrafficVis to label clusters as HT, or other, suspicious, but non-HT activity such as spam and scam, quickly creating labeled datasets to enable further HT research. Through domain expert feedback and a usage scenario, we demonstrate TrafficVis's efficacy. The feedback was overwhelmingly positive, with repeated high praises for the usability and explainability of our tool, the latter being vital for indicting possible criminals.
Active Keyword Selection to Track Evolving Topics on Twitter
Sacha Lévy
Farimah Poursafaei
Kellin Pelrine
How can we study social interactions on evolving topics at a mass scale? Over the past decade, researchers from diverse fields such as economics, political science, and public health have often done this by querying Twitter's public API endpoints with hand-picked topical keywords to search or stream discussions. However, despite the API's accessibility, it remains difficult to select and update keywords to collect high-quality data relevant to topics of interest. In this paper, we propose an active learning method for rapidly refining query keywords to increase both the yielded topic relevance and dataset size. We leverage a large open-source COVID-19 Twitter dataset to illustrate the applicability of our method in tracking Tweets around the key sub-topics of Vaccine, Mask, and Lockdown. Our experiments show that our method achieves an average topic-related keyword recall 2x higher than baselines. We open-source our code along with a web interface for keyword selection to make data collection from Twitter more systematic for researchers.
Early Detection of Sexual Predators with Federated Learning
Khaoula Chehbouni
Gilles Caporossi
Martine De Cock
The rise in screen time and the isolation brought by the different containment measures implemented during the COVID-19 pandemic have led to an alarming increase in cases of online grooming. Online grooming is defined as all the strategies used by predators to lure children into sexual exploitation. Previous attempts made in industry and academia on the detection of grooming rely on accessing and monitoring users’ private conversations through the training of a model centrally or by sending personal conversations to a global server. We introduce a first, privacy-preserving, cross-device, federated learning framework for the early detection of sexual predators, which aims to ensure a safe online environment for children while respecting their privacy.