Reihaneh Rabbany

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science
Research Topics
Representation Learning
Learning on Graphs
Data Mining
Graph Neural Networks
Natural Language Processing

Biography

Reihaneh Rabbany is an Assistant Professor at the School of Computer Science, McGill University. She is a faculty member at Mila – Quebec Artificial Intelligence Institute and holds a Canada CIFAR AI Chair. She is also a faculty member at McGill’s Centre for the Study of Democratic Citizenship. Before joining McGill, she was a postdoctoral fellow at the School of Computer Science, Carnegie Mellon University. She completed her PhD in the Department of Computing Science at the University of Alberta. She heads the Complex Data Lab, which conducts research at the intersection of network science, data mining, and machine learning, with a focus on analyzing real-world interconnected data and on societal applications.

Current Students

Research Master's - McGill
Principal supervisor:
PhD - McGill
Co-supervisor:
Research Collaborator - McGill
Research Collaborator - University of Mannheim
Principal supervisor:
PhD - McGill
Co-supervisor:
Research Master's - McGill
Research Intern - UdeM
Research Master's - McGill
Co-supervisor:
Research Master's - McGill
Research Master's - McGill
Co-supervisor:
Research Collaborator
Research Collaborator
Principal supervisor:
Research Intern - McGill
Research Master's - McGill
Research Intern - Université de Montréal
PhD - McGill
Research Intern - UdeM

Publications

Active Keyword Selection to Track Evolving Topics on Twitter
Sacha Lévy
Farimah Poursafaei
Kellin Pelrine
How can we study social interactions on evolving topics at a mass scale? Over the past decade, researchers from diverse fields such as economics, political science, and public health have often done this by querying Twitter's public API endpoints with hand-picked topical keywords to search or stream discussions. However, despite the API's accessibility, it remains difficult to select and update keywords to collect high-quality data relevant to topics of interest. In this paper, we propose an active learning method for rapidly refining query keywords to increase both the yielded topic relevance and dataset size. We leverage a large open-source COVID-19 Twitter dataset to illustrate the applicability of our method in tracking Tweets around the key sub-topics of Vaccine, Mask, and Lockdown. Our experiments show that our method achieves an average topic-related keyword recall 2x higher than baselines. We open-source our code along with a web interface for keyword selection to make data collection from Twitter more systematic for researchers.
Early Detection of Sexual Predators with Federated Learning
Khaoula Chehbouni
Gilles Caporossi
Martine De Cock
The rise in screen time and the isolation brought by the different containment measures implemented during the COVID-19 pandemic have led to an alarming increase in cases of online grooming. Online grooming is defined as all the strategies used by predators to lure children into sexual exploitation. Previous attempts made in industry and academia on the detection of grooming rely on accessing and monitoring users’ private conversations through the training of a model centrally or by sending personal conversations to a global server. We introduce a first, privacy-preserving, cross-device, federated learning framework for the early detection of sexual predators, which aims to ensure a safe online environment for children while respecting their privacy.
Revisiting Hotels-50K and Hotel-ID
Aarash Feizi
Arantxa Casanova
In this paper, we propose revisited versions for two recent hotel recognition datasets: Hotels-50K and Hotel-ID. The revisited versions provide evaluation setups with different levels of difficulty to better align with the intended real-world application, i.e. countering human trafficking. Real-world scenarios involve hotels and locations that are not captured in the current datasets; therefore, it is important to consider evaluation settings where classes are truly unseen. We test this setup using multiple state-of-the-art image retrieval models and show that, as expected, the models’ performances decrease as the evaluation gets closer to the real-world unseen settings. The rankings of the best performing models also change across the different evaluation settings, which further motivates using the proposed revisited datasets.
VisPaD: Visualization and Pattern Discovery for Fighting Human Trafficking
Pratheeksha Nair
Yifei Li
Catalina Vajiac
Andreas Olligschlaeger
Meng-Chieh Lee
Namyong Park
Duen Horng Chau
Christos Faloutsos
Chieh Lee
Human trafficking analysts investigate groups of related online escort advertisements (called micro-clusters) to detect suspicious activities and identify various modus operandi. This task is complex as it requires finding patterns and linked meta-data across micro-clusters such as the geographical spread of ads, cluster sizes, etc. Additionally, drawing insights from the data is challenging without visualizing these micro-clusters. To address this, in close collaboration with domain experts, we built VisPaD, a novel interactive way for characterizing and visualizing micro-clusters and their associated meta-data, all in one place. VisPaD helps discover underlying patterns in the data by projecting micro-clusters in a lower dimensional space. It also allows the user to select micro-clusters involved in suspicious patterns and interactively examine them, leading to faster detection and identification of trends in the data. A demo of VisPaD is also released.
A Strong Node Classification Baseline for Temporal Graphs
Farimah Poursafaei
Željko Žilić
Extracting Person Names from User Generated Text: Named-Entity Recognition for Combating Human Trafficking
Yifei Li
Pratheeksha Nair
Kellin Pelrine
Online escort advertisement websites are widely used for advertising victims of human trafficking. Domain experts agree that advertising multiple people in the same ad is a strong indicator of trafficking. Thus, extracting person names from the text of these ads can provide valuable clues for further analysis. However, Named-Entity Recognition (NER) on escort ads is challenging because the text can be noisy, colloquial and often lacks proper grammar and punctuation. Most existing state-of-the-art NER models fail to demonstrate satisfactory performance on this task. In this paper, we propose NEAT (Name Extraction Against Trafficking) for extracting person names. It effectively combines classic rule-based and dictionary extractors with a contextualized language model to capture ambiguous names (e.g., penny, hazel) and adapts to adversarial changes in the text by expanding its dictionary. NEAT shows 19% improvement on average in the F1 classification score for name extraction compared to previous state-of-the-art in two domain-specific datasets.
Towards Better Evaluation for Dynamic Link Prediction
Farimah Poursafaei
Andy Huang
Shenyang Huang
Kellin Pelrine
Despite the prevalence of recent success in learning from static graphs, learning from time-evolving graphs remains an open challenge. In this work, we design new, more stringent evaluation procedures for link prediction specific to dynamic graphs, which reflect real-world considerations, to better compare the strengths and weaknesses of methods. First, we create two visualization techniques to understand the reoccurring patterns of edges over time and show that many edges reoccur at later time steps. Based on this observation, we propose a pure memorization-based baseline called EdgeBank. EdgeBank achieves surprisingly strong performance across multiple settings which highlights that the negative edges used in the current evaluation are easy. To sample more challenging negative edges, we introduce two novel negative sampling strategies that improve robustness and better match real-world applications. Lastly, we introduce six new dynamic graph datasets from a diverse set of domains missing from current benchmarks, providing new challenges and opportunities for future research. Our code repository is accessible at https://github.com/fpour/DGB.git.
Curating the Twitter Election Integrity Datasets for Better Online Troll Characterization
Albert Manuel Orozco Camacho
In modern days, social media platforms provide accessible channels for the interaction and immediate reflection of the most important events happening around the world. In this paper, we, firstly, present a curated set of datasets whose origins stem from Twitter’s Information Operations efforts. More notably, these accounts, which have already been suspended, provide a notion of how state-backed human trolls operate. Secondly, we present detailed analyses of how these behaviours vary over time, and motivate their use and abstraction in the context of deep representation learning: for instance, to learn and, potentially, track troll behaviour. We present baselines for such tasks and highlight the differences that may exist within the literature. Finally, we utilize the representations learned for behaviour prediction to classify trolls from "real" users, using a sample of non-suspended active accounts.
Online Partisan Polarization of COVID-19
Zachary Yang
Anne Imouza
Kellin Pelrine
Sacha Lévy
Jiewen Liu
Gabrielle Desrosiers-Brisebois
Jean-François Godbout
André Blais
In today’s age of (mis)information, many people utilize various social media platforms in an attempt to shape public opinion on several important issues, including elections and the COVID-19 pandemic. These two topics have recently become intertwined given the importance of complying with public health measures related to COVID-19 and politicians’ management of the pandemic. Motivated by this, we study the partisan polarization of COVID-19 discussions on social media. We propose and utilize a novel measure of partisan polarization to analyze more than 380 million posts from Twitter and Parler around the 2020 US presidential election. We find strong correlation between peaks in polarization and polarizing events, such as the January 6th Capitol Hill riot. We further classify each post into key COVID-19 issues of lockdown, masks, vaccines, as well as miscellaneous, to investigate both the volume and polarization on these topics and how they vary through time. Parler includes more negative discussions around lockdown and masks, as expected, but not much around vaccines. We also observe more balanced discussions on Twitter and a general disconnect between the discussions on Parler and Twitter.
Incorporating dynamic flight network in SEIR to model mobility between populations
Xiaoye Ding
Shenyang Huang
Abby Leung