Reihaneh Rabbany

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science
Research Topics
Representation Learning
Graph Learning
Data Mining
Graph Neural Networks
Natural Language Processing

Biography

Reihaneh Rabbany is an assistant professor at the School of Computer Science at McGill University. She is a core faculty member of Mila (Quebec Artificial Intelligence Institute) and holds a Canada CIFAR AI Chair. She is also a faculty member of McGill's Centre for the Study of Democratic Citizenship. Before joining McGill University, she was a postdoctoral fellow at the School of Computer Science at Carnegie Mellon University. She received her PhD from the Department of Computing Science at the University of Alberta. She leads the Complex Data Lab, whose research lies at the intersection of network science, data mining, and machine learning, and focuses on the analysis of real-world interconnected data and on social applications.

Current Students

Research Master's - McGill
Principal supervisor:
PhD - McGill
Co-supervisor:
Research Collaborator - McGill
Research Collaborator - University of Mannheim
Principal supervisor:
PhD - McGill
Co-supervisor:
Research Master's - McGill
Research Intern - UdeM
Research Master's - McGill
Co-supervisor:
Research Master's - McGill
Research Master's - McGill
Co-supervisor:
Research Collaborator
Research Collaborator
Principal supervisor:
Research Intern - McGill
Research Master's - McGill
Research Intern - Université de Montréal
PhD - McGill
Research Intern - UdeM

Publications

The Surprising Performance of Simple Baselines for Misinformation Detection
Kellin Pelrine
Jacob Danovitch
As social media becomes increasingly prominent in our day-to-day lives, it is increasingly important to detect informative content and prevent the spread of disinformation and unverified rumours. While many sophisticated and successful models have been proposed in the literature, they are often compared with older NLP baselines such as SVMs, CNNs, and LSTMs. In this paper, we examine the performance of a broad set of modern transformer-based language models and show that with basic fine-tuning, these models are competitive with and can even significantly outperform recently proposed state-of-the-art methods. We present our framework as a baseline for creating and evaluating new methods for misinformation detection. We further study a comprehensive set of benchmark datasets, and discuss potential data leakage and the need for careful design of the experiments and understanding of datasets to account for confounding variables. As an extreme case example, we show that classifying only based on the first three digits of tweet IDs, which contain information on the date, gives state-of-the-art performance on a commonly used benchmark dataset for fake news detection, Twitter16. We provide a simple tool to detect this problem and suggest steps to mitigate it in future datasets.
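The date leakage described above follows from how tweet IDs are generated: Twitter's Snowflake IDs store a millisecond timestamp in their high-order bits (`id >> 22`, offset by the Twitter epoch 1288834974657), so IDs of the same length that share leading digits were created at roughly the same time. The sketch below is a hypothetical probe, not the tool released with the paper: `IdPrefixBaseline` and `tweet_timestamp_ms` are illustrative names, and the probe simply memorizes the majority training label for each three-digit ID prefix. If a probe like this scores well on a benchmark, its labels are likely confounded with collection date.

```python
from collections import Counter, defaultdict

TWITTER_EPOCH_MS = 1288834974657  # Unix-millisecond offset used by Twitter's Snowflake IDs


def tweet_timestamp_ms(tweet_id: int) -> int:
    """Creation time (Unix ms) encoded in the top bits of a Snowflake tweet ID."""
    return (tweet_id >> 22) + TWITTER_EPOCH_MS


class IdPrefixBaseline:
    """Hypothetical leakage probe: predict the majority training label per ID prefix.

    Prefixes are only comparable across IDs that have the same number of digits.
    """

    def __init__(self, n_digits=3):
        self.n_digits = n_digits
        self.table = {}        # prefix -> majority label seen in training
        self.fallback = None   # overall majority label, used for unseen prefixes

    def _key(self, tweet_id):
        return str(tweet_id)[: self.n_digits]

    def fit(self, ids, labels):
        counts = defaultdict(Counter)
        for tweet_id, label in zip(ids, labels):
            counts[self._key(tweet_id)][label] += 1
        self.table = {key: c.most_common(1)[0][0] for key, c in counts.items()}
        self.fallback = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, ids):
        return [self.table.get(self._key(t), self.fallback) for t in ids]


if __name__ == "__main__":
    # Toy, made-up IDs and labels purely for illustration.
    ids = [650000000000000001, 650000000000000002, 650000000000000003,
           780000000000000001, 780000000000000002]
    labels = ["false", "false", "true", "true", "true"]
    probe = IdPrefixBaseline().fit(ids, labels)
    print(probe.predict([650999999999999999, 780123456789012345]))
```

Comparing such a probe's score against a text-based model on the same split is a cheap way to check whether a dataset rewards dates rather than content.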
Graph Attention Networks with Positional Embeddings
SigTran: Signature Vectors for Detecting Illicit Activities in Blockchain Transaction Networks
Farimah Poursafaei
Željko Žilić
INFOSHIELD: Generalizable Information-Theoretic Human-Trafficking Detection
Meng-Chieh Lee
Catalina Vajiac
Aayushi Kulshrestha
Sacha Lévy
Namyong Park
Cara Jones
Christos Faloutsos
Given a million escort advertisements, how can we spot near-duplicates? Such micro-clusters of ads are usually signals of human trafficking. How can we summarize them, visually, to convince law enforcement to act? Can we build a general tool that works for different languages? Spotting micro-clusters of near-duplicate documents is useful in multiple additional settings, including spam-bot detection in Twitter ads, plagiarism, and more. We present INFOSHIELD, which makes the following contributions: (a) Practical, being scalable and effective on real data; (b) Parameter-free and Principled, requiring no user-defined parameters; (c) Interpretable, finding a document to be the cluster representative, highlighting all the common phrases, and automatically detecting "slots", i.e. phrases that differ in every document; and (d) Generalizable, beating or matching domain-specific methods in Twitter bot detection and human trafficking detection respectively, as well as being language-independent, finding clusters in Spanish, Italian, and Japanese. Interpretability is particularly important for the anti-human-trafficking domain, where law enforcement must visually inspect ads. Our experiments on real data show that INFOSHIELD correctly identifies Twitter bots with an F1 score over 90% and detects human-trafficking ads with 84% precision. Moreover, it is scalable, requiring about 8 hours for 4 million documents on a stock laptop.
TRAFFICVIS: Fighting Human Trafficking through Visualization
Catalina Vajiac
Andreas Olligschlaeger
Yifei Li
Pratheeksha Nair
Meng-Chieh Lee
Namyong Park
Duen Horng Chau
Christos Faloutsos
Law enforcement can detect human trafficking (HT) in online escort websites by analyzing suspicious clusters of connected ads. Given such clusters, how can we interactively visualize potential evidence for law enforcement and domain experts? We present TRAFFICVIS, which, to our knowledge, is the first interface for cluster-level HT detection and labeling. It builds on state-of-the-art HT clustering algorithms by incorporating metadata as a signal of organized and potentially suspicious activity. Also, domain experts can label clusters as HT, spam, and more, efficiently creating labeled datasets to enable further HT research. TRAFFICVIS has been built in close collaboration with domain experts, who estimate that TRAFFICVIS provides a median 36x speedup over manual labeling.
Scalable Change Point Detection for Dynamic Graphs
Real-world networks often evolve in complex ways over time. Understanding anomalies in dynamic networks is crucial for applications such as traffic accident detection, intrusion identification, and detection of ecosystem disturbances. In this work, we focus on the problem of change point detection in dynamic graphs. The goal is to identify time steps where the graph structure deviates significantly from the norm. Despite the empirical success of recent methods, building a change point detection method for real-world dynamic graphs, which often scale to millions of nodes, remains an open question. To fill this gap, we propose LADdos, a scalable method for change point detection in dynamic graphs. LADdos brings together ideas from two recent works: an accurate change point detection method for graphs called LAD [10], which detects changes in the full Laplacian spectrum of the graph at each timestamp, and the general framework of network density of states (DOS) [5], which models the distribution of the singular values through efficient approximation methods. In experiments with two common graph models, the Stochastic Block Model (SBM) and the Barabási-Albert (BA) model, we show that LADdos matches the performance of LAD, the current state of the art, while being orders of magnitude faster. For instance, on a dynamic graph with a total of 21 million edges over 150 timestamps, LADdos achieves a 100x speedup over LAD.
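The idea attributed to LAD above, scoring each snapshot by how far its Laplacian spectrum departs from recent history, can be sketched with dense NumPy eigendecomposition. This is a simplified illustration, not LAD or LADdos: it assumes every snapshot is an undirected graph on the same node set, given as a dense adjacency matrix; it summarizes the recent window by the mean of its normalized spectra rather than by a singular vector; and it computes exact eigenvalues, i.e. the full-spectrum computation that LADdos replaces with a DOS approximation. The helper names `laplacian_spectrum`, `change_scores`, and `sbm` are hypothetical.

```python
import numpy as np


def laplacian_spectrum(adj: np.ndarray) -> np.ndarray:
    """Sorted eigenvalues of L = D - A for an undirected graph, scaled to unit L2 norm."""
    lap = np.diag(adj.sum(axis=1)) - adj
    eig = np.linalg.eigvalsh(lap)  # L is symmetric: real eigenvalues in ascending order
    norm = np.linalg.norm(eig)
    return eig / norm if norm > 0 else eig


def change_scores(snapshots, window: int = 3) -> np.ndarray:
    """Score each timestamp by 1 - cosine(current spectrum, mean of the previous `window` spectra).

    Timestamps with fewer than `window` predecessors get a score of 0.
    """
    spectra = [laplacian_spectrum(a) for a in snapshots]
    scores = np.zeros(len(spectra))
    for t in range(window, len(spectra)):
        context = np.mean(spectra[t - window:t], axis=0)
        cos = spectra[t] @ context / (np.linalg.norm(spectra[t]) * np.linalg.norm(context))
        scores[t] = 1.0 - cos
    return scores


def sbm(sizes, p_in, p_out, rng) -> np.ndarray:
    """Sample a symmetric stochastic block model adjacency matrix without self-loops."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random(probs.shape) < probs, k=1).astype(float)
    return upper + upper.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy sequence: 6 sparse one-block snapshots, then 4 denser two-block ones.
    graphs = [sbm([60], 0.05, 0.05, rng) for _ in range(6)]
    graphs += [sbm([30, 30], 0.5, 0.02, rng) for _ in range(4)]
    scores = change_scores(graphs, window=3)
    print("highest-scoring timestamp:", int(np.argmax(scores)))
```

In this toy sequence the structure changes at timestamp 6, which is where the score should peak; later snapshots score lower once the window fills with post-change spectra.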
Graph Neural Networks Learn Twitter Bot Behaviour
Albert Manuel Orozco Camacho
Sacha Lévy
Social media trends play an increasingly significant role in understanding modern social dynamics. In this work, we look at how the Twitter landscape is constantly shaped by automatically generated content. Twitter bot activity can be traced via network abstractions which, we hypothesize, can be learned through state-of-the-art graph neural network techniques. We employ a large bot database, continuously updated by Twitter, to learn how likely it is that a user, or a hashtag, is mentioned by a bot. Thus, we model this likelihood as a link prediction task between the set of users and hashtags. Moreover, we contrast our results by performing similar experiments on a crawled data set of real users.
ComplexDataLab at W-NUT 2020 Task 2: Detecting Informative COVID-19 Tweets by Attending over Linked Documents
Kellin Pelrine
Jacob Danovitch
Albert Orozco Camacho
Given the global scale of COVID-19 and the flood of social media content related to it, how can we find informative discussions? We present Gapformer, which effectively classifies content as informative or not. It reformulates the problem as graph classification, drawing on not only the tweet but connected webpages and entities. We leverage a pre-trained language model as well as the connections between nodes to learn a pooled representation for each document network. We show it outperforms several competitive baselines and present ablation studies supporting the benefit of the linked information. Code is available on GitHub.
Contact Graph Epidemic Modelling of COVID-19 for Transmission and Intervention Strategies
Abby Leung
Xiaoye Ding
Shenyang Huang
The coronavirus disease 2019 (COVID-19) pandemic has quickly become a global public health crisis unseen in recent years. It is known that the structure of the human contact network plays an important role in the spread of transmissible diseases. In this work, we study CGEM, a structure-aware model of COVID-19. This model becomes similar to the classical compartment-based models in epidemiology if we assume the contact network is an Erdős-Rényi (ER) graph, i.e. everyone comes into contact with everyone else with the same probability. In contrast, CGEM is more expressive and allows for plugging in the actual contact networks, or more realistic proxies for them. Moreover, CGEM enables more precise modelling of enforcing and releasing different non-pharmaceutical intervention (NPI) strategies. Through a set of extensive experiments, we demonstrate significant differences between the epidemic curves when assuming different underlying structures. More specifically, we demonstrate that the compartment-based models overestimate the spread of the infection by a factor of 3 and, under some realistic assumptions on the compliance factor, underestimate the effectiveness of some NPIs, mischaracterize others (e.g. predicting a later peak), and underestimate the scale of the second peak after reopening.
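The contrast drawn above between compartment models and contact-structure-aware simulation can be illustrated with a generic stochastic SIR process on an explicit contact graph. This is not CGEM itself, whose compartments, parameters, and intervention mechanics are not described here: `network_sir` and `erdos_renyi` are hypothetical helpers, and `beta` (per-contact transmission probability per step) and `gamma` (recovery probability per step) are assumed parameters. On an Erdős-Rényi graph every pair is equally likely to be in contact, which mirrors the homogeneous-mixing assumption; replacing it with a real or proxy contact network is what makes the simulation structure-aware.

```python
import random


def erdos_renyi(n, p, seed=0):
    """Adjacency lists of a G(n, p) graph: every pair of nodes is linked with probability p."""
    rng = random.Random(seed)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def network_sir(neighbors, initial_infected, beta, gamma, steps, seed=0):
    """Discrete-time stochastic SIR process on a contact graph.

    At each step, every infected node infects each susceptible neighbour
    independently with probability `beta`, then recovers with probability `gamma`.
    Returns one (S, I, R) tuple per step, starting with the initial state.
    """
    rng = random.Random(seed)
    state = ["S"] * len(neighbors)
    for v in initial_infected:
        state[v] = "I"

    def counts():
        return (state.count("S"), state.count("I"), state.count("R"))

    history = [counts()]
    for _ in range(steps):
        infected = [v for v, s in enumerate(state) if s == "I"]
        newly_infected = set()
        for v in infected:
            for u in neighbors[v]:
                if state[u] == "S" and rng.random() < beta:
                    newly_infected.add(u)
        for v in infected:  # only nodes infected before this step can recover in it
            if rng.random() < gamma:
                state[v] = "R"
        for u in newly_infected:
            state[u] = "I"
        history.append(counts())
    return history


if __name__ == "__main__":
    contacts = erdos_renyi(500, 0.01, seed=1)  # swap in a real or proxy contact network here
    curve = network_sir(contacts, initial_infected=[0], beta=0.05, gamma=0.1, steps=100)
    peak_step = max(range(len(curve)), key=lambda t: curve[t][1])
    print("peak infections:", curve[peak_step][1], "at step", peak_step)
```

An intervention such as distancing could be approximated in this sketch by dropping a fraction of edges from `contacts` at a chosen step; that modelling choice is likewise an assumption, not CGEM's mechanism.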
Laplacian Change Point Detection for Dynamic Graphs
Shenyang Huang
Yasmeen Hitti
Machine learning analysis of exome trios to contrast the genomic architecture of autism and schizophrenia
Sameer Sardaar
Bill Qi
Alexandre Dionne-Laporte
Guy A. Rouleau
SGP: Spotting Groups Polluting the Online Political Discourse
Junhao Wang
Sacha Lévy
Ren Wang
Aayushi Kulshrestha
Social media sites are becoming a key factor in politics. These platforms are easy to manipulate for the purpose of distorting the information space to confuse and distract voters. It is of paramount importance for social media platforms, users engaged in online political discussions, and government agencies to understand the dynamics on social media and to identify malicious groups that engage in misinformation campaigns and thus pollute the general discourse around a topic of interest. Past works to identify such disruptive patterns have mostly focused on analyzing user-generated content such as tweets. In this study, we take a holistic approach and propose SGP to provide an informative bird's-eye view of all the activities on these social media sites around a broad topic and to detect coordinated groups suspected of engaging in misinformation campaigns. To show the effectiveness of SGP, we deploy it to provide a concise overview of polluting activity on Twitter around the upcoming 2019 Canadian Federal Elections, analyzing over 60 thousand user accounts connected through 3.4 million connections and 1.3 million hashtags. Users in the polluting groups detected by SGP-flag are over 4x more likely to be suspended, while the majority of these highly suspicious users detected by SGP-flag escaped Twitter's suspension algorithm. Moreover, while a few of the polluting hashtags detected are linked to misinformation campaigns, SGP-sig also flags others that have not been picked up on. More importantly, we also show that a large coordinated set of right-wing conservative groups based in the US is heavily engaged in Canadian politics.