Reihaneh Rabbany

chin-chen.yang@mila.quebec

Zachary Yang

PhD - McGill University

elahe.kooshafar@mila.quebec

Elahe Kooshafar

Research Master's - McGill University

Farimah Poursafaei

Postdoctorate - McGill University

farimah.poursafaei@mila.quebec

Research Intern - Université de Montréal

florence.laflamme@mila.quebec

jacob-junqi.tian@mila.quebec

Peter Yu

Research Collaborator - McGill University

hao.yu@mila.quebec

Jacob-Junqi Tian

Research Collaborator - McGill University

jean-francois.godbout@mila.quebec

Jean-François Godbout

Independent Visiting Researcher

Julia Gastinger

Alumni Collaborator - University of Mannheim

Principal supervisor:

julia.gastinger@mila.quebec

kellin.pelrine@mila.quebec

Kellin Pelrine

PhD - McGill University

mauricio.rivera@mila.quebec

Mauricio Rivera

Research Collaborator - Graduated from McGill University

Pratheeksha Nair

PhD - McGill University

pratheeksha.nair@mila.quebec

Razieh Shirzadkhani

Research Collaborator

Principal supervisor:

razieh.shirzadkhani@mila.quebec

sahar.omidishayegan@mila.quebec

Sahar Omidi Shayegan

Research Master's - McGill University

shahrad.mohammadzadeh@mila.quebec

Shahrad Mohammadzadeh

Research Collaborator - McGill University

Co-supervisor:

Doina Precup

PhD - McGill University

Co-supervisor:

Research Collaborator - McGill University

sophia.garrel@mila.quebec

Soroush Omranpour

Research Master's - McGill University

Co-supervisor:

soroush.omranpour@mila.quebec

svetlana.zhuk@mila.quebec

Sveta Zhuk

Research Intern - Université de Montréal

Vidya Sujaya

Research Master's - McGill University

vidya.sujaya@mila.quebec

Blog posts

Flight-SEIR: Incorporating Flight Data to Improve Epidemiological Modelling and Disease Outbreak Prevention

August 3, 2021


by

Shenyang Huang

Reihaneh Rabbany


Publications

Temporal Graph Benchmark for Machine Learning on Temporal Graphs

Shenyang Huang

Farimah Poursafaei

Jacob Danovitch

Matthias Fey

Weihua Hu

Emanuele Rossi

Jure Leskovec

Michael M. Bronstein

We present the Temporal Graph Benchmark (TGB), a collection of challenging and diverse benchmark datasets for realistic, reproducible, and robust evaluation of machine learning models on temporal graphs. TGB datasets are of large scale, spanning years in duration, incorporate both node and edge-level prediction tasks and cover a diverse set of domains including social, trade, transaction, and transportation networks. For both tasks, we design evaluation protocols based on realistic use-cases. We extensively benchmark each dataset and find that the performance of common models can vary drastically across datasets. In addition, on dynamic node property prediction tasks, we show that simple methods often achieve superior performance compared to existing temporal graph models. We believe that these findings open up opportunities for future research on temporal graphs. Finally, TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research, including data loading, experiment setup and performance evaluation. TGB will be maintained and updated on a regular basis and welcomes community feedback. TGB datasets, data loaders, example codes, evaluation setup, and leaderboards are publicly available at https://tgb.complexdatalab.com/.

openreview.net

Party Prediction for Twitter

Kellin Pelrine

Anne Imouza

Zachary Yang

Jacob-Junqi Tian

Sacha Lévy

Gabrielle Desrosiers-Brisebois

Aarash Feizi

Cécile Amadoro

André Blais

Jean-François Godbout

2023-08-25

ArXiv (preprint)

Open, Closed, or Small Language Models for Text Classification?

Hao Yu

Zachary Yang

Kellin Pelrine

Jean-François Godbout

Recent advancements in large language models have demonstrated remarkable capabilities across various NLP tasks. But many questions remain, including whether open-source models match closed ones, why these models excel or struggle with certain tasks, and what types of practical procedures can improve performance. We address these questions in the context of classification by evaluating three classes of models using eight datasets across three distinct tasks: named entity recognition, political party prediction, and misinformation detection. While larger LLMs often lead to improved performance, open-source models can rival their closed-source counterparts by fine-tuning. Moreover, supervised smaller models, like RoBERTa, can achieve similar or even greater performance in many datasets compared to generative LLMs. On the other hand, closed models maintain an advantage in hard tasks that demand the most generalizability. This study underscores the importance of model selection based on task requirements.

2023-08-19

ArXiv (preprint)

ToxBuster: In-game Chat Toxicity Buster with BERT

Zachary Yang

Yasmine Maricar

M. Davari

Nicolas Grenon-Godbout

Detecting toxicity in online spaces is challenging and an ever more pressing problem given the increase in social media and gaming consumption. We introduce ToxBuster, a simple and scalable model trained on a relatively large dataset of 194k lines of game chat from Rainbow Six Siege and For Honor, carefully annotated for different kinds of toxicity. Compared to the existing state-of-the-art, ToxBuster achieves 82.95% (+7) in precision and 83.56% (+57) in recall. This improvement is obtained by leveraging past chat history and metadata. We also study the implication towards real-time and post-game moderation as well as the model transferability from one game to another.

2023-05-21

ArXiv (preprint)

Fast and Attributed Change Detection on Dynamic Graphs with Density of States

Shenyang Huang

Jacob Danovitch

2023-05-15

ArXiv (preprint)

Social Media as a Vector for Escort Ads: A Study on OnlyFans advertisements on Twitter

Maricarmen Arenas

Pratheeksha Nair

Golnoosh Farnadi

Online sex trafficking is on the rise and a majority of trafficking victims report being advertised online. The use of OnlyFans as a platform for adult content is also increasing, with Twitter as its main advertising tool. Furthermore, we know that traffickers usually work within a network and control multiple victims. Consequently, we suspect that there may be networks of traffickers promoting multiple OnlyFans accounts belonging to their victims. To this end, we present the first study of OnlyFans advertisements on Twitter in the context of finding organized activities. Preliminary analysis of this space shows that most tweets related to OnlyFans contain generic text, making text-based methods less reliable. Instead, focusing on what ties the authors of these tweets together, we propose a novel method for uncovering coordinated networks of users based on their behaviour. Our method, called Multi-Level Clustering (MLC), combines two levels of clustering that consider both the network structure and embedded node attribute information. It focuses jointly on user connections (through mentions) and content (through shared URLs). We apply MLC to real-world data of 2 million tweets pertaining to OnlyFans and analyse the detected groups. We also evaluate our method on synthetically generated data (with injected ground truth) and show its superior performance compared to competitive baselines. Finally, we discuss examples of organized clusters as case studies and provide interesting conclusions to our study.

2023-04-30

Proceedings of the 15th ACM Web Science Conference 2023 (published)

DisKeyword: Tweet Corpora Exploration for Keyword Selection

Sacha Lévy

2023-02-27

Web Search and Data Mining (published)

DeltaShield: Information Theory for Human-Trafficking Detection

Catalina Vajiac

Meng-Chieh Lee

Aayushi Kulshrestha

Sacha Lévy

Namyong Park

Andreas Olligschlaeger

Cara Jones

Christos Faloutsos

2023-02-07

ACM Transactions on Knowledge Discovery from Data (published)

Towards Detecting Contextual Real-Time Toxicity for In-Game Chat

Zachary Yang

Nicolas Grenon-Godbout

Real-time toxicity detection in online environments poses a significant challenge, due to the increasing prevalence of social media and gaming platforms. We introduce ToxBuster, a simple and scalable model that reliably detects toxic content in real-time for a line of chat by including chat history and metadata. ToxBuster consistently outperforms conventional toxicity models across popular multiplayer games, including Rainbow Six Siege, For Honor, and DOTA 2. We conduct an ablation study to assess the importance of each model component and explore ToxBuster's transferability across the datasets. Furthermore, we showcase ToxBuster's efficacy in post-game moderation, successfully flagging 82.1% of chat-reported players at a precision level of 90.0%. Additionally, we show how an additional 6% of unreported toxic players can be proactively moderated.

2023-01-01

EMNLP (Findings) (published)

openreview.net

TrafficVis: Visualizing Organized Activity and Spatio-Temporal Patterns for Detecting and Labeling Human Trafficking

Catalina Vajiac

Duen Horng Chau

Andreas Olligschlaeger

Rebecca Mackenzie

Pratheeksha Nair

Meng-Chieh Lee

Yifei Li

Namyong Park

Christos Faloutsos

Law enforcement and domain experts can detect human trafficking (HT) in online escort websites by analyzing suspicious clusters of connected ads. How can we explain clustering results intuitively and interactively, visualizing potential evidence for experts to analyze? We present TrafficVis, the first interface for cluster-level HT detection and labeling. Developed through months of participatory design with domain experts, TrafficVis provides coordinated views in conjunction with carefully chosen backend algorithms to effectively show spatio-temporal and text patterns to a wide variety of anti-HT stakeholders. We build upon state-of-the-art text clustering algorithms by incorporating shared metadata as a signal of connected and possibly suspicious activity, then visualize the results. Domain experts can use TrafficVis to label clusters as HT, or other, suspicious, but non-HT activity such as spam and scam, quickly creating labeled datasets to enable further HT research. Through domain expert feedback and a usage scenario, we demonstrate TrafficVis's efficacy. The feedback was overwhelmingly positive, with repeated high praises for the usability and explainability of our tool, the latter being vital for indicting possible criminals.

2023-01-01

IEEE Transactions on Visualization and Computer Graphics (published)

Active Keyword Selection to Track Evolving Topics on Twitter

Sacha Lévy

Farimah Poursafaei

Kellin Pelrine

How can we study social interactions on evolving topics at a mass scale? Over the past decade, researchers from diverse fields such as economics, political science, and public health have often done this by querying Twitter's public API endpoints with hand-picked topical keywords to search or stream discussions. However, despite the API's accessibility, it remains difficult to select and update keywords to collect high-quality data relevant to topics of interest. In this paper, we propose an active learning method for rapidly refining query keywords to increase both the yielded topic relevance and dataset size. We leverage a large open-source COVID-19 Twitter dataset to illustrate the applicability of our method in tracking Tweets around the key sub-topics of Vaccine, Mask, and Lockdown. Our experiments show that our method achieves an average topic-related keyword recall 2x higher than baselines. We open-source our code along with a web interface for keyword selection to make data collection from Twitter more systematic for researchers.

2022-12-01

2022 IEEE International Conference on Data Mining Workshops (ICDMW) (published)

Early Detection of Sexual Predators with Federated Learning

Khaoula Chehbouni

Gilles Caporossi