Reihaneh Rabbany

chin-chen.yang@mila.quebec

Zachary Yang

PhD - McGill University

elahe.kooshafar@mila.quebec

Elahe Kooshafar

Research Master's - McGill University

Farimah Poursafaei

Postdoctoral Researcher - McGill University

farimah.poursafaei@mila.quebec

Research Intern - Université de Montréal

florence.laflamme@mila.quebec

jacob-junqi.tian@mila.quebec

Peter Yu

Research Collaborator - McGill University

hao.yu@mila.quebec

Jacob-Junqi Tian

Research Collaborator - McGill University

jean-francois.godbout@mila.quebec

Jean-François Godbout

Independent Visiting Researcher

julia.gastinger@mila.quebec

Julia Gastinger

Alumni Collaborator - University of Mannheim

Principal supervisor:

Guillaume Rabusseau

kellin.pelrine@mila.quebec

Kellin Pelrine

PhD - McGill University

mauricio.rivera@mila.quebec

Mauricio Rivera

Research Collaborator - Graduated from McGill University

Pratheeksha Nair

PhD - McGill University

pratheeksha.nair@mila.quebec

Razieh Shirzadkhani

Research Collaborator

Principal supervisor:

Guillaume Rabusseau

razieh.shirzadkhani@mila.quebec

sahar.omidishayegan@mila.quebec

Sahar Omidi Shayegan

Research Master's - McGill University

shahrad.mohammadzadeh@mila.quebec

Shahrad Mohammadzadeh

Research Collaborator - McGill University

Co-supervisor:

Doina Precup

PhD - McGill University

Co-supervisor:

Research Collaborator - McGill University

sophia.garrel@mila.quebec

soroush.omranpour@mila.quebec

Soroush Omranpour

Research Master's - McGill University

Co-supervisor:

Guillaume Rabusseau

svetlana.zhuk@mila.quebec

Sveta Zhuk

Research Intern - Université de Montréal

Vidya Sujaya

Research Master's - McGill University

vidya.sujaya@mila.quebec

Blog posts

Flight-SEIR: Incorporating Flight Data to Improve Epidemiological Modelling and Disease Outbreak Prevention

August 3, 2021

by

Shenyang Huang

Reihaneh Rabbany

Publications

Revisiting Hotels-50K and Hotel-ID

Aarash Feizi

Arantxa Casanova

Adriana Romero Soriano

In this paper, we propose revisited versions of two recent hotel recognition datasets: Hotels-50K and Hotel-ID. The revisited versions provide evaluation setups with different levels of difficulty to better align with the intended real-world application, i.e., countering human trafficking. Real-world scenarios involve hotels and locations that are not captured in the current datasets; it is therefore important to consider evaluation settings where classes are truly unseen. We test this setup using multiple state-of-the-art image retrieval models and show that, as expected, the models' performance decreases as the evaluation gets closer to the real-world unseen settings. The rankings of the best-performing models also change across the different evaluation settings, which further motivates using the proposed revisited datasets.

2022-07-20

arXiv (preprint)

VisPaD: Visualization and Pattern Discovery for Fighting Human Trafficking

Pratheeksha Nair

Yifei Li

Catalina Vajiac

Andreas Olligschlaeger

Meng-Chieh Lee

Namyong Park

Duen Horng Chau

Christos Faloutsos

Chieh Lee

Human trafficking analysts investigate groups of related online escort advertisements (called micro-clusters) to detect suspicious activities and identify various modus operandi. This task is complex, as it requires finding patterns and linked metadata across micro-clusters, such as the geographical spread of ads and cluster sizes. Additionally, drawing insights from the data is challenging without visualizing these micro-clusters. To address this, in close collaboration with domain experts, we built VisPaD, a novel interactive tool for characterizing and visualizing micro-clusters and their associated metadata, all in one place. VisPaD helps discover underlying patterns in the data by projecting micro-clusters into a lower-dimensional space. It also allows the user to select micro-clusters involved in suspicious patterns and examine them interactively, leading to faster detection and identification of trends in the data. A demo of VisPaD has also been released.

2022-04-25

The Web Conference (published)

A Strong Node Classification Baseline for Temporal Graphs

Farimah Poursafaei

Željko Žilić

2022-04-20

Proceedings of the 2022 SIAM International Conference on Data Mining (SDM) (published)

Extracting Person Names from User Generated Text: Named-Entity Recognition for Combating Human Trafficking

Yifei Li

Pratheeksha Nair

Kellin Pelrine

Online escort advertisement websites are widely used for advertising victims of human trafficking. Domain experts agree that advertising multiple people in the same ad is a strong indicator of trafficking. Thus, extracting person names from the text of these ads can provide valuable clues for further analysis. However, Named-Entity Recognition (NER) on escort ads is challenging because the text can be noisy and colloquial, and often lacks proper grammar and punctuation. Most existing state-of-the-art NER models fail to perform satisfactorily on this task. In this paper, we propose NEAT (Name Extraction Against Trafficking) for extracting person names. It effectively combines classic rule-based and dictionary extractors with a contextualized language model to capture ambiguous names (e.g., penny, hazel), and adapts to adversarial changes in the text by expanding its dictionary. NEAT shows a 19% average improvement in F1 score for name extraction compared to the previous state of the art on two domain-specific datasets.

2022-01-01

Findings (published)

Towards Better Evaluation for Dynamic Link Prediction

Farimah Poursafaei

Andy Huang

Shenyang Huang

Kellin Pelrine

Despite recent success in learning from static graphs, learning from time-evolving graphs remains an open challenge. In this work, we design new, more stringent evaluation procedures for link prediction specific to dynamic graphs, which reflect real-world considerations, to better compare the strengths and weaknesses of methods. First, we create two visualization techniques to understand the recurring patterns of edges over time and show that many edges recur at later time steps. Based on this observation, we propose a pure memorization-based baseline called EdgeBank. EdgeBank achieves surprisingly strong performance across multiple settings, which highlights that the negative edges used in current evaluation are easy. To sample more challenging negative edges, we introduce two novel negative sampling strategies that improve robustness and better match real-world applications. Lastly, we introduce six new dynamic graph datasets from a diverse set of domains missing from current benchmarks, providing new challenges and opportunities for future research. Our code repository is accessible at https://github.com/fpour/DGB.git.

openreview.net
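A memorization-only baseline of the kind described above can be sketched in a few lines. This is a minimal illustration that assumes edges arrive as `(source, destination, timestamp)` tuples. The class name `EdgeBankSketch`, the optional `window` parameter, and the 0/1 scoring are assumptions made for this example. It is not the implementation in the DGB repository.

```python
from typing import Dict, Iterable, Optional, Tuple

Edge = Tuple[int, int]


class EdgeBankSketch:
    """Hypothetical memorization-only link predictor: a query edge is
    scored 1.0 if the same (src, dst) pair was observed before, else 0.0."""

    def __init__(self, window: Optional[float] = None) -> None:
        # window=None: remember every observed edge indefinitely.
        # window=w: only count edges last seen within w time units of the query.
        self.window = window
        self.last_seen: Dict[Edge, float] = {}

    def update(self, edges: Iterable[Tuple[int, int, float]]) -> None:
        # Store the most recent timestamp at which each (src, dst) pair occurred.
        for src, dst, t in edges:
            prev = self.last_seen.get((src, dst))
            if prev is None or t > prev:
                self.last_seen[(src, dst)] = t

    def predict(self, src: int, dst: int, t: float) -> float:
        seen = self.last_seen.get((src, dst))
        if seen is None:
            return 0.0  # never observed: predicted absent
        if self.window is not None and t - seen > self.window:
            return 0.0  # observed, but outside the memory window
        return 1.0


if __name__ == "__main__":
    bank = EdgeBankSketch()
    bank.update([(1, 2, 0.0), (2, 3, 1.0)])
    print(bank.predict(1, 2, 5.0))  # 1.0: edge (1, 2) was seen before
    print(bank.predict(1, 3, 5.0))  # 0.0: edge (1, 3) was never seen
```

In a sparse graph, a uniformly sampled negative pair has usually never been observed. A predictor like this therefore scores most such negatives 0 with no learning at all, which illustrates why the abstract calls the existing negative edges easy.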

Curating the Twitter Election Integrity Datasets for Better Online Troll Characterization

Albert Manuel Orozco Camacho

Today, social media platforms provide accessible channels for interaction and immediate reflection on the most important events happening around the world. In this paper, we first present a curated set of datasets originating from Twitter's Information Operations efforts. Notably, these accounts, which have already been suspended, provide a view of how state-backed human trolls operate. Second, we present detailed analyses of how these behaviours vary over time, and motivate their use and abstraction in the context of deep representation learning: for instance, to learn and potentially track troll behaviour. We present baselines for such tasks and highlight the differences that may exist within the literature. Finally, we use the representations learned for behaviour prediction to distinguish trolls from "real" users, using a sample of non-suspended active accounts.

2021-12-07

LatinX in AI at Neural Information Processing Systems Conference 2021 (published)

openreview.net

Online Partisan Polarization of COVID-19

Zachary Yang

Anne Imouza

Kellin Pelrine

Sacha Lévy

Jiewen Liu

Gabrielle Desrosiers-Brisebois

Jean-François Godbout

André Blais

In today's age of (mis)information, many people use various social media platforms in an attempt to shape public opinion on several important issues, including elections and the COVID-19 pandemic. These two topics have recently become intertwined given the importance of complying with public health measures related to COVID-19 and politicians' management of the pandemic. Motivated by this, we study the partisan polarization of COVID-19 discussions on social media. We propose and use a novel measure of partisan polarization to analyze more than 380 million posts from Twitter and Parler around the 2020 US presidential election. We find a strong correlation between peaks in polarization and polarizing events, such as the January 6th Capitol Hill riot. We further classify each post into the key COVID-19 issues of lockdown, masks, and vaccines, plus a miscellaneous category, to investigate both the volume of and polarization on these topics and how they vary over time. As expected, Parler includes more negative discussion around lockdown and masks, but not much around vaccines. We also observe more balanced discussions on Twitter and a general disconnect between the discussions on Parler and Twitter.

2021-12-01

2021 International Conference on Data Mining Workshops (ICDMW) (published)

Incorporating dynamic flight network in SEIR to model mobility between populations

Xiaoye Ding

Shenyang Huang

Abby Leung

2021-06-10

Applied Network Science (published)

The Surprising Performance of Simple Baselines for Misinformation Detection

Kellin Pelrine

Jacob Danovitch

As social media becomes increasingly prominent in our day-to-day lives, it is increasingly important to detect informative content and prevent the spread of disinformation and unverified rumours. While many sophisticated and successful models have been proposed in the literature, they are often compared with older NLP baselines such as SVMs, CNNs, and LSTMs. In this paper, we examine the performance of a broad set of modern transformer-based language models and show that with basic fine-tuning, these models are competitive with, and can even significantly outperform, recently proposed state-of-the-art methods. We present our framework as a baseline for creating and evaluating new methods for misinformation detection. We further study a comprehensive set of benchmark datasets, and discuss potential data leakage and the need for careful experimental design and understanding of datasets to account for confounding variables. As an extreme example, we show that classifying based only on the first three digits of tweet IDs, which contain information on the date, gives state-of-the-art performance on a commonly used benchmark dataset for fake news detection, Twitter16. We provide a simple tool to detect this problem and suggest steps to mitigate it in future datasets.

2021-06-03

Proceedings of the Web Conference 2021 (published)
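The tweet-ID leakage described above can be reproduced in miniature. Post-2010 tweet IDs are Snowflake IDs whose high-order bits encode the creation time, so their leading decimal digits act as a coarse date. The sketch below assumes IDs are available as decimal strings. It fits a lookup table that maps each 3-digit ID prefix to the majority training label. The function names and the toy IDs are hypothetical, and this is not the detection tool released with the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def fit_prefix_classifier(
    tweet_ids: Sequence[str], labels: Sequence[Hashable], k: int = 3
) -> Dict[str, Hashable]:
    """Map each k-digit ID prefix to its most frequent training label."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tid, label in zip(tweet_ids, labels):
        counts[tid[:k]][label] += 1
    # most_common(1) returns [(label, count)] for the top label of each prefix.
    return {prefix: c.most_common(1)[0][0] for prefix, c in counts.items()}


def predict_prefix(
    model: Dict[str, Hashable], tweet_ids: Sequence[str], default: Hashable, k: int = 3
) -> List[Hashable]:
    """Predict by prefix lookup, falling back to `default` for unseen prefixes."""
    return [model.get(tid[:k], default) for tid in tweet_ids]


if __name__ == "__main__":
    # Toy, made-up IDs: two "time periods" with different labels.
    train_ids = ["512000000000000001", "512000000000000002",
                 "620000000000000003", "620000000000000004"]
    train_labels = ["true", "true", "false", "false"]
    model = fit_prefix_classifier(train_ids, train_labels)
    print(predict_prefix(model, ["512999", "620111", "700000"], default="true"))
```

On a real benchmark, a held-out accuracy for this table that is far above the majority-class rate would suggest that labels are confounded with posting date.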

Graph Attention Networks with Positional Embeddings

Liheng Ma

Adriana Romero Soriano

2021-05-09

Advances in Knowledge Discovery and Data Mining (published)