Aayushi Kulshrestha

DeltaShield: Information Theory for Human-Trafficking Detection

Catalina Vajiac

Meng-Chieh Lee

Aayushi Kulshrestha

Sacha Lévy

Namyong Park

Andreas Olligschlaeger

Cara Jones

Reihaneh Rabbany

Christos Faloutsos

2023-03-29

ACM Transactions on Knowledge Discovery from Data (published)

doi.org

INFOSHIELD: Generalizable Information-Theoretic Human-Trafficking Detection

Meng-Chieh Lee

Catalina Vajiac

Aayushi Kulshrestha

Sacha Lévy

Namyong Park

Cara Jones

Reihaneh Rabbany

Christos Faloutsos

Given a million escort advertisements, how can we spot near-duplicates? Such micro-clusters of ads are usually signals of human trafficking. How can we summarize them, visually, to convince law enforcement to act? Can we build a general tool that works for different languages? Spotting micro-clusters of near-duplicate documents is useful in multiple additional settings, including spam-bot detection in Twitter ads, plagiarism, and more. We present INFOSHIELD, which makes the following contributions: (a) Practical, being scalable and effective on real data; (b) Parameter-free and Principled, requiring no user-defined parameters; (c) Interpretable, finding a document to be the cluster representative, highlighting all the common phrases, and automatically detecting "slots", i.e. phrases that differ in every document; and (d) Generalizable, beating or matching domain-specific methods in Twitter bot detection and human trafficking detection respectively, as well as being language-independent, finding clusters in Spanish, Italian, and Japanese. Interpretability is particularly important for the anti-human-trafficking domain, where law enforcement must visually inspect ads. Our experiments on real data show that INFOSHIELD correctly identifies Twitter bots with an F1 score over 90% and detects human-trafficking ads with 84% precision. Moreover, it is scalable, requiring about 8 hours for 4 million documents on a stock laptop.

2021-04-18

2021 IEEE 37th International Conference on Data Engineering (ICDE) (published)

doi.org

SGP: Spotting Groups Polluting the Online Political Discourse

Junhao Wang

Sacha Lévy

Ren Wang

Aayushi Kulshrestha

Reihaneh Rabbany

Social media sites are becoming a key factor in politics. These platforms are easy to manipulate for the purpose of distorting the information space to confuse and distract voters. It is of paramount importance for social media platforms, users engaged in online political discussions, and government agencies to understand the dynamics on social media and to identify malicious groups that engage in misinformation campaigns and thus pollute the general discourse around a topic of interest. Past works to identify such disruptive patterns are mostly focused on analyzing user-generated content such as tweets. In this study, we take a holistic approach and propose SGP to provide an informative bird's-eye view of all the activities in these social media sites around a broad topic and to detect coordinated groups suspected of engaging in misinformation campaigns. To show the effectiveness of SGP, we deploy it to provide a concise overview of polluting activity on Twitter around the upcoming 2019 Canadian Federal Elections, by analyzing over 60 thousand user accounts connected through 3.4 million connections and 1.3 million hashtags. Users in the polluting groups detected by SGP-flag are over 4x more likely to be suspended, while the majority of these highly suspicious users detected by SGP-flag escaped Twitter's suspension algorithm. Moreover, while few of the polluting hashtags detected are linked to misinformation campaigns, SGP-sig also flags others that have not been picked up on. More importantly, we also show that a large coordinated set of right-wing conservative groups based in the US are heavily engaged in Canadian politics.

2019-10-15

ArXiv (preprint)

arxiv.org

Anomaly Detection with Joint Representation Learning of Content and Connection

Junhao Wang

Renhao Wang

Aayushi Kulshrestha

Reihaneh Rabbany

Social media sites are becoming a key factor in politics. These platforms are easy to manipulate for the purpose of distorting the information space to confuse and distract voters. Past works to identify disruptive patterns are mostly focused on analyzing the content of tweets. In this study, we jointly embed the information from both user-posted content and a user's follower network to detect groups of densely connected users in an unsupervised fashion. We then investigate these dense sub-blocks of users to flag anomalous behavior. In our experiments, we study the tweets related to the upcoming 2019 Canadian Elections, and observe a set of densely connected users engaging in local politics in different provinces and exhibiting troll-like behavior.

2019-06-15

ArXiv (preprint)

arxiv.org

