Reihaneh Rabbany

Reframing AI-for-Good: Radical Questioning in AI for Human Trafficking Interventions

Gabriel Lefebvre

This paper introduces Radical Questioning (RQ), a structured, pre-design ethics framework developed to assess whether artificial intelligence (AI) should be applied to complex social problems rather than merely how. While much of responsible AI development focuses on aligning systems with principles such as fairness, transparency, and accountability, it often begins after the decision to build has already been made, implicitly treating the deployment of AI as a given rather than a question in itself. In domains such as human trafficking, marked by contested definitions, systemic injustice, and deep stakeholder asymmetries, such assumptions can obscure foundational ethical concerns. RQ offers an upstream, deliberative process for surfacing these concerns before design begins. Drawing from critical theory, participatory ethics, and relational responsibility, RQ formalizes a five-step framework to interrogate problem framings, confront techno-solutionist tendencies, and reflect on the moral legitimacy of intervention. Developed through interdisciplinary collaboration and engagement with survivor-led organizations, RQ was piloted in the domain of human trafficking (HT), which is a particularly high-stakes and ethically entangled application area. Its use led to a fundamental design shift: away from automated detection tools and toward survivor-controlled, empowerment-based technologies. We argue that RQ's novelty lies in both its temporal position, i.e., prior to technical design, and its orientation toward domains where harm is structural and ethical clarity cannot be achieved through one-size-fits-all solutions. RQ thus addresses a critical gap between abstract principles of responsible AI and the lived ethical demands of real-world deployment.

2025-10-15

Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (published)

doi.org

TGM: a Modular and Efficient Library for Machine Learning on Temporal Graphs

Tran Gia Bao Ngo

Jure Leskovec

Michael M. Bronstein

Guillaume Rabusseau

Matthias Fey

Reihaneh Rabbany

Well-designed open-source software drives progress in Machine Learning (ML) research. While static graph ML enjoys mature frameworks like PyTorch Geometric and DGL, ML for temporal graphs (TG), networks that evolve over time, lacks comparable infrastructure. Existing TG libraries are often tailored to specific architectures, hindering support for diverse models in this rapidly evolving field. Additionally, the divide between continuous- and discrete-time dynamic graph methods (CTDG and DTDG) limits direct comparisons and idea transfer. To address these gaps, we introduce Temporal Graph Modelling (TGM), a research-oriented library for ML on temporal graphs, the first to unify CTDG and DTDG approaches. TGM offers first-class support for dynamic node features, time-granularity conversions, and native handling of link-, node-, and graph-level tasks. Empirically, TGM achieves an average 7.8x speedup across multiple models, datasets, and tasks compared to the widely used DyGLib, and an average 175x speedup on graph discretization relative to available implementations. Beyond efficiency, we show in our experiments how TGM unlocks entirely new research possibilities by enabling dynamic graph property prediction and time-driven training paradigms, opening the door to questions previously impractical to study. TGM is available at https://github.com/tgm-team/tgm

2025-10-08

arXiv (preprint)

arxiv.org
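
The graph discretization mentioned in the TGM abstract, i.e. converting a continuous-time event stream (CTDG) into discrete snapshots (DTDG), can be illustrated with a minimal plain-Python sketch. This is not TGM's API: the function name `discretize`, the `bin_width` parameter, and the sample events are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def discretize(edges, bin_width):
    """Group (src, dst, timestamp) events into fixed-width time bins.

    Returns a dict mapping bin index -> sorted list of unique (src, dst) edges.
    Hypothetical helper for illustration; not part of any library.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    snapshots = defaultdict(set)
    for src, dst, t in edges:
        # Integer division assigns each event to the snapshot covering its timestamp.
        snapshots[int(t // bin_width)].add((src, dst))
    return {b: sorted(es) for b, es in sorted(snapshots.items())}


# Hypothetical event stream with timestamps in seconds.
events = [(0, 1, 5), (1, 2, 30), (0, 1, 50), (2, 3, 3600), (0, 1, 3700)]
hourly = discretize(events, bin_width=3600)
print(hourly)
```

With a one-hour bin, repeated interactions inside the same hour collapse into a single snapshot edge, which is the information loss that makes the choice of time granularity matter when comparing continuous- and discrete-time methods.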

CrediBench: Building Web-Scale Network Datasets for Information Integrity

James Zhou

Jean-François Godbout

Michael M. Bronstein

Reihaneh Rabbany

Shenyang Huang

Online misinformation poses an escalating threat, amplified by the Internet's open nature and increasingly capable LLMs that generate persuasive yet deceptive content. Existing misinformation detection methods typically focus on either textual content or network structure in isolation, failing to leverage the rich, dynamic interplay between website content and hyperlink relationships that characterizes real-world misinformation ecosystems. We introduce CrediBench: a large-scale data processing pipeline for constructing temporal web graphs that jointly model textual content and hyperlink structure for misinformation detection. Unlike prior work, our approach captures the dynamic evolution of general misinformation domains, including changes in both content and inter-site references over time. Our processed one-month snapshot extracted from the Common Crawl archive in December 2024 contains 45 million nodes and 1 billion edges, representing the largest web graph dataset made publicly available for misinformation research to date. From our experiments on this graph snapshot, we demonstrate the strength of both structural and webpage content signals for learning credibility scores, which measure source reliability. The pipeline and experimentation code are all available here, and the dataset is in this folder.

2025-09-27

arXiv (preprint)

arxiv.org

BluePrint: A Social Media User Dataset for LLM Persona Evaluation and Training

Aurélien Buck-Kaeffer

Je Qin Chooi

Dan Zhao

Maximilian Puelma Touzel

Kellin Pelrine

Jean-François Godbout

Reihaneh Rabbany

Zachary Yang

Large language models (LLMs) offer promising capabilities for simulating social media dynamics at scale, enabling studies that would be ethically or logistically challenging with human subjects. However, the field lacks standardized data resources for fine-tuning and evaluating LLMs as realistic social media agents. We address this gap by introducing SIMPACT, the SIMulation-oriented Persona and Action Capture Toolkit, a privacy-respecting framework for constructing behaviorally grounded social media datasets suitable for training agent models. We formulate next-action prediction as a task for training and evaluating LLM-based agents and introduce metrics at both the cluster and population levels to assess behavioral fidelity and stylistic realism. As a concrete implementation, we release BluePrint, a large-scale dataset built from public Bluesky data focused on political discourse. BluePrint clusters anonymized users into personas of aggregated behaviours, capturing authentic engagement patterns while safeguarding privacy through pseudonymization and removal of personally identifiable information. The dataset includes a sizable action set of 12 social media interaction types (likes, replies, reposts, etc.), each instance tied to the posting activity preceding it. This supports the development of agents that use context-dependence, not only in the language, but also in the interaction behaviours of social media to model social media users. By standardizing data and evaluation protocols, SIMPACT provides a foundation for advancing rigorous, ethically responsible social media simulations. BluePrint serves as both an evaluation benchmark for political discourse modeling and a template for building domain-specific datasets to study challenges such as misinformation and polarization.

2025-09-27

arXiv (preprint)

arxiv.org

Are Large Language Models Good Temporal Graph Learners?

Zifeng Ding

Michael M. Bronstein

Reihaneh Rabbany

Guillaume Rabusseau

Large Language Models (LLMs) have recently driven significant advancements in Natural Language Processing and various other applications. While a broad range of literature has explored the graph-reasoning capabilities of LLMs, including their use of predictors on graphs, the application of LLMs to dynamic graphs -- real world evolving networks -- remains relatively unexplored. Recent work studies synthetic temporal graphs generated by random graph models, but applying LLMs to real-world temporal graphs remains an open question. To address this gap, we introduce Temporal Graph Talker (TGTalker), a novel temporal graph learning framework designed for LLMs. TGTalker utilizes the recency bias in temporal graphs to extract relevant structural information, converted to natural language for LLMs, while leveraging temporal neighbors as additional information for prediction. TGTalker demonstrates competitive link prediction capabilities compared to existing Temporal Graph Neural Network (TGNN) models. Across five real-world networks, TGTalker performs competitively with state-of-the-art temporal graph methods while consistently outperforming popular models such as TGN and HTGN. Furthermore, TGTalker generates textual explanations for each prediction, thus opening up exciting new directions in explainability and interpretability for temporal link prediction. The code is publicly available at https://github.com/shenyangHuang/TGTalker.

2025-09-22

NeurIPS.cc/2025/Workshop/NPGML (poster)

doi.org

openreview.net
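
The general idea described in the TGTalker abstract, selecting recent temporal neighbors and converting them to natural language for an LLM, can be sketched as follows. This is an illustrative assumption, not the authors' implementation: the helper names `recent_neighbors` and `to_prompt`, the prompt wording, and the sample events are all hypothetical.

```python
def recent_neighbors(edges, node, query_time, k):
    """Return up to k most recent (neighbor, timestamp) pairs of `node`
    among events that happened strictly before `query_time`.

    Hypothetical helper for illustration only.
    """
    history = []
    for src, dst, t in edges:
        if t >= query_time:
            continue  # never look at events from the future of the query
        if src == node:
            history.append((dst, t))
        elif dst == node:
            history.append((src, t))
    # Recency bias: the newest interactions come first.
    history.sort(key=lambda pair: pair[1], reverse=True)
    return history[:k]


def to_prompt(node, query_time, neighbors):
    """Verbalize the selected interactions as plain text for a language model."""
    lines = [f"Node {node} interacted with node {n} at time {t}." for n, t in neighbors]
    lines.append(f"Which node will node {node} interact with at time {query_time}?")
    return "\n".join(lines)


# Hypothetical (src, dst, timestamp) event stream.
events = [(0, 1, 10), (0, 2, 20), (3, 0, 30), (0, 1, 40), (0, 4, 60)]
recent = recent_neighbors(events, node=0, query_time=50, k=2)
prompt = to_prompt(0, 50, recent)
print(prompt)
```

Filtering on `t >= query_time` keeps the sketch free of temporal leakage, so the text handed to the model only describes interactions that precede the link being predicted.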

CrediBench: Building Web-Scale Network Datasets for Information Integrity

James Zhou

Jean-François Godbout

Michael M. Bronstein

Reihaneh Rabbany

Shenyang Huang

Online misinformation poses an escalating threat, amplified by the Internet's open nature and increasingly capable LLMs that generate persuasive yet deceptive content. Existing misinformation detection methods typically focus on either textual content or network structure in isolation, failing to leverage the rich, dynamic interplay between website content and hyperlink relationships that characterizes real-world misinformation ecosystems. We introduce CrediBench: a large-scale data processing pipeline for constructing temporal web graphs that jointly model textual content and hyperlink structure for misinformation detection. Unlike prior work, our approach captures the dynamic evolution of general misinformation domains, including changes in both content and inter-site references over time. Our processed one-month snapshot extracted from the Common Crawl archive in December 2024 contains 45 million nodes and 1 billion edges, representing the largest web graph dataset made publicly available for misinformation research to date. From our experiments on this graph snapshot, we demonstrate the strength of both structural and webpage content signals for learning credibility scores, which measure source reliability. The pipeline and experimentation code are all available here, and the dataset is in this folder.

2025-09-22

NeurIPS.cc/2025/Workshop/NPGML (poster)

doi.org

openreview.net

OpenFake: An Open Dataset and Platform Toward Real-World Deepfake Detection

Akshatha Arodi

Gaétan Marceau Caron

Jean-François Godbout

Reihaneh Rabbany

Deepfakes, synthetic media created using advanced AI techniques, pose a growing threat to information integrity, particularly in politically sensitive contexts. This challenge is amplified by the increasing realism of modern generative models, which our human perception study confirms are often indistinguishable from real images. Yet, existing deepfake detection benchmarks rely on outdated generators or narrowly scoped datasets (e.g., single-face imagery), limiting their utility for real-world detection. To address these gaps, we present OpenFake, a large politically grounded dataset specifically crafted for benchmarking against modern generative models with high realism, and designed to remain extensible through an innovative crowdsourced adversarial platform that continually integrates new hard examples. OpenFake comprises nearly four million total images: three million real images paired with descriptive captions and almost one million synthetic counterparts from state-of-the-art proprietary and open-source models. Detectors trained on OpenFake achieve near-perfect in-distribution performance, strong generalization to unseen generators, and high accuracy on a curated in-the-wild social media test set, significantly outperforming models trained on existing datasets. Overall, we demonstrate that with high-quality and continually updated benchmarks, automatic deepfake detection is both feasible and effective in real-world settings.

2025-09-11

arXiv (preprint)

doi.org

arxiv.org
