Jean-François Godbout

Philip Di Domenico

Research Intern - York

Hania Hania

Research Intern - York

Ryan Hooshiar

Research Intern - York

Anne Imouza

PhD - McGill University

Co-supervisor :

Andreea Musulan

Postdoctorate - Université de Montréal

Julien Robin

PhD - Université de Montréal

Google Scholar

Tong Wu

Collaborating researcher - McGill University

Principal supervisor :

Website

Github

Sveta Zhuk

Master's Research - Université de Montréal

Co-supervisor :

Publications

Rules of the game: Legislative exits in four Westminster systems

Alex B. Rivard

Marc André Bodet

By leveraging over 150 years of electoral and biographical data in the Canadian provinces of Ontario, Quebec, New Brunswick, and Nova Scotia, we argue that voluntary exit is best understood as a cost-benefit calculation shaped by positional and institutional incentives in the legislative arena. We show that institutional changes that make seeking re-election costlier are associated with an increased likelihood of a legislator voluntarily exiting the legislative arena. We also find that the determinants of exit vary across age cohorts: younger legislators are more sensitive to institutional and positional cost-benefit incentives, reflecting greater professional mobility and outside career opportunities. Overall, our results indicate that positional and institutional incentives in part explain a legislator’s decision to not seek re-election, but that the impact of these incentives is mediated by life-cycle and retirement-horizon considerations.

2026-06-05

International Political Science Review (published)

EASE Configuration Facilitates A Reproducible Science of LLM Social Simulations

Aurélien Bück-Kaeffer

LLMs are increasingly deployed to simulate social interactions, yet many of the existing simulators remain ad hoc and monolithic. This lack of architectural standardization prevents reproducible research and complicates downstream evaluation. We advance a rigorous science of LLM-based multi-agent simulation by modularizing core components into Environments, Agents, Simulation engines, and Evaluation metrics (EASE). We demonstrate the utility of EASE configuration by wrapping it in an experimental study schema for orchestrating workflows centered around answering explicit research questions in generated scenarios. We contribute SiliSocS, an open-source, research-ready Silicon Society Sandbox implementing a study-structured EASE configuration to enable highly configurable and reproducible LLM-based social simulations. Using SiliSocS and EASE, we present three case studies, showcasing the system's comprehensive assessment of existing questions, ability to dive deeper into complex questions, and elaboration of existing studies, respectively. Together, these case studies highlight the limitations of current modeling approaches and isolate the impacts of design choices on key results.

2026-05-27

arXiv (preprint)

The $\textit{Silicon Society}$ Cookbook: Design Space of LLM-based Social Simulations

Aurélien Bück-Kaeffer

Studies attempting to simulate human behavior with …

2026-04-29

arXiv (preprint)

What do people want to fact-check?

Bijean Ghafouri

Dorsaf Sallami

Luca Luceri

Taylor Lynn Curtis

Emilio Ferrara

2026-02-10

arXiv (preprint)

Large language models can effectively convince people to believe conspiracies

Thomas H Costello

Matthew Kowal

Antonio A. Arechar

Adam Gleave

David G. Rand

Gordon Pennycook

Large language models (LLMs) have been shown to be persuasive across a variety of contexts. But it remains unclear whether this persuasive power advantages truth over falsehood, or if LLMs can promote misbeliefs just as easily as refuting them. Here, we investigate this question across three pre-registered experiments in which participants (N = 2,724 Americans) discussed a conspiracy theory they were uncertain about with GPT-4o, and the model was instructed to either argue against ("debunking") or for ("bunking") that conspiracy. When using a "jailbroken" GPT-4o variant with guardrails removed, the AI was as effective at increasing conspiracy belief as decreasing it. Concerningly, the bunking AI was rated more positively, and increased trust in AI, more than the debunking AI. Surprisingly, we found that using standard GPT-4o produced very similar effects, such that the guardrails imposed by OpenAI did little to prevent the LLM from promoting conspiracy beliefs. Encouragingly, however, a corrective conversation reversed these newly induced conspiracy beliefs, and simply prompting GPT-4o to only use accurate information dramatically reduced its ability to increase conspiracy beliefs. Our findings demonstrate that LLMs possess potent abilities to promote both truth and falsehood, but that potential solutions may exist to help mitigate this risk.

2026-01-07

arXiv (preprint)

AI Epistemic Risks: Emerging Mechanisms & Evidence

Mick Yang

Stephen Casper

Jonathan Stray

Jasmine Li

Cameron Jones

Anna Gausen

Natasha Jacques

Brian Christian

Bálint Gyevnár

Hannah Rose Kirk

Zhonghao He

Dan Zhao

Siao Si Looi

J. Levy

Kobi Hackenburg

Elizabeth Seger

Matt Kowal

Michelle Malonza

Luke Hewitt

Hause Lin

Maarten Sap

Dylan Hadfield-Menell

Thomas Costello

David Rand

Atoosa Kasirzadeh

Gordon Pennycook

Yoshua Bengio

2025-12-31

SSRN Electronic Journal (accepted)

Position: Time to Close The Validation Gap in LLM Social Simulations

Aurélien Bück-Kaeffer

LLM-based social simulations, in which many language model agents interact over multiple turns, are rapidly proliferating across policy analysis, epidemiology, and computational social science. Yet the field lacks consensus on how to validate these simulations, with evaluation methods that are sparse, inconsistent, and rarely shared across disciplinary silos. We argue this creates a serious risk: premature deployment of unvalidated simulators in high-stakes domains. Our position is that the field must pivot from expansion to consolidation, prioritizing methodological standardization: shared benchmarks, open data, and reproducible evaluation protocols grounded in social science and complex systems research. We outline a concrete research program organized around specific learning problems/benchmarks, providing a path toward answering the fundamental question: when are LLM social simulations useful modelling objects?

2025-12-31

International Conference on Machine Learning (Accept (regular))

openreview.net

Deepfakes in the 2025 Canadian Election: Prevalence, Partisanship, and Platform Dynamics

Victor Livernoche

Andreea Musulan

Concerns about AI-generated political content are growing, yet there is limited empirical evidence on how deepfakes actually appear and circulate across social platforms during major events in democratic countries. In this study, we present one of the first in-depth analyses of how these realistic synthetic media shape the political landscape online, focusing specifically on the 2025 Canadian federal election. By analyzing 187,778 posts from X, Bluesky, and Reddit with a high-accuracy detection framework trained on a diverse set of modern generative models, we find that 5.86% of election-related images were deepfakes. Right-leaning accounts shared them more frequently, with 8.66% of their posted images flagged compared to 4.42% for left-leaning users, often with defamatory or conspiratorial intent. Yet, most detected deepfakes were benign or non-political, and harmful ones drew little attention, accounting for only 0.12% of all views on X. Overall, deepfakes were present in the election conversation, but their reach was modest, and realistic fabricated images, although less common, drew higher engagement, highlighting growing concerns about their potential misuse.

2025-12-14

arXiv (preprint)

$\texttt{BluePrint}$: A Social Media User Dataset for LLM Persona Evaluation and Training

Aurélien Bück-Kaeffer

Je Qin Chooi

Dan Zhao

Large language models (LLMs) offer promising capabilities for simulating social media dynamics at scale, enabling studies that would be ethically or logistically challenging with human subjects. However, the field lacks standardized data resources for fine-tuning and evaluating LLMs as realistic social media agents. We address this gap by introducing SIMPACT, the SIMulation-oriented Persona and Action Capture Toolkit, a privacy-respecting framework for constructing behaviorally-grounded social media datasets suitable for training agent models. We formulate next-action prediction as a task for training and evaluating LLM-based agents and introduce metrics at both the cluster and population levels to assess behavioral fidelity and stylistic realism. As a concrete implementation, we release BluePrint, a large-scale dataset built from public Bluesky data focused on political discourse. BluePrint clusters anonymized users into personas of aggregated behaviours, capturing authentic engagement patterns while safeguarding privacy through pseudonymization and removal of personally identifiable information. The dataset includes a sizable action set of 12 social media interaction types (likes, replies, reposts, etc.), each instance tied to the posting activity preceding it. This supports the development of agents that use context-dependence, not only in the language, but also in the interaction behaviours of social media to model social media users. By standardizing data and evaluation protocols, SIMPACT provides a foundation for advancing rigorous, ethically responsible social media simulations. BluePrint serves as both an evaluation benchmark for political discourse modeling and a template for building domain-specific datasets to study challenges such as misinformation and polarization.

2025-09-26

arXiv (preprint)

CrediBench: Building Web-Scale Network Datasets for Information Integrity

James Zhou

Michael M. Bronstein

Shenyang Huang

Online misinformation poses an escalating threat, amplified by the Internet's open nature and increasingly capable LLMs that generate persuasive yet deceptive content. Existing misinformation detection methods typically focus on either textual content or network structure in isolation, failing to leverage the rich, dynamic interplay between website content and hyperlink relationships that characterizes real-world misinformation ecosystems. We introduce CrediBench: a large-scale data processing pipeline for constructing temporal web graphs that jointly model textual content and hyperlink structure for misinformation detection. Unlike prior work, our approach captures the dynamic evolution of general misinformation domains, including changes in both content and inter-site references over time. Our processed one-month snapshot extracted from the Common Crawl archive in December 2024 contains 45 million nodes and 1 billion edges, representing the largest web graph dataset made publicly available for misinformation research to date. From our experiments on this graph snapshot, we demonstrate the strength of both structural and webpage content signals for learning credibility scores, which measure source reliability. The pipeline and experimentation code are all available here, and the dataset is in this folder.

2025-09-21

NPGML @ Neural Information Processing Systems (poster)

openreview.net

SandboxSocial: A Sandbox for Social Media Using Multimodal AI Agents

Gayatri Krishnakumar

Busra Tugce Gurbuz

Austin Welch

Andreea Musulan

Hao Yu

Ethan Kosak-Hine

Tom Gibbs

Camille Thibault

Dan Zhao

The online information ecosystem enables influence campaigns of unprecedented scale and impact. We urgently need empirically grounded approaches to counter the growing threat of malicious campaigns, now amplified by generative AI. But developing defenses in real-world settings is impractical. Social system simulations with agents modelled using Large Language Models (LLMs) are a promising alternative approach and a growing area of research. However, existing simulators lack features needed to capture the complex information-sharing dynamics of platform-based social networks. To bridge this gap, we present SandboxSocial, a new simulator that includes several key innovations, mainly: (1) a virtual social media platform (modelled as Mastodon and mirrored in an actual Mastodon server) that enables a realistic setting in which agents interact; (2) an adapter that uses real-world user data to create more grounded agents and social media content; and (3) multi-modal capabilities that enable our agents to interact using both text and images, just as humans do on social media. We make the simulator more useful to researchers by providing measurement and analysis tools that track simulation dynamics and compute evaluation metrics to compare experimental results.

2025-08-15

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (published)

Veracity: An Open-Source AI Fact-Checking System

Taylor Lynn Curtis

William Garneau

Manon Gruaz

Mike Pinder

Li Wei Wang

Sukanya Krishna

Luda Cohen

The proliferation of misinformation poses a significant threat to society, exacerbated by the capabilities of generative AI. This demo paper introduces Veracity, an open-source AI system designed to empower individuals to combat misinformation through transparent and accessible fact-checking. Veracity leverages the synergy between Large Language Models (LLMs) and web retrieval agents to analyze user-submitted claims and provide grounded veracity assessments with intuitive explanations. Key features include multilingual support, numerical scoring of claim veracity, and an interactive interface inspired by familiar messaging applications. This paper will showcase Veracity's ability to not only detect misinformation but also explain its reasoning, fostering media literacy and promoting a more informed society.

2025-08-15

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (published)