
Jean-François Godbout

Associate Academic Member
Full Professor, Université de Montréal
Research Topics
Misinformation
Generative Models
AI Safety

Biography

Jean-François Godbout is a professor in the Department of Political Science at the Université de Montréal and an associate academic member of Mila - Quebec Artificial Intelligence Institute. His research focuses primarily on computational social science, AI safety, and the impact of generative AI on society. He is currently director of the undergraduate program in data analysis for the social sciences at the Université de Montréal and a researcher at IVADO.

Current Students

Postdoctorate - UdeM
PhD - UdeM
Master's Research - UdeM
Co-supervisor:
Master's Research - UdeM
Co-supervisor:

Publications

Veracity: An Open-Source AI Fact-Checking System
Taylor Lynn Curtis
Maximilian Puelma Touzel
William Garneau
Manon Gruaz
Mike Pinder
Li Wei Wang
Sukanya Krishna
Luda Cohen
Kellin Pelrine
The proliferation of misinformation poses a significant threat to society, exacerbated by the capabilities of generative AI. This demo paper introduces Veracity, an open-source AI system designed to empower individuals to combat misinformation through transparent and accessible fact-checking. Veracity leverages the synergy between Large Language Models (LLMs) and web retrieval agents to analyze user-submitted claims and provide grounded veracity assessments with intuitive explanations. Key features include multilingual support, numerical scoring of claim veracity, and an interactive interface inspired by familiar messaging applications. This paper will showcase Veracity's ability to not only detect misinformation but also explain its reasoning, fostering media literacy and promoting a more informed society.
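The pipeline described in the abstract (an LLM paired with web retrieval agents that returns a scored, explained verdict for a user-submitted claim) can be sketched roughly as follows. This is a minimal, hypothetical outline, not the Veracity codebase: the function names, the 0-1 score scale, and the stubbed retrieval and LLM calls are all assumptions for illustration.

```python
# Illustrative sketch only, not the Veracity implementation. All names are hypothetical;
# the real system pairs an LLM with web retrieval agents, which are stubbed out here.
from dataclasses import dataclass


@dataclass
class VeracityAssessment:
    claim: str
    score: float          # hypothetical scale: 0.0 = false, 1.0 = true
    explanation: str
    sources: list[str]


def retrieve_evidence(claim: str) -> list[str]:
    """Stub for a web-retrieval agent: would query search APIs and return snippets."""
    return [f"[snippet retrieved for: {claim}]"]


def assess_claim(claim: str, evidence: list[str]) -> VeracityAssessment:
    """Stub for the LLM call: would prompt a model to grade the claim against the evidence."""
    # A real implementation would parse a structured LLM response here.
    return VeracityAssessment(
        claim=claim,
        score=0.5,
        explanation="Placeholder explanation grounded in the retrieved evidence.",
        sources=evidence,
    )


if __name__ == "__main__":
    claim = "Example user-submitted claim."
    result = assess_claim(claim, retrieve_evidence(claim))
    print(f"{result.claim} -> score {result.score:.2f}: {result.explanation}")
```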
It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics
Matthew Kowal
Jasper Timm
Thomas H Costello
Antonio A. Arechar
Gordon Pennycook
David Rand
Adam Gleave
Kellin Pelrine
Persuasion is a powerful capability of large language models (LLMs) that both enables beneficial applications (e.g. helping people quit smoking) and raises significant risks (e.g. large-scale, targeted political manipulation). Prior work has found models possess a significant and growing persuasive capability, measured by belief changes in simulated or real users. However, these benchmarks overlook a crucial risk factor: the propensity of a model to attempt to persuade in harmful contexts. Understanding whether a model will blindly "follow orders" to persuade on harmful topics (e.g. glorifying joining a terrorist group) is key to understanding the efficacy of safety guardrails. Moreover, understanding if and when a model will engage in persuasive behavior in pursuit of some goal is essential to understanding the risks from agentic AI systems. We propose the Attempt to Persuade Eval (APE) benchmark, which shifts the focus from persuasion success to persuasion attempts, operationalized as a model's willingness to generate content aimed at shaping beliefs or behavior. Our evaluation framework probes frontier LLMs using a multi-turn conversational setup between simulated persuader and persuadee agents. APE explores a diverse spectrum of topics including conspiracies, controversial issues, and non-controversially harmful content. We introduce an automated evaluator model to identify willingness to persuade and measure the frequency and context of persuasive attempts. We find that many open and closed-weight models are frequently willing to attempt persuasion on harmful topics and that jailbreaking can increase willingness to engage in such behavior. Our results highlight gaps in current safety guardrails and underscore the importance of evaluating willingness to persuade as a key dimension of LLM risk. APE is available at github.com/AlignmentResearch/AttemptPersuadeEval
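The evaluation loop in the abstract (a persuader model, a simulated persuadee, and an automated judge that flags persuasion attempts rather than persuasion success) could look roughly like the sketch below. It is not the released benchmark code (see github.com/AlignmentResearch/AttemptPersuadeEval); the three stubbed functions stand in for model calls and are assumptions.

```python
# Hypothetical sketch of an APE-style attempt-counting loop; model calls are stubbed.
import random


def persuader_reply(topic: str, history: list[str]) -> str:
    """Stub for the frontier model under test, instructed to persuade on `topic`."""
    return f"Turn {len(history) // 2 + 1}: argument about {topic}"


def persuadee_reply(history: list[str]) -> str:
    """Stub for the simulated persuadee agent."""
    return "I'm not sure I agree."


def judge_attempt(message: str) -> bool:
    """Stub for the automated evaluator that flags a message as a persuasion attempt."""
    return random.random() < 0.5  # placeholder judgement


def run_episode(topic: str, turns: int = 3) -> float:
    """Returns the fraction of persuader turns flagged as persuasion attempts."""
    history: list[str] = []
    attempts = 0
    for _ in range(turns):
        msg = persuader_reply(topic, history)
        attempts += judge_attempt(msg)
        history += [msg, persuadee_reply(history)]
    return attempts / turns


if __name__ == "__main__":
    print("Attempt rate:", run_episode("a harmful conspiracy topic"))
```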
From Intuition to Understanding: Using AI Peers to Overcome Physics Misconceptions
Ruben Weijers
Denton Wu
Hannah Betts
Tamara Jacod
Yuxiang Guan
Vidya Sujaya
Kushal Dev
Toshali Goel
William Delooze
Ying Wu
Kellin Pelrine
Generative AI has the potential to transform the personalization and accessibility of education. However, it raises serious concerns about accuracy and about helping students become independent critical thinkers. In this study, we designed a helpful yet fallible AI "Peer" to help students correct fundamental physics misconceptions related to Newtonian mechanics concepts. In contrast to approaches that seek near-perfect accuracy to create an authoritative AI tutor or teacher, we directly inform students that this AI can answer up to 40% of questions incorrectly. In a randomized controlled trial with 165 students, those who engaged in targeted dialogue with the AI Peer achieved post-test scores that were, on average, 10.5 percentage points higher (with over 20 percentage points higher normalized gain) than a control group that discussed physics history. Qualitative feedback indicated that 91% of the treatment group's AI interactions were rated as helpful. Furthermore, by comparing student performance on pre- and post-test questions about the same concept, along with experts' annotations of the AI interactions, we find initial evidence suggesting the improvement in performance does not depend on the correctness of the AI. With further research, the AI Peer paradigm described here could open new possibilities for how we learn, adapt to, and grow with AI.
A Guide to Misinformation Detection Data and Evaluation
Camille Thibault
Jacob-Junqi Tian
Gabrielle Péloquin-Skulski
Taylor Lynn Curtis
James Zhou
Florence Laflamme
Yuxiang Guan
Kellin Pelrine
Misinformation is a complex societal issue, and mitigating solutions are difficult to create due to data deficiencies. To address this problem, we have curated the largest collection of (mis)information datasets in the literature, totaling 75. From these, we evaluated the quality of all 36 datasets that consist of statements or claims, as well as the 9 datasets that consist of data in purely paragraph form. We assess these datasets to identify those with solid foundations for empirical work and those with flaws that could result in misleading and non-generalizable results, such as insufficient label quality and spurious correlations. We further provide state-of-the-art baselines on all these datasets, but show that regardless of label quality, categorical labels may no longer give an accurate evaluation of detection model performance. We discuss alternatives to mitigate this problem. Overall, this guide aims to provide a roadmap for obtaining higher quality data and conducting more effective evaluations, ultimately improving research in misinformation detection. All datasets and other artifacts are available at [anonymized].
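As a rough illustration of the kind of claim-level baseline evaluation the guide discusses, the sketch below scores a stubbed detection model on a categorical dataset and also reports the majority-class rate, one simple check on label balance. The dataset, the model stub, and the label set are hypothetical placeholders, not the paper's datasets or baselines.

```python
# Minimal, hypothetical sketch of categorical misinformation-detection evaluation.
from collections import Counter


def classify_claim(text: str) -> str:
    """Stub for a detection model (e.g., an LLM prompt or a fine-tuned classifier)."""
    return "false"  # placeholder prediction


def evaluate(dataset: list[tuple[str, str]]) -> dict[str, float]:
    """Compute accuracy and the majority-class rate for a labeled claim dataset."""
    correct = sum(classify_claim(text) == label for text, label in dataset)
    counts = Counter(label for _, label in dataset)
    return {
        "accuracy": correct / len(dataset),
        "majority_class_rate": max(counts.values()) / len(dataset),  # flags label imbalance
    }


if __name__ == "__main__":
    toy = [("Claim A", "false"), ("Claim B", "true"), ("Claim C", "false")]
    print(evaluate(toy))
```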
Online Influence Campaigns: Strategies and Vulnerabilities
Andreea Musulan
Veronica Xia
Ethan Kosak-Hine
Tom Gibbs
Vidya Sujaya
Kellin Pelrine
Université de Montréal
IVADO
McGill University
In order to combat the creation and spread of harmful content online, this paper defines and contextualizes the concept of inauthentic, societal-scale manipulation by malicious actors. We review the literature on societally harmful content and how it proliferates to analyze the manipulation strategies used by such actors and the vulnerabilities they target. We also provide an overview of three case studies of extensive manipulation campaigns to emphasize the severity of the problem. We then address the role that Artificial Intelligence plays in the development and dissemination of harmful content, and how its evolution presents new threats to societal cohesion for countries across the globe. Our survey aims to increase our understanding of not just particular aspects of these threats, but also the strategies underlying their deployment, so we can effectively prepare for the evolving cybersecurity landscape.
A Simulation System Towards Solving Societal-Scale Manipulation
Maximilian Puelma Touzel
Sneheel Sarangi
Austin Welch
Gayatri Krishnakumar
Dan Zhao
Zachary Yang
Hao Yu
Ethan Kosak-Hine
Tom Gibbs
Andreea Musulan
Camille Thibault
Busra Tugce Gurbuz
Kellin Pelrine
The rise of AI-driven manipulation poses significant risks to societal trust and democratic processes. Yet, studying these effects in real-world settings at scale is ethically and logistically impractical, highlighting a need for simulation tools that can model these dynamics in controlled settings to enable experimentation with possible defenses. We present a simulation environment designed to address this. We build on the Concordia framework, which simulates offline, "real life" activity, by adding online social media interactions to the simulation through the integration of a Mastodon server. We improve simulation efficiency and information flow, and add a set of measurement tools, particularly longitudinal surveys. We demonstrate the simulator with a tailored example in which we track agents' political positions and show how partisan manipulation of agents can affect election results.
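The workflow in the abstract (agents interacting on a simulated social network, with longitudinal surveys measuring shifts in political positions under a manipulation campaign) can be caricatured with the toy loop below. It is not the authors' simulator: the agent class, the post and survey functions, and the opinion scale are all hypothetical stand-ins; the real system builds on Concordia and a Mastodon server for the online interactions stubbed out here.

```python
# Toy, hypothetical sketch of a social-media manipulation simulation with surveys.
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    position: float = 0.0                      # political position on a [-1, 1] scale (hypothetical)
    feed: list[str] = field(default_factory=list)


def post(author: Agent, agents: list[Agent], message: str, shift: float) -> None:
    """Stub for a social-media post: delivered to every other agent with a small influence."""
    for a in agents:
        if a is not author:
            a.feed.append(message)
            a.position = max(-1.0, min(1.0, a.position + shift * random.uniform(0.0, 0.1)))


def survey(agents: list[Agent]) -> float:
    """Longitudinal survey instrument: here simply the mean reported position."""
    return sum(a.position for a in agents) / len(agents)


if __name__ == "__main__":
    agents = [Agent(f"agent_{i}", position=random.uniform(-0.5, 0.5)) for i in range(10)]
    manipulator = agents[0]
    print("Baseline survey:", round(survey(agents), 3))
    for step in range(20):                      # simulated rounds of online activity
        post(manipulator, agents, f"partisan message {step}", shift=+1.0)
    print("Post-campaign survey:", round(survey(agents), 3))
```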
Epistemic Integrity in Large Language Models
Bijean Ghafouri
Shahrad Mohammadzadeh
James Zhou
Pratheeksha Nair
Jacob-Junqi Tian
Mayank Goel
Kellin Pelrine
Large language models are increasingly relied upon as sources of information, but their propensity for generating false or misleading statements with high confidence poses risks for users and society. In this paper, we confront the critical problem of epistemic miscalibration, where a model's linguistic assertiveness fails to reflect its true internal certainty. We introduce a new human-labeled dataset and a novel method for measuring the linguistic assertiveness of Large Language Models, which cuts error rates by over 50% relative to previous benchmarks. Validated across multiple datasets, our method reveals a stark misalignment between how confidently models linguistically present information and their actual accuracy. Further human evaluations confirm the severity of this miscalibration. This evidence underscores the urgent risk that the overstated certainty of Large Language Models may mislead users on a massive scale. Our framework provides a crucial step forward in diagnosing and correcting this miscalibration, offering a path to safer and more trustworthy AI across domains.
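The miscalibration idea in the abstract, comparing how assertively a statement is phrased with how often it is actually correct, can be illustrated with the crude sketch below. The hedging-word proxy, the gap metric, and the toy examples are hypothetical stand-ins for exposition only; the paper uses a human-labeled dataset and a learned assertiveness measure, not this heuristic.

```python
# Hedged, hypothetical sketch: linguistic assertiveness vs. factual correctness.
HEDGES = {"might", "may", "possibly", "perhaps", "likely", "unsure"}


def assertiveness(statement: str) -> float:
    """Crude proxy: fewer hedging words -> more assertive (the paper uses a learned measure)."""
    words = statement.lower().split()
    hedging = sum(w.strip(".,") in HEDGES for w in words)
    return max(0.0, 1.0 - hedging / max(len(words), 1) * 5)


def miscalibration(samples: list[tuple[str, bool]]) -> float:
    """Mean gap between linguistic assertiveness and correctness (0 = well aligned)."""
    return sum(abs(assertiveness(s) - float(correct)) for s, correct in samples) / len(samples)


if __name__ == "__main__":
    toy = [
        ("The capital of Australia is definitely Sydney.", False),       # assertive but wrong
        ("Canberra might possibly be the capital of Australia.", True),  # hedged but right
        ("Paris is the capital of France.", True),                       # assertive and right
    ]
    print("Miscalibration:", round(miscalibration(toy), 3))
```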