Léo Boisvert

Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain

Léo Boisvert

Abhay Puri

Chandra Kiran Reddy Evuru

Nazanin Sepahvand

Nicolas Chapados

Quentin Cappart

Alexandre Lacoste

Krishnamurthy (DJ) Dvijotham

Alexandre Drouin

The practice of fine-tuning AI agents on data from their own interactions (such as web browsing or tool use), while a strong general recipe for improving agentic capabilities, also introduces a critical security vulnerability within the AI supply chain. In this work, we show that adversaries can easily poison the data collection pipeline to embed hard-to-detect backdoors that are triggered by specific target phrases, such that when the agent encounters these triggers, it performs an unsafe or malicious action. We formalize and validate three realistic threat models targeting different layers of the supply chain: 1) direct poisoning of fine-tuning data, where an attacker controls a fraction of the training traces; 2) environmental poisoning, where malicious instructions are injected into webpages scraped or tools called while creating training data; and 3) supply chain poisoning, where a pre-backdoored base model is fine-tuned on clean data to improve its agentic capabilities. Our results are stark: by poisoning as few as 2% of the collected traces, an attacker can embed a backdoor causing an agent to leak confidential user information with over 80% success when a specific trigger is present. This vulnerability holds across all three threat models. Furthermore, we demonstrate that prominent safeguards, including two guardrail models and one weight-based defense, fail to detect or prevent the malicious behavior. These findings highlight an urgent threat to agentic AI development and underscore the critical need for rigorous security vetting of data collection processes and end-to-end model supply chains.

2025-10-02

arXiv (preprint)

doi.org

arxiv.org

DoomArena: A framework for Testing AI Agents Against Evolving Security Threats

Léo Boisvert

Mihir Bansal

Chandra Kiran Reddy Evuru

Gabriel Huang

Abhay Puri

Avinandan Bose

Maryam Fazel

Quentin Cappart

Jason Stanley

Alexandre Lacoste

Alexandre Drouin

Krishnamurthy (DJ) Dvijotham

We present DoomArena, a security evaluation framework for AI agents. DoomArena is designed on three principles: 1) It is a plug-in framework and integrates easily into realistic agentic frameworks like BrowserGym (for web agents) and

2025-07-06

colmweb.org/COLM/2025/Conference (accepted)

doi.org

openreview.net

Silent Sabotage: Injecting Backdoors into AI Agents Through Fine-Tuning

Léo Boisvert

Abhay Puri

Chandra Kiran Reddy Evuru

Joshua Kazdan

Avinandan Bose

Quentin Cappart

Maryam Fazel

Sai Rajeswar

Jason Stanley

Nicolas Chapados

Alexandre Drouin

Krishnamurthy (DJ) Dvijotham

The rise of AI agents that can use tools, browse the web and interact with computers on behalf of a user has sparked strong interest in improving these capabilities by explicitly fine-tuning the LLMs/VLMs that power these agents. Several researchers have proposed collecting data by letting the agents interact with their environment (e.g., a computer operating system, the web or a collection of APIs exposed as tools), and improving agent performance by fine-tuning on this data. In this work, we show that such data collection can be manipulated by adversaries to insert poisoned traces. By modifying just 5% of collected traces, adversaries can embed stealthy bad behaviors into agents, such as leaking confidential user information whenever the tool or webpage exposes a trigger. Our results raise important security concerns in the development of AI agents, and underscore the importance of careful scrutiny of all data collection processes used to improve agentic AI.

2025-06-07

ICML.cc/2025/Workshop/WCUA (poster)

openreview.net

The BrowserGym Ecosystem for Web Agent Research

Thibault Le Sellier De Chezelles

Maxime Gasse

Alexandre Lacoste

Massimo Caccia

Lawrence Keunho Jang

Ori Yoran

Dehan Kong

Frank F. Xu

Siva Reddy

Quentin Cappart

Graham Neubig

Ruslan Salakhutdinov

Nicolas Chapados

The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging automation and Large Language Models (LLMs). Many existing benchmarks suffer from fragmentation and inconsistent evaluation methodologies, making it challenging to achieve reliable comparisons and reproducible results. In an earlier work, Drouin et al. (2024) introduced BrowserGym, which aims to solve this by providing a unified, gym-like environment with well-defined observation and action spaces, facilitating standardized evaluation across diverse benchmarks. We propose an extended BrowserGym-based ecosystem for web agent research, which unifies existing benchmarks from the literature and includes AgentLab, a complementary framework that aids in agent creation, testing, and analysis. Our proposed ecosystem offers flexibility for integrating new benchmarks while ensuring consistent evaluation and comprehensive experiment management. As supporting evidence, we conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across 6 popular web agent benchmarks made available in BrowserGym. Among other findings, our results highlight a large discrepancy between OpenAI and Anthropic's latest models, with Claude-3.5-Sonnet leading the way on almost all benchmarks, except on vision-related tasks where GPT-4o is superior. Despite these advancements, our results emphasize that building robust and efficient web agents remains a significant challenge, due to the inherent complexity of real-world web environments and the limitations of current models.

2024-12-31

Trans. Mach. Learn. Res. (published)

doi.org

openreview.net

Learning and fine-tuning a generic value-selection heuristic inside a constraint programming solver

Tom Marty

Léo Boisvert

Tristan François

Pierre Tessier

Louis Gautier

Louis-Martin Rousseau

Quentin Cappart

Constraint programming is known for being an efficient approach to solving combinatorial problems. Important design choices in a solver are the branching heuristics, designed to lead the search to the best solutions in a minimum amount of time. However, developing these heuristics is a time-consuming process that requires problem-specific expertise. This observation has motivated many efforts to use machine learning to automatically learn efficient heuristics without expert intervention. Although several generic variable-selection heuristics are available in the literature, the options for value-selection heuristics are more scarce. We propose to tackle this issue by introducing a generic learning procedure that can be used to obtain a value-selection heuristic inside a constraint programming solver. This has been achieved thanks to the combination of a deep Q-learning algorithm, a tailored reward signal, and a heterogeneous graph neural network. Experiments on graph coloring, maximum independent set, maximum cut, and minimum vertex cover problems show that this framework competes with the well-known impact-based and activity-based search heuristics and can find solutions close to optimality without requiring a large number of backtracks. Additionally, we observe that fine-tuning a model with a different problem class can accelerate the learning process.

2024-11-22

Constraints (published)

doi.org

Fine-Tuning Web Agents: It Works, But It's Trickier Than You Think

Massimo Caccia

Megh Thakkar

Léo Boisvert

Thibault Le Sellier De Chezelles

Alexandre Piché

Nicolas Chapados

Alexandre Drouin

Maxime Gasse

Alexandre Lacoste

Recent advancements in large language models (LLMs) have sparked interest in developing autonomous web agents capable of performing digital tasks through web interfaces in a human-like manner. However, even the strongest closed-source models often struggle to achieve robust results on several benchmarks, while a notable performance gap exists between them and open-source counterparts. This study investigates the potential of fine-tuning to enhance the performance of a smaller, lower-performing but cost-efficient LLM by leveraging successful traces from stronger LLMs, referred to as experts. We outline a comprehensive pipeline for data collection, filtering, and supervised fine-tuning and explore various behavior cloning parameters. Our experiments provide key insights into the challenges of fine-tuning LLMs into web agents on benchmarks like MiniWoB and WorkArena. Notably, we find that the fine-tuned agents' ability to predict expert trajectories does not consistently lead to improved downstream task performance. This raises issues such as off-policy bias and the loss of reasoning abilities during fine-tuning. We discuss potential solutions to these challenges and make both the codebase and a dataset of 140M tokens open-source for the community to build upon.

2024-10-21

NeurIPS.cc/2024/Workshop/OWA (poster)

openreview.net

AgentMerge: Enhancing Generalization in Fine-Tuned LLM Agents

Megh Thakkar

Léo Boisvert

Thibault Le Sellier De Chezelles

Alexandre Piché

Maxime Gasse

Alexandre Lacoste

Massimo Caccia

2024-10-09

NeurIPS.cc/2024/Workshop/AFM (poster)

openreview.net

WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks

Léo Boisvert

Megh Thakkar

Maxime Gasse

Massimo Caccia

Thibault Le Sellier De Chezelles

Quentin Cappart

Nicolas Chapados

Alexandre Lacoste

Alexandre Drouin

The ability of large language models (LLMs) to mimic human-like intelligence has led to a surge in LLM-based autonomous agents. Though recent LLMs seem capable of planning and reasoning given user instructions, their effectiveness in applying these capabilities for autonomous task solving remains underexplored. This is especially true in enterprise settings, where automated agents hold the promise of a high impact. To fill this gap, we propose WorkArena++, a novel benchmark consisting of 682 tasks corresponding to realistic workflows routinely performed by knowledge workers. WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents. Our empirical studies across state-of-the-art LLMs and vision-language models (VLMs), as well as human workers, reveal several challenges for such models to serve as useful assistants in the workplace. In addition to the benchmark, we provide a mechanism to effortlessly generate thousands of ground-truth observation/action traces, which can be used for fine-tuning existing models. Overall, we expect this work to serve as a useful resource to help the community progress toward capable autonomous agents. The benchmark can be found at https://github.com/ServiceNow/WorkArena.

2024-09-25

Datasets and Benchmarks Track @ Neural Information Processing Systems (poster)

doi.org

openreview.net

WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?

Alexandre Drouin

Maxime Gasse

Massimo Caccia

Issam H. Laradji

Alexandre Lacoste

We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuring the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 33 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.

2024-07-22

International Conference on Machine Learning (Accept (Poster))

doi.org

proceedings.mlr.press

Towards a Generic Representation of Combinatorial Problems for Learning-Based Approaches

Léo Boisvert

Hélène Verhaeghe

Quentin Cappart

2024-05-24

Integration of Constraint Programming, Artificial Intelligence, and Operations Research (published)

doi.org

arxiv.org