Publications

Two-point deterministic equivalence for SGD in random feature models

Alexander Atanasov

Blake Bordelon

Jacob A Zavatone-Veth

Courtney Paquette

Cengiz Pehlevan

2025-06-09

ICML.cc/2025/Workshop/HiLD (poster)

openreview.net

Ultrasound and MRI-based evaluation of relationships between morphological and mechanical properties of the lower lumbar multifidus muscle in chronic low back pain.

Neda Naghdi

Sara Masi

Cléo Bertrand

Brent Rosenstein

Julien Cohen-Adad

Hassan Rivaz

Mathieu Roy

Maryse Fortin

2025-06-09

European Spine Journal (publié)

doi.org

How to Train Your LLM Web Agent: A Statistical Diagnosis

Dheeraj Vattikonda

Santhoshi Ravichandran

Emiliano Penaloza

Hadi Nekoei

Megh Thakkar

Thibault Le Sellier De Chezelles

Nicolas Gontier

Miguel Muñoz-Mármol

Stefania Raimondo

Alexandre Lacoste

Massimo Caccia

Large language model (LLM) agents for web interfaces have advanced rapidly, yet open-source systems still lag behind proprietary agents. Bri… (voir plus)dging this gap is key to enabling customizable, efficient, and privacy-preserving agents. Two challenges hinder progress: the reproducibility issues in RL and LLM agent training, where results often depend on sensitive factors like seeds and decoding parameters, and the focus of prior work on single-step tasks, overlooking the complexities of web-based, multi-step decision-making. We address these gaps by providing a statistically driven study of training LLM agents for web tasks. Our two-stage pipeline combines imitation learning from a Llama 3.3 70B teacher with on-policy fine-tuning via Group Relative Policy Optimization (GRPO) on a Llama 3.1 8B student. Through 240 configuration sweeps and rigorous bootstrapping, we chart the first compute allocation curve for open-source LLM web agents. Our findings show that dedicating one-third of compute to teacher traces and the rest to RL improves MiniWoB++ success by 6 points and closes 60% of the gap to GPT-4o on WorkArena, while cutting GPU costs by 45%. We introduce a principled hyperparameter sensitivity analysis, offering actionable guidelines for robust and cost-effective agent training.

2025-06-08

ICML.cc/2025/Workshop/WCUA (présentation orale)

openreview.net

How to Train Your LLM Web Agent: A Statistical Diagnosis

Dheeraj Vattikonda

Santhoshi Ravichandran

Emiliano Penaloza

Hadi Nekoei

Megh Thakkar

Thibault Le Sellier De Chezelles

Nicolas Gontier

Miguel Muñoz-Mármol

Stefania Raimondo

Alexandre Lacoste

Massimo Caccia

LLM-based web agents have recently made significant progress, but much of it has occurred in closed-source systems, widening the gap with op… (voir plus)en-source alternatives. Progress has been held back by two key challenges: first, a narrow focus on single-step tasks that overlooks the complexity of multi-step web interactions; and second, the high compute costs required to post-train LLM-based web agents. To address this, we present the first statistically grounded study on compute allocation for LLM web-agent post-training. Our approach uses a two-stage pipeline, training a Llama 3.1 8B student to imitate a Llama 3.3 70B teacher via supervised fine-tuning (SFT), followed by on-policy reinforcement learning. We find this process highly sensitive to hyperparameter choices, making exhaustive sweeps impractical. To spare others from expensive trial-and-error, we sample 1,370 configurations and use bootstrapping to estimate effective hyperparameters. Our results show that combining SFT with on-policy RL consistently outperforms either approach alone on both WorkArena and MiniWob++. Further, this strategy requires only 55% of the compute to match the peak performance of pure SFT on MiniWob++, effectively pushing the compute-performance Pareto frontier, and is the only strategy that can close the gap with closed-source models.

2025-06-08

ICML.cc/2025/Workshop/WCUA (présentation orale)

doi.org

openreview.net

Multi-Priority Scheduling for Traffic Management in Future Scalable Payloads.

Zineb Garroussi

Olfa Ben Yahia

Brunilde Sansò

Jean-François Frigon

Stéphane Martel

Guillaume Mantelet

Antoine Lesage-Landry

Gunes Karabulut Kurt

2025-06-08

2025 IEEE International Conference on Communications Workshops (ICC Workshops) (publié)

doi.org

Multi-Priority Scheduling for Traffic Management in Future Scalable Payloads

Zineb Garroussi

Olfa Ben Yahia

Brunilde Sansò

Jean-François Frigon

Stéphane Martel

Guillaume Mantelet

Antoine Lesage-Landry

Gunes Karabulut Kurt

Through multibeam, frequency reuse, and advanced antenna technology, regenerative non-geostationary orbit (NGSO) extremely high-throughput s… (voir plus)atellites (EHTS) are expected to play a key role in future communications, delivering data rates up to terabits per second. This paper investigates a novel architecture for future regenerative and scalable payloads to satisfy users’ demands for varying quality of service (QoS). This architecture is designed based on multiple modem banks and requires a new flow assignment strategy to efficiently route traffic within the satellite. We propose a multi-commodity path flow optimization problem to manage the load with varying QoS requirements across multiple modems within an NGSO high-throughput satellite (HTS) system and beyond. The simulation results demonstrate that the proposed model consistently maintains low delays and packet losses for the highest-priority traffic and outperforms the classical first-in, first-out (FIFO) approach.

2025-06-08

2025 IEEE International Conference on Communications Workshops (ICC Workshops) (publié)

doi.org

Multi-Priority Scheduling for Traffic Management in Future Scalable Payloads

Zineb Garroussi

Olfa Ben Yahia

Brunilde Sansò

Jean-François Frigon

Stéphane Martel

Guillaume Mantelet

Antoine Lesage-Landry

Gunes Karabulut Kurt

Through multibeam, frequency reuse, and advanced antenna technology, regenerative non-geostationary orbit (NGSO) extremely high-throughput s… (voir plus)atellites (EHTS) are expected to play a key role in future communications, delivering data rates up to terabits per second. This paper investigates a novel architecture for future regenerative and scalable payloads to satisfy users’ demands for varying quality of service (QoS). This architecture is designed based on multiple modem banks and requires a new flow assignment strategy to efficiently route traffic within the satellite. We propose a multi-commodity path flow optimization problem to manage the load with varying QoS requirements across multiple modems within an NGSO high-throughput satellite (HTS) system and beyond. The simulation results demonstrate that the proposed model consistently maintains low delays and packet losses for the highest-priority traffic and outperforms the classical first-in, first-out (FIFO) approach.

2025-06-08

2025 IEEE International Conference on Communications Workshops (ICC Workshops) (publié)

doi.org

Silent Sabotage: Injecting Backdoors into AI Agents Through Fine-Tuning

Léo Boisvert

Abhay Puri

Chandra Kiran Reddy Evuru

Joshua Kazdan

Avinandan Bose

Quentin Cappart

Maryam Fazel

Sai Rajeswar

Jason Stanley

Nicolas Chapados

Alexandre Drouin

Krishnamurthy Dj Dvijotham

The rise of AI agents that can use tools, browse the web and interact with computers on behalf of a user, has sparked strong interest in imp… (voir plus)roving these capabilities by explicitly fine-tuning the LLMs/VLMs that power these agents. Several researchers have proposed collecting data by letting the agents interact with their environment (e.g., a computer operating system, the web or a collection of APIs exposed as tools), and improve agent performance by fine tuning on this data. In this work, we show that such data collection can be manipulated by adversaries to insert poisoned traces. By modifying just 5% of collected traces, adversaries can embed stealthy bad behaviors into agents—like leaking confidential user information whenever the tool or webpage exposes a trigger. Our results raise important security concerns in the development of AI agents, and underscore the importance of careful scrutiny of all data collection processes used to improve agentic AI.

2025-06-08

ICML.cc/2025/Workshop/WCUA (poster)

openreview.net

State Entropy Regularization for Robust Reinforcement Learning

Uri Koren

Yonatan Ashlag

Mirco Mutti

Esther Derman

Pierre-Luc Bacon

Shie Mannor

State entropy regularization has empirically shown better exploration and sample complexity in reinforcement learning (RL). However, its the… (voir plus)oretical guarantees have not been studied. In this paper, we show that state entropy regularization improves robustness to structured and spatially correlated perturbations. These types of variation are common in transfer learning but often overlooked by standard robust RL methods, which typically focus on small, uncorrelated changes. We provide a comprehensive characterization of these robustness properties, including formal guarantees under reward and transition uncertainty, as well as settings where the method performs poorly. Much of our analysis contrasts state entropy with the widely used policy entropy regularization, highlighting their different benefits. Finally, from a practical standpoint, we illustrate that compared with policy entropy, the robustness advantages of state entropy are more sensitive to the number of rollouts used for policy evaluation.

2025-06-08

ArXiv (prépublication)

arxiv.org

Boosting LLM Reasoning via Spontaneous Self-Correction

Xutong Zhao

Tengyu Xu

Xuewei Wang

Zhengxing Chen

Di Jin

Liang Tan

Zishun Yu

Zhuokai Zhao

Yun He

Si-Yuan Wang

Han Fang

Sarath Chandar

Chen Zhu

MetaAI

Mila - Québec

AI Institute

Polytechnique Montréal

While large language models (LLMs) have demonstrated remarkable success on a broad range of tasks, math reasoning remains a challenging one.… (voir plus) One of the approaches for improving math reasoning is self-correction, which designs self-improving loops to let the model correct its own mistakes. However, existing self-correction approaches treat corrections as standalone post-generation refinements, relying on extra prompt and system designs to elicit self-corrections, instead of performing real-time, spontaneous self-corrections in a single pass. To address this, we propose SPOC, a spontaneous self-correction approach that enables LLMs to generate interleaved solutions and verifications in a single inference pass, with generation dynamically terminated based on verification outcomes, thereby effectively scaling inference time compute. SPOC considers a multi-agent perspective by assigning dual roles -- solution proposer and verifier -- to the same model. We adopt a simple yet effective approach to generate synthetic data for fine-tuning, enabling the model to develop capabilities for self-verification and multi-agent collaboration. We further improve its solution proposal and verification accuracy through online reinforcement learning. Experiments on mathematical reasoning benchmarks show that SPOC significantly improves performance. Notably, SPOC boosts the accuracy of Llama-3.1-8B and 70B Instruct models, achieving gains of 8.8% and 11.6% on MATH500, 10.0% and 20.0% on AMC23, and 3.3% and 6.7% on AIME24, respectively.

2025-06-07

ArXiv (prépublication)

doi.org

arxiv.org

Boosting LLM Reasoning via Spontaneous Self-Correction

Xutong Zhao

Tengyu Xu

Xuewei Wang

Zhengxing Chen

Di Jin

Liang Tan

Zishun Yu

Zhuokai Zhao

Yun He

Sinong Wang

Han Fang

Sarath Chandar

Chen Zhu

MetaAI

Mila - Québec

AI Institute

Polytechnique Montréal

While large language models (LLMs) have demonstrated remarkable success on a broad range of tasks, math reasoning remains a challenging one.… (voir plus) One of the approaches for improving math reasoning is self-correction, which designs self-improving loops to let the model correct its own mistakes. However, existing self-correction approaches treat corrections as standalone post-generation refinements, relying on extra prompt and system designs to elicit self-corrections, instead of performing real-time, spontaneous self-corrections in a single pass. To address this, we propose SPOC, a spontaneous self-correction approach that enables LLMs to generate interleaved solutions and verifications in a single inference pass, with generation dynamically terminated based on verification outcomes, thereby effectively scaling inference time compute. SPOC considers a multi-agent perspective by assigning dual roles -- solution proposer and verifier -- to the same model. We adopt a simple yet effective approach to generate synthetic data for fine-tuning, enabling the model to develop capabilities for self-verification and multi-agent collaboration. We further improve its solution proposal and verification accuracy through online reinforcement learning. Experiments on mathematical reasoning benchmarks show that SPOC significantly improves performance. Notably, SPOC boosts the accuracy of Llama-3.1-8B and 70B Instruct models, achieving gains of 8.8% and 11.6% on MATH500, 10.0% and 20.0% on AMC23, and 3.3% and 6.7% on AIME24, respectively.

2025-06-07

ArXiv (prépublication)

arxiv.org

A Self-Supervised Foundation Model for Robust and Generalizable Representation Learning in STED Microscopy

Anthony Bilodeau

Frédéric Beaupré

Julia Chabbert

Jean-Michel Bellavance

Koraly Lessard

Andréanne Deschênes

Renaud Bernatchez

Paul De Koninck

Christian Gagné

Flavie Lavoie-Cardinal

2025-06-06

bioRxiv (prépublication)

doi.org

Programme d’apprentissage IA sur mesure

Mil'Haq Fest 2025

Communauté de pratique de Mila

Demandes de supervision

Publications

Programme d’apprentissage IA sur mesure

Mil'Haq Fest 2025

Communauté de pratique de Mila

Demandes de supervision

Mots-clés populaires:

Publications