Publications

Predicting space use patterns of a territorial top predator: from individual movement decisions to Arctic fox space use

Frédéric Dulude-de Broin

Dominique Berteaux

Joël Bêty

Catherine Villeneuve

Alexis Grenier-Potvin

Andréanne Beardsell

Jeanne Clermont

Audrey Durand

Pierre Legagneux

2025-10-01

bioRxiv (preprint)

doi.org

The Three Regimes of Offline-to-Online Reinforcement Learning

Li Li

Tianwei Ni

Yihao Sun

Pierre-Luc Bacon

Offline-to-online reinforcement learning (RL) has emerged as a practical paradigm that leverages offline datasets for pretraining and online interactions for fine-tuning. However, its empirical behavior is highly inconsistent: design choices for online fine-tuning that work well in one setting can fail completely in another. We propose a stability-plasticity principle that can explain this inconsistency: we should preserve the knowledge of the pretrained policy or the offline dataset during online fine-tuning, whichever is better, while maintaining sufficient plasticity. This perspective identifies three regimes of online fine-tuning, each requiring distinct stability properties. We validate this framework through a large-scale empirical study, finding that the results strongly align with its predictions in 45 of 63 cases. This work provides a principled framework for guiding design choices in offline-to-online RL based on the relative performance of the offline dataset and the pretrained policy.

2025-10-01

ArXiv (preprint)

arxiv.org

They Hear Me Rolling: Design and Characterization of a Distributed, Rolling Acoustic-Tactile Sensor

Wilfred Mason

David Brenken

Olivier St-Martin Cormier

Audrey Sedal

Tactile sensor design has been widely explored at the centimeter scale; fewer explorations exist in larger-scale systems with varied geometries. We present a meter-scale tactile sensor for wheeled robotic platforms based on a flexible acoustic waveguide. This sensor architecture performs contact sensing over the surface of a rotating wheel with a single transducer that is separated from the sensing surface. The design and characterization of the sensor are presented, along with a demonstration of a state-estimation framework using tactile sensor feedback to measure surface features.

2025-10-01

IEEE Sensors Letters (published)

doi.org

VLOD-TTA: Test-Time Adaptation of Vision-Language Object Detectors

Atif Belal

Heitor Rapela Medeiros

Marco Pedersoli

Eric Granger

Vision-language object detectors (VLODs) such as YOLO-World and Grounding DINO achieve impressive zero-shot recognition by aligning region proposals with text representations. However, their performance often degrades under domain shift. We introduce VLOD-TTA, a test-time adaptation (TTA) framework for VLODs that leverages dense proposal overlap and image-conditioned prompt scores. First, an IoU-weighted entropy objective is proposed that concentrates adaptation on spatially coherent proposal clusters and reduces confirmation bias from isolated boxes. Second, image-conditioned prompt selection is introduced, which ranks prompts by image-level compatibility and fuses the most informative prompts with the detector logits. Our benchmarking across diverse distribution shifts (including stylized domains, driving scenes, low-light conditions, and common corruptions) shows the effectiveness of our method on two state-of-the-art VLODs, YOLO-World and Grounding DINO, with consistent improvements over the zero-shot and TTA baselines. Code: https://github.com/imatif17/VLOD-TTA

2025-10-01

ArXiv (preprint)

arxiv.org

DeepCodeProbe: Evaluating Code Representation Quality in Models Trained on Code

Vahid Majdinasab

Amin Nikanjam

Foutse Khomh

2025-09-30

Empirical Software Engineering (published)

doi.org

DRBench: A Realistic Benchmark for Enterprise Deep Research

Amirhossein Abaskohi

Tianyi Chen

Miguel Muñoz-Mármol

Curtis Fox

Amrutha Varshini Ramesh

Étienne Marcotte

Issam Hadj Laradji

We introduce DRBench, a benchmark for evaluating AI agents on complex, open-ended deep research tasks in enterprise settings. Unlike prior benchmarks that focus on simple questions or web-only queries, DRBench evaluates agents on multi-step queries (for example, "What changes should we make to our product roadmap to ensure compliance with this standard?") that require identifying supporting facts from both the public web and a private company knowledge base. Each task is grounded in realistic user personas and enterprise context, spanning a heterogeneous search space that includes productivity software, cloud file systems, emails, chat conversations, and the open web. Tasks are generated through a carefully designed synthesis pipeline with human-in-the-loop verification, and agents are evaluated on their ability to recall relevant insights, maintain factual accuracy, and produce coherent, well-structured reports. We release 15 deep research tasks across 10 domains, such as Sales, Cybersecurity, and Compliance. We demonstrate the effectiveness of DRBench by evaluating diverse DR agents across open- and closed-source models (such as GPT, Llama, and Qwen) and DR strategies, highlighting their strengths, weaknesses, and the critical path for advancing enterprise deep research. Code is available at https://github.com/ServiceNow/drbench.

2025-09-30

ArXiv (preprint)

arxiv.org
