Publications

Intersecting perspectives: A participatory street review framework for urban inclusivity

Rashid A. Mushkani

Shin (Alexandre) Koseki

2025-10-01

Habitat International (published)

doi.org

The Three Regimes of Offline-to-Online Reinforcement Learning

Li Li

Tianwei Ni

Yihao Sun

Pierre-Luc Bacon

Offline-to-online reinforcement learning (RL) has emerged as a practical paradigm that leverages offline datasets for pretraining and online interactions for fine-tuning. However, its empirical behavior is highly inconsistent: design choices for online fine-tuning that work well in one setting can fail completely in another. We propose a stability-plasticity principle that can explain this inconsistency: during online fine-tuning, we should preserve the knowledge of whichever is better, the pretrained policy or the offline dataset, while maintaining sufficient plasticity. This perspective identifies three regimes of online fine-tuning, each requiring distinct stability properties. We validate this framework through a large-scale empirical study and find that the results strongly align with its predictions in 45 of 63 cases. This work provides a principled framework for guiding design choices in offline-to-online RL based on the relative performance of the offline dataset and the pretrained policy.

2025-10-01

arXiv (preprint)

arxiv.org

They Hear Me Rolling: Design and Characterization of a Distributed, Rolling Acoustic-Tactile Sensor

Wilfred Mason

David Brenken

Olivier St-Martin Cormier

Audrey Sedal

Tactile sensor design has been widely explored at the centimeter scale; fewer explorations exist for larger-scale systems with varied geometries. We present a meter-scale tactile sensor for wheeled robotic platforms based on a flexible acoustic waveguide. This sensor architecture performs contact sensing over the surface of a rotating wheel with a single transducer that is separated from the sensing surface. We present the design and characterization of the sensor, along with a demonstration of a state-estimation framework that uses tactile sensor feedback to measure surface features.

2025-10-01

IEEE Sensors Letters (published)

doi.org

VLOD-TTA: Test-Time Adaptation of Vision-Language Object Detectors

Atif Belal

Heitor Rapela Medeiros

Marco Pedersoli

Eric Granger

Vision-language object detectors (VLODs) such as YOLO-World and Grounding DINO achieve impressive zero-shot recognition by aligning region proposals with text representations. However, their performance often degrades under domain shift. We introduce VLOD-TTA, a test-time adaptation (TTA) framework for VLODs that leverages dense proposal overlap and image-conditioned prompt scores. First, we propose an IoU-weighted entropy objective that concentrates adaptation on spatially coherent proposal clusters and reduces confirmation bias from isolated boxes. Second, we introduce image-conditioned prompt selection, which ranks prompts by image-level compatibility and fuses the most informative prompts with the detector logits. Our benchmarks across diverse distribution shifts (stylized domains, driving scenes, low-light conditions, and common corruptions) show the effectiveness of our method on two state-of-the-art VLODs, YOLO-World and Grounding DINO, with consistent improvements over the zero-shot and TTA baselines. Code: https://github.com/imatif17/VLOD-TTA

2025-10-01

arXiv (preprint)

arxiv.org
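
A minimal sketch of the IoU-weighted entropy idea from the VLOD-TTA abstract, for illustration only. It assumes each proposal comes with a class-probability vector and weights each proposal's entropy by its summed IoU with the other proposals; this weighting is a hypothetical choice, since the abstract does not give the exact formula. The names `iou`, `entropy`, and `iou_weighted_entropy` are illustrative, and plain floats stand in for detector logits that a real TTA loop would differentiate.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one proposal's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def iou_weighted_entropy(boxes: List[Box], probs: List[Sequence[float]]) -> float:
    """Weighted mean of proposal entropies (hypothetical weighting).

    w_i = sum over j != i of IoU(b_i, b_j). A box that overlaps nothing gets
    weight 0, so isolated proposals do not drive the adaptation signal.
    """
    n = len(boxes)
    weights = [
        sum(iou(boxes[i], boxes[j]) for j in range(n) if j != i) for i in range(n)
    ]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * entropy(p) for w, p in zip(weights, probs)) / total


# Two overlapping boxes and one isolated box; the isolated one gets weight 0.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
probs = [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5]]
loss = iou_weighted_entropy(boxes, probs)
```

Here the third box has the highest entropy but contributes nothing to the objective, which mirrors the stated goal of reducing confirmation bias from isolated boxes.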

DeepCodeProbe: Evaluating Code Representation Quality in Models Trained on Code

Vahid Majdinasab

Amin Nikanjam

Foutse Khomh

2025-09-30

Empirical Software Engineering (published)

doi.org

DRBench: A Realistic Benchmark for Enterprise Deep Research

Amirhossein Abaskohi

Tianyi Chen

Miguel Muñoz-Mármol

Curtis Fox

Amrutha Varshini Ramesh

Étienne Marcotte

Issam Hadj Laradji

We introduce DRBench, a benchmark for evaluating AI agents on complex, open-ended deep research tasks in enterprise settings. Unlike prior benchmarks that focus on simple questions or web-only queries, DRBench evaluates agents on multi-step queries (for example, "What changes should we make to our product roadmap to ensure compliance with this standard?") that require identifying supporting facts from both the public web and a private company knowledge base. Each task is grounded in realistic user personas and enterprise context, spanning a heterogeneous search space that includes productivity software, cloud file systems, emails, chat conversations, and the open web. Tasks are generated through a carefully designed synthesis pipeline with human-in-the-loop verification, and agents are evaluated on their ability to recall relevant insights, maintain factual accuracy, and produce coherent, well-structured reports. We release 15 deep research tasks across 10 domains, such as Sales, Cybersecurity, and Compliance. We demonstrate the effectiveness of DRBench by evaluating diverse deep research (DR) agents across open- and closed-source models (such as GPT, Llama, and Qwen) and DR strategies, highlighting their strengths, weaknesses, and the critical path for advancing enterprise deep research. Code is available at https://github.com/ServiceNow/drbench.

2025-09-30

arXiv (preprint)

arxiv.org

GRPO-λ: Credit Assignment improves LLM Reasoning

Prasanna Parthasarathi

Mathieu Reymond

Boxing Chen

Yufei Cui

Sarath Chandar

Large language models (LLMs) are increasingly deployed for tasks requiring complex reasoning, prompting significant interest in improving their reasoning abilities through post-training. In particular, RL-based methods with verifiable rewards, such as the state-of-the-art GRPO, have been shown to greatly improve reasoning behavior when applied as post-training methods. However, the lack of an explicit reward or critic model limits GRPO's ability to assign fine-grained credit across token sequences. In this work, we present GRPO-

2025-09-30

arXiv (preprint)

arxiv.org
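
The GRPO-λ abstract above is truncated, so its method is not reproduced here. As context for the credit-assignment problem it describes, the sketch below shows the group-relative advantage of standard GRPO with outcome rewards: each completion's reward is normalized against the other completions sampled for the same prompt, and that single scalar is copied to every token of the completion. Using the population standard deviation and the `eps` value are assumptions (implementations differ on both), and `grpo_advantages` is an illustrative name.

```python
import statistics
from typing import List


def grpo_advantages(
    rewards: List[float], lengths: List[int], eps: float = 1e-8
) -> List[List[float]]:
    """Standard GRPO group-relative advantages for one prompt (outcome rewards).

    rewards[i] is the scalar reward of the i-th sampled completion and
    lengths[i] is its token count. Returns one advantage per token.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population standard deviation
    advantages = []
    for r, n in zip(rewards, lengths):
        a = (r - mean) / (std + eps)
        advantages.append([a] * n)  # identical credit for every token in the completion
    return advantages


# Four completions for one prompt; a verifier marked the first and last as correct.
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0], [3, 2, 4, 1])
```

Every token of a completion receives the same advantage regardless of which step made the answer right or wrong; this is the coarse, sequence-level credit that the abstract identifies as GRPO's limitation.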

Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

Johan Samir Obando Ceron

Yoshua Bengio

Brian R. Bartoldson

Bhavya Kailkhura

Guillaume Lajoie

Glen Berseth

Nikolay Malkin

Moksh J. Jain

Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains, not just the final answers, and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families, and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code is available at https://github.com/HyperPotatoNeo/RSA.

2025-09-30

arXiv (preprint)

arxiv.org
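
A minimal sketch of the RSA loop as the abstract describes it: a population of N candidates is refined for a fixed number of steps, and each new candidate is produced by aggregating a random subset of K current candidates. In RSA the aggregation is performed by the LLM itself over full reasoning chains; here `aggregate` is a hypothetical callable, and the toy `majority_vote` over final-answer strings is only a stand-in so the example runs. The function and parameter names (`recursive_self_aggregation`, `subset_size`, `steps`, `seed`) are assumptions.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def recursive_self_aggregation(
    population: List[str],
    aggregate: Callable[[Sequence[str]], str],
    subset_size: int,
    steps: int,
    seed: int = 0,
) -> List[str]:
    """Refine a population by repeatedly aggregating random subsets.

    Each step builds a new population of the same size N; every member is
    aggregate(random subset of `subset_size` current members). The new
    population becomes the candidate pool for the next step.
    """
    rng = random.Random(seed)
    n = len(population)
    for _ in range(steps):
        population = [aggregate(rng.sample(population, subset_size)) for _ in range(n)]
    return population


def majority_vote(subset: Sequence[str]) -> str:
    """Toy stand-in aggregator: return the most frequent candidate in the subset."""
    return Counter(subset).most_common(1)[0][0]


# Toy usage: eight final answers, five of them "42".
candidates = ["42"] * 5 + ["41"] * 3
refined = recursive_self_aggregation(candidates, majority_vote, subset_size=3, steps=4)
```

Replacing `majority_vote` with a function that prompts a model with the K chains and returns its aggregated chain gives the intended setup; the abstract does not state how a final answer is extracted from the last population.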

Asymmetric developmental bifurcations in polarized environments: a new class of human variants, which may include autism.

Laurent Mottron

Alix Lavigne-Champagne

Boris C. Bernhardt

Guillaume Dumas

Sébastien Jacquemont

D. Gagnon

2025-09-29

Molecular Psychiatry (published)

doi.org
