Publications

Discovering Temporal Structure: An Overview of Hierarchical Reinforcement Learning

Martin Klissarov

Akhil Bagaria

Ziyan Luo

George Konidaris

Marlos C. Machado

Developing agents capable of exploring, planning and learning in complex open-ended environments is a grand challenge in artificial intelligence (AI). Hierarchical reinforcement learning (HRL) offers a promising solution to this challenge by discovering and exploiting the temporal structure within a stream of experience. The strong appeal of the HRL framework has led to a rich and diverse body of literature attempting to discover a useful structure. However, it is still not clear how one might define what constitutes good structure in the first place, or the kind of problems in which identifying it may be helpful. This work aims to identify the benefits of HRL from the perspective of the fundamental challenges in decision-making, as well as highlight its impact on the performance trade-offs of AI agents. Through these benefits, we then cover the families of methods that discover temporal structure in HRL, ranging from learning directly from online experience to offline datasets, to leveraging large language models (LLMs). Finally, we highlight the challenges of temporal structure discovery and the domains that are particularly well-suited for such endeavours.

2025-06-16

ArXiv (preprint)

doi.org

arxiv.org

Meta-learning how to Share Credit among Macro-Actions

Ionel-Alexandru Hosu

Traian Rebedea

Razvan Pascanu

2025-06-16

ArXiv (preprint)

doi.org

arxiv.org

Can GPT4 Generate Effective Feedback on Code Readability?

Xiaotian Su

Yajie Song

Marcus Messer

Jaromir Savelka

Maria Cutumisu

April Wang

2025-06-13

Annual Conference on Innovation and Technology in Computer Science Education (published)

doi.org

Detecting High-Stakes Interactions with Activation Probes

Alex McKenzie

Urja Pawar

Phil Blandfort

William Bankes

David Scott Krueger

Ekdeep Singh Lubana

Dmitrii Krasheninnikov

Monitoring is an important aspect of safely deploying Large Language Models (LLMs). This paper examines activation probes for detecting "high-stakes" interactions -- where the text indicates that the interaction might lead to significant harm -- as a critical, yet underexplored, target for such monitoring. We evaluate several probe architectures trained on synthetic data, and find them to exhibit robust generalization to diverse, out-of-distribution, real-world data. Probes' performance is comparable to that of prompted or finetuned medium-sized LLM monitors, while offering computational savings of six orders-of-magnitude. Our experiments also highlight the potential of building resource-aware hierarchical monitoring systems, where probes serve as an efficient initial filter and flag cases for more expensive downstream analysis. We release our novel synthetic dataset and codebase to encourage further study.

2025-06-12

ArXiv (preprint)

doi.org

arxiv.org

Discrete Audio Tokens: More Than a Survey!

Pooneh Mousavi

Gallil Maimon

Adel Moumen

Darius Petermann

Jiatong Shi

Haibin Wu

Haici Yang

Anastasia Kuznetsova

Artem Ploujnikov

Ricard Marxer

Bhuvana Ramabhadran

Benjamin Elizalde

Loren Lugosch

Jinyu Li

Cem Subakan

Phil Woodland

Minje Kim

Hung-yi Lee

Shinji Watanabe

Yossi Adi … (1 more author not shown)

Mirco Ravanelli

Discrete audio tokens are compact representations that aim to preserve perceptual quality, phonetic content, and speaker characteristics while enabling efficient storage and inference, as well as competitive performance across diverse downstream tasks. They provide a practical alternative to continuous features, enabling the integration of speech and audio into modern large language models (LLMs). As interest in token-based audio processing grows, various tokenization methods have emerged, and several surveys have reviewed the latest progress in the field. However, existing studies often focus on specific domains or tasks and lack a unified comparison across various benchmarks. This paper presents a systematic review and benchmark of discrete audio tokenizers, covering three domains: speech, music, and general audio. We propose a taxonomy of tokenization approaches based on encoder-decoder, quantization techniques, training paradigm, streamability, and application domains. We evaluate tokenizers on multiple benchmarks for reconstruction, downstream performance, and acoustic language modeling, and analyze trade-offs through controlled ablation studies. Our findings highlight key limitations, practical considerations, and open challenges, providing insight and guidance for future research in this rapidly evolving area. For more information, including our main results and tokenizer database, please refer to our website: https://poonehmousavi.github.io/dates-website/.

2025-06-12

ArXiv (preprint)

arxiv.org

Exploration by Exploitation: Curriculum Learning for Reinforcement Learning Agents through Competence-Based Curriculum Policy Search

Tabitha Edith Lee

Nan Rosemary Ke

Sarvesh Patil

Annya Dahmani

Eunice Yiu

Esra'a Saleh

Alison Gopnik

Oliver Kroemer

Glen Berseth

2025-06-12

ICML.cc/2025/Workshop/EXAIT (poster)

openreview.net

Poutine: Vision-Language-Trajectory Pre-Training and Reinforcement Learning Post-Training Enable Robust End-to-End Autonomous Driving

Luke Rowe

Rodrigue de Schaetzen

Roger Girgis

Chris Pal

Liam Paull

We present Poutine, a 3B-parameter vision-language model (VLM) tailored for end-to-end autonomous driving in long-tail driving scenarios. Poutine is trained in two stages. To obtain strong base driving capabilities, we train Poutine-Base in a self-supervised vision-language-trajectory (VLT) next-token prediction fashion on 83 hours of CoVLA nominal driving and 11 hours of Waymo long-tail driving. Accompanying language annotations are auto-generated with a 72B-parameter VLM. Poutine is obtained by fine-tuning Poutine-Base with Group Relative Policy Optimization (GRPO) using less than 500 preference-labeled frames from the Waymo validation set. We show that both VLT pretraining and RL fine-tuning are critical to attain strong driving performance in the long-tail. Poutine-Base achieves a rater-feedback score (RFS) of 8.12 on the validation set, nearly matching Waymo's expert ground-truth RFS. The final Poutine model achieves an RFS of 7.99 on the official Waymo test set, placing 1st in the 2025 Waymo Vision-Based End-to-End Driving Challenge by a significant margin. These results highlight the promise of scalable VLT pre-training and lightweight RL fine-tuning to enable robust and generalizable autonomy.

2025-06-12

ArXiv (preprint)

doi.org

arxiv.org

PyLO: Towards Accessible Learned Optimizers in PyTorch

Paul Janson

Benjamin Thérien

Quentin Anthony

Xiaolong Huang

Abhinav Moudgil

Eugene Belilovsky

Learned optimizers have been an active research topic over the past decade, with increasing progress toward practical, general-purpose optimizers that can serve as drop-in replacements for widely used methods like Adam. However, recent advances -- such as VeLO, which was meta-trained for 4000 TPU-months -- remain largely inaccessible to the broader community, in part due to their reliance on JAX and the absence of user-friendly packages for applying the optimizers after meta-training. To address this gap, we introduce PyLO, a PyTorch-based library that brings learned optimizers to the broader machine learning community through familiar, widely adopted workflows. Unlike prior work focused on synthetic or convex tasks, our emphasis is on applying learned optimization to real-world large-scale pre-training tasks. Our release includes a CUDA-accelerated version of the small_fc_lopt learned optimizer architecture from (Metz et al., 2022a), delivering substantial speedups -- from 39.36 to 205.59 samples/sec throughput for training ViT B/16 with batch size 32. PyLO also allows us to easily combine learned optimizers with existing optimization tools such as learning rate schedules and weight decay. When doing so, we find that learned optimizers can substantially benefit. Our code is available at https://github.com/Belilovsky-Lab/pylo

2025-06-12

ArXiv (preprint)

arxiv.org

On Selecting Robust Approaches for Learning Predictive Biomarkers in Metabolomics Data Sets.

Thibaud Godon

Pier-Luc Plante

Jacques Corbeil

Pascal Germain

Alexandre Drouin

Metabolomics, the study of small molecules within biological systems, offers insights into metabolic processes and, consequently, holds great promise for advancing health outcomes. Biomarker discovery in metabolomics represents a significant challenge, notably due to the high dimensionality of the data. Recent work has addressed this problem by analyzing the most important variables in machine learning models. Unfortunately, this approach relies on prior hypotheses about the structure of the data and may overlook simple patterns. To assess the true usefulness of machine learning methods, we evaluate them on a collection of 835 metabolomics data sets. This effort provides valuable insights for metabolomics researchers regarding where and when to use machine learning. It also establishes a benchmark for the evaluation of future methods. Nonetheless, the results emphasize the high diversity of data sets in metabolomics and the complexity of finding biologically relevant biomarkers. As a result, we propose a novel approach applicable across all data sets, offering guidance for future analyses. This method involves directly comparing univariate and multivariate models. We demonstrate through selected examples how this approach can guide data analysis across diverse data set structures, representative of the observed variability. Code and data are available for research purposes.

2025-06-12

Analytical Chemistry (published)

doi.org
