Publications

Test Time Adaptation Using Adaptive Quantile Recalibration

2025-06-10

ICML.cc/2025/Workshop/PUT (poster)

openreview.net

What Matters when Modeling Human Behavior using Imitation Learning?

As AI systems become increasingly embedded in human decision-making processes, aligning their behavior with human values is critical to ensuring safe and trustworthy deployment. Imitation Learning (IL), a central approach to AI alignment, trains a learner to directly mimic desirable human behaviors from expert demonstrations. However, standard IL methods assume that (1) experts act to optimize expected returns and (2) expert policies are Markovian. Both assumptions are inconsistent with empirical findings from behavioral economics, according to which humans (1) are risk-sensitive and (2) make decisions based on past experience. In this work, we examine the implications of risk sensitivity for IL and show that standard approaches do not capture all optimal policies under risk-sensitive decision criteria. By characterizing these expert policies, we identify key limitations of existing IL algorithms in replicating expert performance in risk-sensitive settings. Our findings underscore the need for new IL frameworks that account for both risk-aware preferences and temporal dependencies to faithfully align AI behavior with human experts.

2025-06-10

ICML.cc/2025/Workshop/MoFA (poster)

openreview.net

Adversarial Attack Classification and Robustness Testing for Large Language Models for Code

Yang Liu

Armstrong Foundjem

Foutse Khomh

Heng Li

Large Language Models (LLMs) have become vital tools in software development tasks such as code generation, completion, and analysis. As their integration into workflows deepens, ensuring robustness against vulnerabilities, especially those triggered by diverse or adversarial inputs, becomes increasingly important. Such vulnerabilities may lead to incorrect or insecure code generation when models encounter perturbed task descriptions, code, or comments. Prior research often overlooks the role of natural language in guiding code tasks. This study investigates how adversarial perturbations in natural language inputs, including prompts, comments, and descriptions, affect LLMs for Code (LLM4Code). It examines the effects of perturbations at the character, word, and sentence levels to identify the most impactful vulnerabilities. We analyzed multiple projects (e.g., ReCode, OpenAttack) and datasets (e.g., HumanEval, MBPP), establishing a taxonomy of adversarial attacks. The first dimension classifies the input type (code, prompts, or comments), while the second dimension focuses on granularity: character, word, or sentence-level changes. We adopted a mixed-methods approach, combining quantitative performance metrics with qualitative vulnerability analysis. LLM4Code models show varying robustness across perturbation types. Sentence-level attacks were least effective, suggesting models are resilient to broader contextual changes. In contrast, word-level perturbations posed serious challenges, exposing semantic vulnerabilities. Character-level effects varied, showing model sensitivity to subtle syntactic deviations. Our study offers a structured framework for testing LLM4Code robustness and emphasizes the critical role of natural language in adversarial evaluation. Improving model resilience to semantic-level disruptions is essential for secure and reliable code-generation systems.

2025-06-09

ArXiv (preprint)

arxiv.org

AIF-GEN: Open-Source Platform and Synthetic Dataset Suite for Reinforcement Learning on Large Language Models

Jacob Chmura

Shahrad Mohammadzadeh

Taz Scott-Talib

Nishanth Anand

2025-06-09

ICML.cc/2025/Workshop/CODEML (published)

openreview.net

Improving Context Fidelity via Native Retrieval-Augmented Reasoning

Suyuchen Wang

Jinlin Wang

Xinyu Wang

Shiqi Li

Xiangru Tang

Sirui Hong

Xiao-Wen Chang

Chenglin Wu

Bang Liu

Large language models (LLMs) often struggle with context fidelity, producing inconsistent answers when responding to questions based on provided information. Existing approaches either rely on expensive supervised fine-tuning to generate evidence post-answer or train models to perform web searches without necessarily improving utilization of the given context. We propose CARE, a novel native retrieval-augmented reasoning framework that teaches LLMs to explicitly integrate in-context evidence within their reasoning process with the model's own retrieval capabilities. Our method requires minimal labeled evidence data while significantly enhancing both retrieval accuracy and answer generation performance through strategically retrieved in-context tokens in the reasoning chain. Extensive experiments on multiple real-world and counterfactual QA benchmarks demonstrate that our approach substantially outperforms supervised fine-tuning, traditional retrieval-augmented generation methods, and external retrieval solutions. This work represents a fundamental advancement in making LLMs more accurate, reliable, and efficient for knowledge-intensive tasks.

2025-06-09

ICML.cc/2025/Workshop/LCFM (accepted)

openreview.net

A Meta-Learning Approach to Causal Inference

Dragos Cristian Manta

Philippe Brouillard

Dhanya Sridhar

Predicting the effect of unseen interventions is at the heart of many scientific endeavours. While causal discovery is often used to answer these causal questions, it involves learning a full causal model, not tailored to the specific goal of predicting unseen interventions, and operates under stringent assumptions. We introduce a novel method based on meta-learning that predicts interventional effects without explicitly assuming a causal model. Our preliminary results on synthetic data show that it can provide good generalization to unseen interventions, and it even compares favorably to a causal discovery method. Our model-agnostic method opens up many avenues for future exploration, particularly for settings where causal discovery cannot be applied.

2025-06-09

ICML.cc/2025/Workshop/SIM (poster)

openreview.net

Meta-World+: An Improved, Standardized, RL Benchmark

Reginald McLean

Evangelos Chatzaroulas

Luc McCutcheon

Frank Röder

Tianhe Yu

Zhanpeng He

K.R. Zentner

Ryan Julian

J K Terry

Isaac Woungang

Nariman Farsad

Pablo Samuel Castro

Multi-task reinforcement learning challenges agents to master diverse skills simultaneously, and Meta-World emerged as the gold standard benchmark for evaluating these algorithms. However, since the introduction of the Meta-World benchmark, there have been numerous undocumented changes that inhibit fair comparison of multi-task and meta reinforcement learning algorithms. This work strives to disambiguate these results from the literature, while also producing an open-source version of Meta-World that has full reproducibility of past results.

2025-06-09

ICML.cc/2025/Workshop/CODEML (spotlight)

openreview.net

PyLO: Towards Accessible Learned Optimizers in PyTorch

Paul Janson

Benjamin Therien

Quentin Gregory Anthony

Xiaolong Huang

Abhinav Moudgil

Eugene Belilovsky

Learned optimizers have been an active research topic over the past decade, with increasing progress toward practical, general-purpose optimizers that can serve as drop-in replacements for widely used methods like Adam. However, recent advances -- such as VeLO, which was meta-trained for 4000 TPU-months -- remain largely inaccessible to the broader community, in part due to their reliance on JAX and the absence of user-friendly packages for applying the optimizers after meta-training. To address this gap, we introduce PyLO, a PyTorch-based library that brings learned optimizers to the broader machine learning community through familiar, widely adopted workflows. Unlike prior work focused on synthetic or convex tasks, our emphasis is on applying learned optimization to real-world large-scale pre-training tasks. Our release includes a CUDA-accelerated version of the small_fc_lopt learned optimizer architecture from (Metz et al., 2022a), delivering substantial speedups -- from 39.36 to 205.59 samples/sec throughput for training ViT B/16 with batch size 32. PyLO also allows us to easily combine learned optimizers with existing optimization tools such as learning rate schedules and weight decay. When doing so, we find that learned optimizers can substantially benefit. Our code is available at https://github.com/Belilovsky-Lab/pylo

2025-06-09

ICML.cc/2025/Workshop/CODEML (published)

doi.org

openreview.net

Quantized Disentanglement: A Practical Approach

Vitória Barin-Pacela

Kartik Ahuja

Simon Lacoste-Julien

Pascal Vincent

2025-06-09

ICML.cc/2025/Workshop/SIM (poster)

openreview.net

Revisiting the Goldilocks Zone in Inhomogeneous Networks

Zacharie Garnier Cuchet

Sarath Chandar

Ekaterina Lobacheva

We investigate how architectural inhomogeneities, such as biases, layer normalization, and residual connections, affect the curvature of the loss landscape at initialization and its link to trainability. We focus on the Goldilocks zone, a region in parameter space with excess positive curvature, previously associated with improved optimization in homogeneous networks. To extend this analysis, we compare two scaling strategies: weight scaling and softmax temperature scaling. Our results show that in networks with biases or residual connections, both strategies identify a Goldilocks zone aligned with better training. In contrast, layer normalization leads to lower or negative curvature, yet stable optimization, revealing a disconnect between curvature and trainability. Softmax temperature scaling behaves more consistently across models, making it a more robust probe. Overall, the Goldilocks zone remains relevant in inhomogeneous networks, but its geometry and predictive power depend on architectural choices, particularly normalization.

2025-06-09

ICML.cc/2025/Workshop/HiLD (poster)

openreview.net

Spaced Scheduling for Large Language Model Training

Amine El hattami

Nicolas Chapados

Chris Pal

2025-06-09

TMLR (accepted)

openreview.net

TGM: A Modular Framework for Machine Learning on Temporal Graphs

Michael M. Bronstein

Matthias Fey

While deep learning on static graphs has been revolutionized by standardized libraries like PyTorch Geometric and DGL, machine learning on Temporal Graphs (TG), networks that evolve over time, lacks comparable software infrastructure. Existing TG libraries are limited in scope, focusing on a single method category or specific algorithms. We introduce Temporal Graph Modelling (TGM), a comprehensive framework for machine learning on temporal graphs to address this gap. Through a modular architecture, TGM is the first library to support both discrete and continuous-time TG methods and implements a wide range of TG methods. The TGM framework combines an intuitive front-end API with an optimized backend storage, enabling reproducible research and efficient experimentation at scale. Key features include graph-level optimizations for offline training and built-in performance profiling capabilities. Through extensive benchmarking on five real-world networks, TGM is up to 6 times faster than the widely used DyGLib library on TGN and TGAT models and up to 8 times faster than the UTG framework for converting edges into coarse-grained snapshots.

2025-06-09

ICML.cc/2025/Workshop/CODEML (published)

openreview.net
