Publications

Generalized Random Forests using Fixed-Point Trees

David A. Stephens

Archer Y. Yang

We propose a computationally efficient alternative to generalized random forests (GRFs) for estimating heterogeneous effects in large dimens… (see more)ions. While GRFs rely on a gradient-based splitting criterion, which in large dimensions is computationally expensive and unstable, our method introduces a fixed-point approximation that eliminates the need for Jacobian estimation. This gradient-free approach preserves GRF's theoretical guarantees of consistency and asymptotic normality while significantly improving computational efficiency. We demonstrate that our method achieves a speedup of multiple times over standard GRFs without compromising statistical accuracy. Experiments on both simulated and real-world data validate our approach. Our findings suggest that the proposed method is a scalable alternative for localized effect estimation in machine learning and causal inference applications

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

GRAIL: Graph Edit Distance and Node Alignment using LLM-Generated Code

Samidha Verma

Arushi Goyal

Ananya Mathur

Ankit Anand

Sayan Ranu

Graph Edit Distance (GED) is a widely used metric for measuring similarity between two graphs. Computing the optimal GED is NP-hard, leading… (see more) to the development of various neural and non-neural heuristics. While neural methods have achieved improved approximation quality compared to non-neural approaches, they face significant challenges: (1) They require large amounts of ground truth data, which is itself NP-hard to compute. (2) They operate as black boxes, offering limited interpretability. (3) They lack cross-domain generalization, necessitating expensive retraining for each new dataset. We address these limitations with GRAIL, introducing a paradigm shift in this domain. Instead of training a neural model to predict GED, GRAIL employs a novel combination of large language models (LLMs) and automated prompt tuning to generate a program that is used to compute GED. This shift from predicting GED to generating programs imparts various advantages, including end-to-end interpretability and an autonomous self-evolutionary learning mechanism without ground-truth supervision. Extensive experiments on seven datasets confirm that GRAIL not only surpasses state-of-the-art GED approximation methods in prediction quality but also achieves robust cross-domain generalization across diverse graph distributions.

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press

Grokking Beyond the Euclidean Norm of Model Parameters

Pascal Jr Tikeng Notsawo

Guillaume Dumas

Guillaume Rabusseau

Grokking refers to a delayed generalization following overfitting when optimizing artificial neural networks with gradient-based methods. In… (see more) this work, we demonstrate that grokking can be induced by regularization, either explicit or implicit. More precisely, we show that when there exists a model with a property

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Improved Off-policy Reinforcement Learning in Biological Sequence Design

Alex Hernández-García

Jinkyoo Park

Designing biological sequences with desired properties is challenging due to vast search spaces and limited evaluation budgets. Although rei… (see more)nforcement learning methods use proxy models for rapid reward evaluation, insufficient training data can cause proxy misspecification on out-of-distribution inputs. To address this, we propose a novel off-policy search,

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?

Kevin Kasa

Graham W. Taylor

Krishnamurthy Dj Dvijotham

Alexandre Lacoste

AI agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cau… (see more)se unintended or harmful behavior. Inspired by the well-established concept of firewalls, we show that a simple, modular and model-agnostic defense operating at the agent--tool interface achieves perfect security (0% or the lowest possible attack success rate) with high utility (task success rate) across four public benchmarks: AgentDojo, Agent Security Bench, InjecAgent and tau-Bench, while achieving a state-of-the-art security-utility tradeoff compared to prior results. Specifically, we employ a defense based on two firewalls: a Tool-Input Firewall (Minimizer) and a Tool-Output Firewall (Sanitizer). Unlike prior complex approaches, this firewall defense makes minimal assumptions on the agent and can be deployed out-of-the-box, while maintaining strong performance without compromising utility. However, our analysis also reveals critical limitations in these existing benchmarks, including flawed success metrics, implementation bugs, and most importantly, weak attacks, hindering significant progress in the field. To foster more meaningful progress, we present targeted fixes to these issues for AgentDojo and Agent Security Bench while proposing best-practices for more robust benchmark design. Further, we demonstrate that although these firewalls push the state-of-the-art on existing benchmarks, it is still possible to bypass them in practice, underscoring the need to incorporate stronger attacks in security benchmarks. Overall, our work shows that existing agentic security benchmarks are easily saturated by a simple approach and highlights the need for stronger agentic security benchmarks with carefully chosen evaluation metrics and strong adaptive attacks.

2025-10-05

ArXiv (preprint)

doi.org

arxiv.org

Language Models over Canonical Byte-Pair Encodings

Tim Vieira

Tianyu Liu

Clemente Pasti

Yahya Emara

Brian DuSell

Benjamin LeBrun

Mario Giulianelli

Juan Luis Gastaldi

Timothy J. O'Donnell

Ryan Cotterell

Modern language models represent probability distributions over character strings as distributions over (shorter) token strings derived via … (see more)a deterministic tokenizer, such as byte-pair encoding. While this approach is highly effective at scaling up language models to large corpora, its current incarnations have a concerning property: the model assigns nonzero probability mass to an exponential number of

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks

Thomas Schmied

Thomas Adler

Vihang P. Patil

Maximilian Beck

Korbinian Poppel

Johannes Brandstetter

Günter Klambauer

Razvan Pascanu

Sepp Hochreiter

In recent years, there has been a trend in the field of Reinforcement Learning (RL) towards large action models trained offline on large-sca… (see more)le datasets via sequence modeling. Existing models are primarily based on the Transformer architecture, which result in powerful agents. However, due to slow inference times, Transformer-based approaches are impractical for real-time applications, such as robotics. Recently, modern recurrent architectures, such as xLSTM and Mamba, have been proposed that exhibit parallelization benefits during training similar to the Transformer architecture while offering fast inference. In this work, we study the aptitude of these modern recurrent architectures for large action models. Consequently, we propose a Large Recurrent Action Model (LRAM) with an xLSTM at its core that comes with linear-time inference complexity and natural sequence length extrapolation abilities. Experiments on 432 tasks from 6 domains show that LRAM compares favorably to Transformers in terms of performance and speed.

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

LIVS: A Pluralistic Alignment Dataset for Inclusive Public Spaces

Rashid Mushkani

Shravan Nayak

Hugo Berard

Allison Cohen

Shin Koseki

Hadrien Bertrand

We introduce the Local Intersectional Visual Spaces (LIVS) dataset, a benchmark for multi-criteria alignment, developed through a two-year p… (see more)articipatory process with 30 community organizations to support the pluralistic alignment of text-to-image (T2I) models in inclusive urban planning. The dataset encodes 37,710 pairwise comparisons across 13,462 images, structured along six criteria - Accessibility, Safety, Comfort, Invitingness, Inclusivity, and Diversity - derived from 634 community-defined concepts. Using Direct Preference Optimization (DPO), we fine-tune Stable Diffusion XL to reflect multi-criteria spatial preferences and evaluate the LIVS dataset and the fine-tuned model through four case studies: (1) DPO increases alignment with annotated preferences, particularly when annotation volume is high; (2) preference patterns vary across participant identities, underscoring the need for intersectional data; (3) human-authored prompts generate more distinctive visual outputs than LLM-generated ones, influencing annotation decisiveness; and (4) intersectional groups assign systematically different ratings across criteria, revealing the limitations of single-objective alignment. While DPO improves alignment under specific conditions, the prevalence of neutral ratings indicates that community values are heterogeneous and often ambiguous. LIVS provides a benchmark for developing T2I models that incorporate local, stakeholder-driven preferences, offering a foundation for context-aware alignment in spatial design.

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Monte Carlo Tree Diffusion for System 2 Planning

Jaesik Yoon

Hyeonseo Cho

Doojin Baek

Yoshua Bengio

Sungjin Ahn

Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS)-whose performance nat… (see more)urally improves with inference-time computation scaling-standard diffusion-based planners offer only limited avenues for the scalability. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined. By selectively expanding promising trajectories while retaining the flexibility to revisit and improve suboptimal branches, MCTD achieves the benefits of MCTS such as controlling exploration-exploitation trade-offs within the diffusion framework. Empirical results on challenging long-horizon tasks show that MCTD outperforms diffusion baselines, yielding higher-quality solutions as inference-time computation increases.

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Multivariate Conformal Selection

Tian Bai

Yue Zhao

Xiang Yu

Archer Yang

Selecting high-quality candidates from large datasets is critical in applications such as drug discovery, precision medicine, and alignment … (see more)of large language models (LLMs). While Conformal Selection (CS) provides rigorous uncertainty quantification, it is limited to univariate responses and scalar criteria. To address this, we propose Multivariate Conformal Selection (mCS), a generalization of CS designed for multivariate response settings. Our method introduces regional monotonicity and employs multivariate nonconformity scores to construct conformal

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press

PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling

Avery Ma

Yangchen Pan

Amir-massoud Farahmand

Many-shot jailbreaking circumvents the safety alignment of large language models by exploiting their ability to process long input sequences… (see more). To achieve this, the malicious target prompt is prefixed with hundreds of fabricated conversational turns between the user and the model. These fabricated exchanges are randomly sampled from a pool of malicious questions and responses, making it appear as though the model has already complied with harmful instructions. In this paper, we present PANDAS: a hybrid technique that improves many-shot jailbreaking by modifying these fabricated dialogues with positive affirmations, negative demonstrations, and an optimized adaptive sampling method tailored to the target prompt's topic. Extensive experiments on AdvBench and HarmBench, using state-of-the-art LLMs, demonstrate that PANDAS significantly outperforms baseline methods in long-context scenarios. Through an attention analysis, we provide insights on how long-context vulnerabilities are exploited and show how PANDAS further improves upon many-shot jailbreaking.

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data

Tingchen Fu

Mrinank Sharma

Philip Torr

Shay B. Cohen

David M. Krueger

Fazl Barez

Preference learning is a central component for aligning current LLMs, but this process can be vulnerable to data poisoning attacks. To addre… (see more)ss this concern, we introduce PoisonBench, a benchmark for evaluating large language models' susceptibility to data poisoning during preference learning. Data poisoning attacks can manipulate large language model responses to include hidden malicious content or biases, potentially causing the model to generate harmful or unintended outputs while appearing to function normally. We deploy two distinct attack types across eight realistic scenarios, assessing 22 widely-used models. Our findings reveal concerning trends: (1) Scaling up parameter size does not always enhance resilience against poisoning attacks and the influence on model resilience varies among different model suites. (2) There exists a log-linear relationship between the effects of the attack and the data poison ratio; (3) The effect of data poisoning can generalize to extrapolated triggers that are not included in the poisoned data. These results expose weaknesses in current preference learning techniques, highlighting the urgent need for more robust defenses against malicious models and data manipulation.

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press

Mila on Udemy

AI Policy Fellowship Publications

Mila Ventures Launchpad

Publications

Mila on Udemy

AI Policy Fellowship Publications

Mila Ventures Launchpad

Popular keywords:

Publications