Publications

Improving the Scaling Laws of Synthetic Data with Deliberate Practice

Reyhane Askari Hemmat

Mohammad Pezeshki

Elvis Dohmatob

Florian Bordes

Pietro Astolfi

Melissa Hall

Jakob Verbeek

Michal Drozdzal

Adriana Romero Soriano

2025-05-01

ICML.cc/2025/Conference (présentation orale)

In-context learning and Occam's razor

Eric Elmoznino

Tom Marty

Tejas Kasetty

Leo Gagnon

Sarthak Mittal

Mahan Fathi

Dhanya Sridhar

Guillaume Lajoie

A central goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees fo… (voir plus)r generalization without further assumptions, in practice we observe that simple models which explain the training data generalize best: a principle called Occam's razor. Despite the need for simple models, most current approaches in machine learning only minimize the training error, and at best indirectly promote simplicity through regularization or architecture design. Here, we draw a connection between Occam's razor and in-context learning: an emergent ability of certain sequence models like Transformers to learn at inference time from past observations in a sequence. In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. We make our code available at https://github.com/3rdCore/PrequentialCode.

2025-05-01

ICML.cc/2025/Conference (poster)

Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?

Anthony GX-Chen

Dongyan Lin

Mandana Samiei

Doina Precup

Blake Richards

Rob Fergus

Kenneth Marino

Language model (LM) agents are increasingly used as autonomous decision-makers who need to actively gather information to guide their decisi… (voir plus)ons. A crucial cognitive skill for such agents is the efficient exploration and understanding of the causal structure of the world -- key to robust, scientifically grounded reasoning. Yet, it remains unclear whether LMs possess this capability or exhibit systematic biases leading to erroneous conclusions. In this work, we examine LMs' ability to explore and infer causal relationships, using the well-established"Blicket Test"paradigm from developmental psychology. We find that LMs reliably infer the common, intuitive disjunctive causal relationships but systematically struggle with the unusual, yet equally (or sometimes even more) evidenced conjunctive ones. This"disjunctive bias"persists across model families, sizes, and prompting strategies, and performance further declines as task complexity increases. Interestingly, an analogous bias appears in human adults, suggesting that LMs may have inherited deep-seated reasoning heuristics from their training data. To this end, we quantify similarities between LMs and humans, finding that LMs exhibit adult-like inference profiles (but not children-like). Finally, we propose a test-time sampling method which explicitly samples and eliminates hypotheses about causal relationships from the LM. This scalable approach significantly reduces the disjunctive bias and moves LMs closer to the goal of scientific, causally rigorous reasoning.

2025-05-01

arXiv (publié)

Language Models over Canonical Byte-Pair Encodings

Tim Vieira

Tianyu Liu

Clemente Pasti

Yahya Emara

Brian DuSell

Benjamin LeBrun

Mario Giulianelli

Juan Luis Gastaldi

Timothy O'Donnell

Ryan Cotterell

2025-05-01

ICML.cc/2025/Conference (poster)

Learning Penalty for Optimal Partitioning via Automatic Feature Extraction

Tung L. Nguyen

Toby Dylan Hocking

Changepoint detection identifies significant shifts in data sequences, making it important in areas like finance, genetics, and healthcare. … (voir plus)The Optimal Partitioning algorithms efficiently detect these changes, using a penalty parameter to limit the changepoints number. Determining the appropriate value for this penalty can be challenging. Traditionally, this process involved manually extracting statistical features, such as sequence length or variance to make the prediction. This study proposes a novel approach that uses recurrent neural networks to learn this penalty directly from raw sequences by automatically extracting features. Experiments conducted on 20 benchmark genomic datasets show that this novel method surpasses traditional methods in partitioning accuracy in most cases.

2025-05-01

arXiv (publié)

Gintare Karolina Dziugaite

Leveraging Per-Instance Privacy for Machine Unlearning

Nazanin Mohammadi Sepahvand

Anvith Thudi

Berivan Isik

Ashmita Bhattacharyya

Nicolas Papernot

Eleni Triantafillou

Daniel M. Roy

2025-05-01

ICML.cc/2025/Conference (poster)

LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs

Pooneh Mousavi

Shubham Gupta

Cem Subakan

Mirco Ravanelli

Foundation models based on large language models (LLMs) have shown great success in handling various tasks and modalities. However, adapting… (voir plus) these models for general-purpose audio-language tasks is challenging due to differences in acoustic environments and task variations. In this work, we introduce LiSTEN Learning Soft Token Embeddings for Neural Audio LLMs), a framework for adapting LLMs to speech and audio tasks. LiSTEN uses a dynamic prompt selection strategy with learnable key-value pairs, allowing the model to balance general and task-specific knowledge while avoiding overfitting in a multitask setting. Our approach reduces dependence on large-scale ASR or captioning datasets, achieves competitive performance with fewer trainable parameters, and simplifies training by using a single-stage process. Additionally, LiSTEN enhances interpretability by analyzing the diversity and overlap of selected prompts across different tasks.

2025-05-01

arXiv (publié)