Publications

Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down

Yingzhi Wang

Anas Alhmoud

Saad Alsahly

Muhammad Alqurishi

Mirco Ravanelli

2025-05-01

arXiv (published)

Caption This, Reason That: VLMs Caught in the Middle

Zihan Weng

Lucas Gomez

Taylor Whittington Webb

Pouya Bashivan

2025-05-01

arXiv (published)

Compositional Risk Minimization

Charles Arnal

Kartik Ahuja

2025-05-01

ICML.cc/2025/Conference (poster)

Context is Key: A Benchmark for Forecasting with Essential Textual Information

Andrew Robert Williams

Arjun Ashok

Étienne Marcotte

Valentina Zantedeschi

Jithendaraa Subramanian

Alexandre Lacoste

2025-05-01

ICML.cc/2025/Conference (poster)

Dimension-adapted Momentum Outscales SGD

Damien Ferbach

Katie Everett

Gauthier Gidel

Elliot Paquette

Courtney Paquette

We investigate scaling laws for stochastic momentum algorithms with small batch on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target complexities. While traditional stochastic gradient descent with momentum (SGD-M) yields identical scaling law exponents to SGD, dimension-adapted Nesterov acceleration (DANA) improves these exponents by scaling momentum hyperparameters based on model size and data complexity. This outscaling phenomenon, which also improves compute-optimal scaling behavior, is achieved by DANA across a broad range of data and target complexities, while traditional methods fall short. Extensive experiments on high-dimensional synthetic quadratics validate our theoretical predictions and large-scale text experiments with LSTMs show DANA's improved loss exponents over SGD hold in a practical setting.

2025-05-01

arXiv (published)

Discovering Symbolic Cognitive Models from Human and Animal Behavior

Pablo Samuel Castro

Nenad Tomasev

Ankit Anand

Navodita Sharma

Rishika Mohanta

Aparna Dev

Kuba Perlin

Siddhant Jain

Kyle Levin

Noemi Elteto

Will Dabney

Alexander Novikov

Glenn C Turner

Maria K Eckstein

Nathaniel D. Daw

Kevin J Miller

Kim Stachenfeld

Symbolic models play a key role in cognitive science, expressing computationally precise hypotheses about how the brain implements a cognitive process. Identifying an appropriate model typically requires a great deal of effort and ingenuity on the part of a human scientist. Here, we adapt FunSearch (Romera-Paredes et al. 2024), a recently developed tool that uses Large Language Models (LLMs) in an evolutionary algorithm, to automatically discover symbolic cognitive models that accurately capture human and animal behavior. We consider datasets from three species performing a classic reward-learning task that has been the focus of substantial modeling effort, and find that the discovered programs outperform state-of-the-art cognitive models for each. The discovered programs can readily be interpreted as hypotheses about human and animal cognition, instantiating interpretable symbolic learning and decision-making algorithms. Broadly, these results demonstrate the viability of using LLM-powered program synthesis to propose novel scientific hypotheses regarding mechanisms of human and animal cognition.

2025-05-01

ICML.cc/2025/Conference (poster)

Does learning the right latent variables necessarily improve in-context learning?

Large autoregressive models like Transformers can solve tasks through in-context learning (ICL) without learning new weights, suggesting avenues for efficiently solving new tasks. For many tasks, e.g., linear regression, the data factorizes: examples are independent given a task latent that generates the data, e.g., linear coefficients. While an optimal predictor leverages this factorization by inferring task latents, it is unclear if Transformers implicitly do so or if they instead exploit heuristics and statistical shortcuts enabled by attention layers. Both scenarios have inspired active ongoing work. In this paper, we systematically investigate the effect of explicitly inferring task latents. We minimally modify the Transformer architecture with a bottleneck designed to prevent shortcuts in favor of more structured solutions, and then compare performance against standard Transformers across various ICL tasks. Contrary to intuition and some recent works, we find little discernible difference between the two; biasing towards task-relevant latent variables does not lead to better out-of-distribution performance, in general. Curiously, we find that while the bottleneck effectively learns to extract latent task variables from context, downstream processing struggles to utilize them for robust prediction. Our study highlights the intrinsic limitations of Transformers in achieving structured ICL solutions that generalize, and shows that while inferring the right latents aids interpretability, it is not sufficient to alleviate this problem.

2025-05-01

ICML.cc/2025/Conference (poster)

FLAM: Frame-Wise Language-Audio Modeling

Yusong Wu

Christos Tsirigotis

Ke Chen

Anna (Cheng-Zhi) Huang

Aaron Courville

Oriol Nieto

Prem Seetharaman

Justin Salamon

2025-05-01

ICML.cc/2025/Conference (poster)

A flexible machine learning Mendelian randomization estimator applied to predict the safety and efficacy of sclerostin inhibition

Marc-André Legault

Jason Hartford

Benoît J. Arsenault

Archer Yang

Joelle Pineau

2025-05-01

American Journal of Human Genetics (published)

AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N

Tianyu Zhang

Andrew Robert Williams

Phillip Wozny

Soham Rajesh Phade

Kai-Hendrik Cohrs

Sunil Srinivasa

Koen Ponse

Yang Zhang

Prateek Gupta

Marco Jiralerspong

Yoshua Bengio

Stephan Zheng

Li Li

Erman Acar

Irina Rish

2025-05-01

ICML.cc/2025/Conference (poster)

From Language Models over Tokens to Language Models over Characters

Tim Vieira

Benjamin LeBrun

Mario Giulianelli

Juan Luis Gastaldi

Brian DuSell

John Terilla

Timothy O'Donnell

Ryan Cotterell

Modern language models are internally, and mathematically, distributions over *token* strings rather than *character* strings, posing numerous challenges for programmers building user applications on top of them. For example, if a prompt is specified as a character string, it must be tokenized before passing it to the token-level language model. Thus, the tokenizer and consequent processing are very sensitive to the specification of the prompt (e.g., whether the prompt ends with a space or not). This paper presents algorithms for converting token-level language models to character-level ones. We present both exact and approximate algorithms. In the empirical portion of the paper, we benchmark the practical runtime and approximation quality. Across four publicly available language models, we find that, even with a small computation budget, our method is able to accurately approximate the character-level distribution at reasonably fast speeds, and that a significant improvement in the language model's compression rate (bits/byte) is achieved.

2025-05-01

ICML.cc/2025/Conference (poster)