Publications

Building spatial world models from sparse transitional episodic memories

Zizhan He

Maxime Daigle

Pouya Bashivan

Many animals possess a remarkable capacity to rapidly construct flexible mental models of their environments. These world models are crucial for ethologically relevant behaviors such as navigation, exploration, and planning. The ability to form episodic memories and make inferences based on these sparse experiences is believed to underpin the efficiency and adaptability of these models in the brain. Here, we ask: Can a neural network learn to construct a spatial model of its surroundings from sparse and disjoint episodic memories? We formulate the problem in a simulated world and propose a novel framework, the Episodic Spatial World Model (ESWM), as a potential answer. We show that ESWM is highly sample-efficient, requiring minimal observations to construct a robust representation of the environment. It is also inherently adaptive, allowing for rapid updates when the environment changes. In addition, we demonstrate that ESWM readily enables near-optimal strategies for exploring novel environments and navigating between arbitrary points, all without the need for additional training.

2025-05-01

arXiv (published)

Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down

Yingzhi Wang

Anas Alhmoud

Saad Alsahly

Muhammad Alqurishi

Mirco Ravanelli

2025-05-01

arXiv (published)

Caption This, Reason That: VLMs Caught in the Middle

Zihan Weng

Lucas Gomez

Taylor Whittington Webb

Pouya Bashivan

2025-05-01

arXiv (published)

Compositional Risk Minimization

Divyat Mahajan

Mohammad Pezeshki

Charles Arnal

Ioannis Mitliagkas

Kartik Ahuja

Pascal Vincent

2025-05-01

ICML.cc/2025/Conference (poster)

Context is Key: A Benchmark for Forecasting with Essential Textual Information

Andrew Robert Williams

Arjun Ashok

Étienne Marcotte

Valentina Zantedeschi

Jithendaraa Subramanian

Roland Riachi

James Requeima

Alexandre Lacoste

Irina Rish

Nicolas Chapados

Alexandre Drouin

2025-05-01

ICML.cc/2025/Conference (poster)

Dimension-adapted Momentum Outscales SGD

Damien Ferbach

Katie Everett

Gauthier Gidel

Elliot Paquette

Courtney Paquette

We investigate scaling laws for stochastic momentum algorithms with small batch on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target complexities. While traditional stochastic gradient descent with momentum (SGD-M) yields identical scaling law exponents to SGD, dimension-adapted Nesterov acceleration (DANA) improves these exponents by scaling momentum hyperparameters based on model size and data complexity. This outscaling phenomenon, which also improves compute-optimal scaling behavior, is achieved by DANA across a broad range of data and target complexities, while traditional methods fall short. Extensive experiments on high-dimensional synthetic quadratics validate our theoretical predictions and large-scale text experiments with LSTMs show DANA's improved loss exponents over SGD hold in a practical setting.

2025-05-01

arXiv (published)

Discovering Symbolic Cognitive Models from Human and Animal Behavior

Pablo Samuel Castro

Nenad Tomasev

Ankit Anand

Navodita Sharma

Rishika Mohanta

Aparna Dev

Kuba Perlin

Siddhant Jain

Kyle Levin

Noemi Elteto

Will Dabney

Alexander Novikov

Glenn C Turner

Maria K Eckstein

Nathaniel D. Daw

Kevin J Miller

Kim Stachenfeld

Symbolic models play a key role in cognitive science, expressing computationally precise hypotheses about how the brain implements a cognitive process. Identifying an appropriate model typically requires a great deal of effort and ingenuity on the part of a human scientist. Here, we adapt FunSearch (Romera-Paredes et al. 2024), a recently developed tool that uses Large Language Models (LLMs) in an evolutionary algorithm, to automatically discover symbolic cognitive models that accurately capture human and animal behavior. We consider datasets from three species performing a classic reward-learning task that has been the focus of substantial modeling effort, and find that the discovered programs outperform state-of-the-art cognitive models for each. The discovered programs can readily be interpreted as hypotheses about human and animal cognition, instantiating interpretable symbolic learning and decision-making algorithms. Broadly, these results demonstrate the viability of using LLM-powered program synthesis to propose novel scientific hypotheses regarding mechanisms of human and animal cognition.

2025-05-01

ICML.cc/2025/Conference (poster)

Does learning the right latent variables necessarily improve in-context learning?

Sarthak Mittal

Eric Elmoznino

Leo Gagnon

Sangnie Bhardwaj

Dhanya Sridhar

Guillaume Lajoie

Large autoregressive models like Transformers can solve tasks through in-context learning (ICL) without learning new weights, suggesting avenues for efficiently solving new tasks. For many tasks, e.g., linear regression, the data factorizes: examples are independent given a task latent that generates the data, e.g., linear coefficients. While an optimal predictor leverages this factorization by inferring task latents, it is unclear if Transformers implicitly do so or if they instead exploit heuristics and statistical shortcuts enabled by attention layers. Both scenarios have inspired active ongoing work. In this paper, we systematically investigate the effect of explicitly inferring task latents. We minimally modify the Transformer architecture with a bottleneck designed to prevent shortcuts in favor of more structured solutions, and then compare performance against standard Transformers across various ICL tasks. Contrary to intuition and some recent works, we find little discernible difference between the two; biasing towards task-relevant latent variables does not lead to better out-of-distribution performance, in general. Curiously, we find that while the bottleneck effectively learns to extract latent task variables from context, downstream processing struggles to utilize them for robust prediction. Our study highlights the intrinsic limitations of Transformers in achieving structured ICL solutions that generalize, and shows that while inferring the right latents aids interpretability, it is not sufficient to alleviate this problem.

2025-05-01

ICML.cc/2025/Conference (poster)