Building spatial world models from sparse transitional episodic memories
Zizhan He
Maxime Daigle
Many animals possess a remarkable capacity to rapidly construct flexible mental models of their environments. These world models are crucial for ethologically relevant behaviors such as navigation, exploration, and planning. The ability to form episodic memories and make inferences based on these sparse experiences is believed to underpin the efficiency and adaptability of these models in the brain. Here, we ask: Can a neural network learn to construct a spatial model of its surroundings from sparse and disjoint episodic memories? We formulate the problem in a simulated world and propose a novel framework, the Episodic Spatial World Model (ESWM), as a potential answer. We show that ESWM is highly sample-efficient, requiring minimal observations to construct a robust representation of the environment. It is also inherently adaptive, allowing for rapid updates when the environment changes. In addition, we demonstrate that ESWM readily enables near-optimal strategies for exploring novel environments and navigating between arbitrary points, all without the need for additional training.
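As a rough illustration of the core idea (not the ESWM neural architecture itself), the toy sketch below stores sparse, disjoint one-step transition memories from a hypothetical grid world, stitches them into a graph, and plans a route between arbitrary points; the environment, actions, and planner are all made up for illustration.

```python
# Toy illustration: build a spatial model by stitching together sparse,
# disjoint (state, action, next_state) memories, then plan over it.
# This is a stand-in for the general idea, not the ESWM architecture.
from collections import defaultdict, deque

# Sparse, disjoint one-step memories collected in a small grid world.
memories = [((0, 0), "E", (1, 0)), ((1, 0), "N", (1, 1)),
            ((1, 1), "E", (2, 1)), ((2, 1), "N", (2, 2))]

graph = defaultdict(list)
for state, action, next_state in memories:
    graph[state].append((action, next_state))  # stitch memories into a model

def plan(start, goal):
    """Breadth-first search over the stitched memory graph."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action, nxt in graph[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None  # goal unreachable from the stored memories

print(plan((0, 0), (2, 2)))  # ['E', 'N', 'E', 'N']
```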
Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down
Yingzhi Wang
Anas Alhmoud
Saad Alsahly
Muhammad Alqurishi
Compositional Risk Minimization
Divyat Mahajan
Mohammad Pezeshki
Charles Arnal
Kartik Ahuja
Context is Key: A Benchmark for Forecasting with Essential Textual Information
Andrew Robert Williams
Arjun Ashok
Étienne Marcotte
Valentina Zantedeschi
Jithendaraa Subramanian
Roland Riachi
James Requeima
Alexandre Lacoste
Discovering Symbolic Cognitive Models from Human and Animal Behavior
Nenad Tomasev
Navodita Sharma
Rishika Mohanta
Aparna Dev
Kuba Perlin
Siddhant Jain
Kyle Levin
Noemi Elteto
Will Dabney
Alexander Novikov
Glenn C Turner
Maria K Eckstein
Nathaniel D. Daw
Kevin J Miller
Kim Stachenfeld
Symbolic models play a key role in cognitive science, expressing computationally precise hypotheses about how the brain implements a cognitive process. Identifying an appropriate model typically requires a great deal of effort and ingenuity on the part of a human scientist. Here, we adapt FunSearch (Romera-Paredes et al. 2024), a recently developed tool that uses Large Language Models (LLMs) in an evolutionary algorithm, to automatically discover symbolic cognitive models that accurately capture human and animal behavior. We consider datasets from three species performing a classic reward-learning task that has been the focus of substantial modeling effort, and find that the discovered programs outperform state-of-the-art cognitive models for each. The discovered programs can readily be interpreted as hypotheses about human and animal cognition, instantiating interpretable symbolic learning and decision-making algorithms. Broadly, these results demonstrate the viability of using LLM-powered program synthesis to propose novel scientific hypotheses regarding mechanisms of human and animal cognition.
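The evolve-and-score loop at the heart of this kind of search can be sketched as follows. The toy choice data, the win-stay/lose-shift candidate model, and `propose_variant` (a stand-in for the LLM proposal step, here just a random parameter perturbation rather than a program edit) are all hypothetical and do not reproduce the paper's FunSearch setup.

```python
# Minimal evolve-and-score sketch: propose a variant of a candidate cognitive
# model, score it against behavioral data, keep the best. Hypothetical data
# and models; the LLM proposal step is replaced by a random perturbation.
import math
import random

# Toy behavioral dataset: (reward observed, choice repeated on next trial).
DATA = [(1, 1), (1, 1), (0, 0), (1, 1), (0, 1), (0, 0), (1, 1), (0, 0)]

def score(model, params):
    """Log-likelihood of repeat/switch choices under a candidate model."""
    ll = 0.0
    for reward, repeated in DATA:
        p_repeat = min(max(model(reward, params), 1e-6), 1 - 1e-6)
        ll += math.log(p_repeat if repeated else 1 - p_repeat)
    return ll

def win_stay_lose_shift(reward, params):
    """Candidate symbolic model: repeat with prob. a after reward, b otherwise."""
    a, b = params
    return a if reward else b

def propose_variant(params):
    """Placeholder for the LLM proposal step: perturb the current candidate."""
    return tuple(min(max(p + random.gauss(0, 0.1), 0.0), 1.0) for p in params)

best_params = (0.5, 0.5)
best_ll = score(win_stay_lose_shift, best_params)
for _ in range(500):  # evolutionary loop: propose, score, keep the best
    cand = propose_variant(best_params)
    ll = score(win_stay_lose_shift, cand)
    if ll > best_ll:
        best_params, best_ll = cand, ll
print(best_params, best_ll)
```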
Does learning the right latent variables necessarily improve in-context learning?
Sarthak Mittal
Eric Elmoznino
Leo Gagnon
Sangnie Bhardwaj
Large autoregressive models like Transformers can solve tasks through in-context learning (ICL) without learning new weights, suggesting avenues for efficiently solving new tasks. For many tasks, e.g., linear regression, the data factorizes: examples are independent given a task latent that generates the data, e.g., linear coefficients. While an optimal predictor leverages this factorization by inferring task latents, it is unclear if Transformers implicitly do so or if they instead exploit heuristics and statistical shortcuts enabled by attention layers. Both scenarios have inspired active ongoing work. In this paper, we systematically investigate the effect of explicitly inferring task latents. We minimally modify the Transformer architecture with a bottleneck designed to prevent shortcuts in favor of more structured solutions, and then compare performance against standard Transformers across various ICL tasks. Contrary to intuition and some recent works, we find little discernible difference between the two; biasing towards task-relevant latent variables does not lead to better out-of-distribution performance, in general. Curiously, we find that while the bottleneck effectively learns to extract latent task variables from context, downstream processing struggles to utilize them for robust prediction. Our study highlights the intrinsic limitations of Transformers in achieving structured ICL solutions that generalize, and shows that while inferring the right latents aids interpretability, it is not sufficient to alleviate this problem.
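A minimal PyTorch sketch of the kind of bottlenecked in-context learner described above is given below; the layer sizes, the mean-pooling, and all module names are illustrative assumptions rather than the paper's exact architecture.

```python
# Sketch of an in-context learner with an explicit bottleneck: the context set
# is compressed into a low-dimensional task latent before the query is
# processed, so context can only influence the prediction through that latent.
import torch
import torch.nn as nn

class BottleneckedICL(nn.Module):
    def __init__(self, x_dim=8, y_dim=1, d_model=64, latent_dim=16):
        super().__init__()
        self.embed = nn.Linear(x_dim + y_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.context_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.to_latent = nn.Linear(d_model, latent_dim)  # the bottleneck
        self.predictor = nn.Sequential(                  # (query, latent) -> y
            nn.Linear(x_dim + latent_dim, d_model), nn.ReLU(),
            nn.Linear(d_model, y_dim),
        )

    def forward(self, ctx_x, ctx_y, query_x):
        # Encode the context examples, mean-pool them through the bottleneck,
        # then predict the query label from the query input and the latent.
        ctx = self.embed(torch.cat([ctx_x, ctx_y], dim=-1))
        z = self.to_latent(self.context_encoder(ctx).mean(dim=1))
        return self.predictor(torch.cat([query_x, z], dim=-1))

model = BottleneckedICL()
ctx_x, ctx_y = torch.randn(2, 10, 8), torch.randn(2, 10, 1)
pred = model(ctx_x, ctx_y, torch.randn(2, 8))  # shape (2, 1)
```

Restricting the context pathway to a low-dimensional latent is what prevents the predictor from using attention shortcuts over raw examples and forces all task information through an explicit representation.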
FLAM: Frame-Wise Language-Audio Modeling
Yusong Wu
Christos Tsirigotis
Ke Chen
Oriol Nieto
Prem Seetharaman
Justin Salamon
A flexible machine learning Mendelian randomization estimator applied to predict the safety and efficacy of sclerostin inhibition
Jason Hartford
Benoît J. Arsenault
AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N
Tianyu Zhang
Andrew Robert Williams
Phillip Wozny
Kai-Hendrik Cohrs
Koen Ponse
Marco Jiralerspong
Soham Rajesh Phade
Sunil Srinivasa
Lu Liu
Yang Zhang
Prateek Gupta
Erman Acar
Stephan Zheng
From Language Models over Tokens to Language Models over Characters
Tim Vieira
Benjamin LeBrun
Mario Giulianelli
Juan Luis Gastaldi
Brian DuSell
John Terilla
Ryan Cotterell
Modern language models are internally—and mathematically—distributions over *token* strings rather than *character* strings, posing numerous challenges for programmers building user applications on top of them. For example, if a prompt is specified as a character string, it must be tokenized before passing it to the token-level language model. Thus, the tokenizer and consequent processing are very sensitive to the specification of the prompt (e.g., whether the prompt ends with a space or not). This paper presents algorithms for converting token-level language models to character-level ones. We present both exact and approximate algorithms. In the empirical portion of the paper, we benchmark the practical runtime and approximation quality. Across four publicly available language models, we find that—even with a small computation budget—our method is able to accurately approximate the character-level distribution at reasonably fast speeds, and that a significant improvement in the language model's compression rate (bits/byte) is achieved.
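To see what converting a token-level distribution into a character-level one involves, the toy sketch below marginalizes a tiny unigram token model over every tokenization of a character string via memoized enumeration; the vocabulary and the enumeration scheme are purely illustrative and are not the paper's exact or approximate algorithms.

```python
# Illustrative toy: obtain the character-level probability of a string by
# summing a token-level model over all tokenizations that spell it out.
# The vocabulary and the unigram scorer are hypothetical stand-ins.
from functools import lru_cache

VOCAB = {"a": 0.3, "b": 0.2, "ab": 0.25, "ba": 0.25}  # toy unigram token LM

def token_seq_prob(tokens):
    """Probability of a token string under the toy unigram token-level LM."""
    p = 1.0
    for t in tokens:
        p *= VOCAB[t]
    return p

def char_string_prob(s):
    """Sum token_seq_prob over every tokenization whose concatenation is s."""
    @lru_cache(maxsize=None)
    def tokenizations_from(i):
        if i == len(s):
            return [()]
        seqs = []
        for tok in VOCAB:
            if s.startswith(tok, i):
                for rest in tokenizations_from(i + len(tok)):
                    seqs.append((tok,) + rest)
        return seqs
    return sum(token_seq_prob(seq) for seq in tokenizations_from(0))

print(char_string_prob("ab"))  # P("ab") + P("a")P("b") = 0.25 + 0.06 = 0.31
```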
Galileo: Learning Global and Local Features of Many Remote Sensing Modalities
Gabriel Tseng
Anthony Fuller
Marlena Reil
Henry Herzog
Patrick Beukema
Favyen Bastani
James R Green
Evan Shelhamer
Hannah Kerner
We introduce a highly multimodal transformer to represent many remote sensing modalities - multispectral optical, synthetic aperture radar, elevation, weather, pseudo-labels, and more - across space and time. These inputs are useful for diverse remote sensing tasks, such as crop mapping and flood detection. However, learning shared representations of remote sensing data is challenging, given the diversity of relevant data modalities, and because objects of interest vary massively in scale, from small boats (1-2 pixels and fast) to glaciers (thousands of pixels and slow). We present a novel self-supervised learning algorithm that extracts multi-scale features across a flexible set of input modalities through masked modeling. Our dual global and local contrastive losses differ in their targets (deep representations vs. shallow input projections) and masking strategies (structured vs. not). Galileo is a single generalist model that outperforms SoTA specialist models for satellite images and pixel time series across eleven benchmarks and multiple tasks.
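A rough sketch of what a dual global/local contrastive objective can look like is shown below; the encoders, the random masking, and the two target choices (deep features vs. a shallow linear projection of the input) are simplified assumptions and do not reproduce Galileo's actual losses or masking strategies.

```python
# Sketch of a dual contrastive objective over masked inputs: one InfoNCE loss
# against deep representations and one against shallow input projections.
# Simplified stand-in; not Galileo's actual architecture or masking.
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(anchors, targets, temperature=0.1):
    """Contrast each anchor against all targets; the matched index is the positive."""
    logits = F.normalize(anchors, dim=-1) @ F.normalize(targets, dim=-1).T
    labels = torch.arange(anchors.size(0))
    return F.cross_entropy(logits / temperature, labels)

patch_dim, d_model = 32, 128
deep_encoder = nn.Sequential(nn.Linear(patch_dim, d_model), nn.ReLU(),
                             nn.Linear(d_model, d_model))
shallow_proj = nn.Linear(patch_dim, d_model)   # shallow input projection target

patches = torch.randn(64, patch_dim)           # flattened multimodal patches
masked = patches * (torch.rand(64, 1) > 0.5)   # toy unstructured masking

# "Global" loss: deep features of masked patches vs. deep features of originals.
global_loss = info_nce(deep_encoder(masked), deep_encoder(patches).detach())
# "Local" loss: deep features of masked patches vs. shallow input projections.
local_loss = info_nce(deep_encoder(masked), shallow_proj(patches).detach())
(global_loss + local_loss).backward()
```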
Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks
Benjamin Leblanc
Mathieu Bazinet
Nathaniel D'Amours
Both PAC-Bayesian and Sample Compress learning frameworks have been shown to be instrumental for deriving tight (non-vacuous) generalization bounds for neural networks. We leverage these results in a meta-learning scheme, relying on a hypernetwork that outputs the parameters of a downstream predictor from a dataset input. The originality of our approach lies in the investigated hypernetwork architectures that encode the dataset before decoding the parameters: (1) a PAC-Bayesian encoder that expresses a posterior distribution over a latent space, (2) a Sample Compress encoder that selects a small sample of the dataset input along with a message from a discrete set, and (3) a hybrid between both approaches motivated by a new Sample Compress theorem handling continuous messages. The latter theorem exploits the pivotal information passing through the encoder-decoder junction in order to compute generalization guarantees for each downstream predictor obtained by our meta-learning scheme.
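The dataset-to-predictor hypernetwork setup can be sketched as follows; the permutation-invariant mean-pooling encoder, the dimensions, and the linear downstream predictor are illustrative assumptions, not the paper's PAC-Bayesian or Sample Compress encoders.

```python
# Sketch of a hypernetwork that maps a whole dataset to the parameters of a
# downstream predictor: encode each (x, y) pair, pool into a dataset embedding,
# decode that embedding into linear-predictor weights. Illustrative only.
import torch
import torch.nn as nn

class HyperNetwork(nn.Module):
    def __init__(self, x_dim=5, z_dim=32):
        super().__init__()
        self.point_encoder = nn.Sequential(nn.Linear(x_dim + 1, 64), nn.ReLU(),
                                           nn.Linear(64, z_dim))
        # Decoder emits the downstream predictor's weights and bias.
        self.decoder = nn.Linear(z_dim, x_dim + 1)

    def forward(self, X, y):
        # Permutation-invariant dataset embedding via mean pooling.
        z = self.point_encoder(torch.cat([X, y], dim=-1)).mean(dim=0)
        params = self.decoder(z)
        w, b = params[:-1], params[-1]
        return lambda X_new: X_new @ w + b   # the decoded linear predictor

hyper = HyperNetwork()
X, y = torch.randn(100, 5), torch.randn(100, 1)
predictor = hyper(X, y)                 # downstream predictor for this dataset
preds = predictor(torch.randn(10, 5))   # shape (10,)
```

The generalization guarantees discussed in the abstract attach to what flows through the encoder-decoder junction (the dataset embedding `z` in this sketch), which is why the encoder's form, whether a posterior over a latent space or a compressed sample plus message, is the central design choice.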