
iWISDM: Assessing instruction following in multimodal models at scale
Xiaoxuan Lei
Lucas Gomez
Hao Yuan Bai
The ability to perform complex tasks from detailed instructions is key to the remarkable achievements of our species. As humans, we are capable of performing not only a wide variety of tasks but also very complex ones that may entail hundreds or thousands of steps to complete. Large language models and their more recent multimodal counterparts that integrate textual and visual inputs have achieved unprecedented success in performing complex tasks. Yet most existing benchmarks are largely confined to single-modality inputs (either text or vision), which narrows the scope of multimodal integration assessments, particularly for instruction following in multimodal contexts. To bridge this gap, we introduce the instructed-Virtual VISual Decision Making (iWISDM) environment, engineered to generate a limitless array of vision-language tasks of varying complexity. Using iWISDM, we compiled three distinct benchmarks of instruction-following visual tasks across varying complexity levels and evaluated several newly developed multimodal models on these benchmarks. Our findings establish iWISDM as a robust benchmark for assessing the instructional adherence of both existing and emergent multimodal models and highlight a large gap in these models’ ability to precisely follow instructions.
Learning Generative Population Models From Multiple Clinical Datasets Via Probabilistic Programming
João Loula
Katherine M. Collins
Ulrich Schaechtle
Joshua B. Tenenbaum
Adrian Weller
Feras Saad
Vikash Mansinghka
Accurate, efficient generative models of clinical populations could accelerate clinical research and improve patient outcomes. For example, such models could infer probable treatment outcomes for different subpopulations, generate high-fidelity synthetic data that can be shared across organizational boundaries, and discover new relationships among clinical variables. Using Bayesian structure learning, we show that it is possible to learn probabilistic program models of clinical populations by combining data from multiple, sparsely overlapping clinical datasets. Through experiments with multiple clinical trials and real-world evidence from census health surveys, we show that our model generates higher quality synthetic data than neural network baselines, supports more accurate inferences across datasets than traditional statistical methods, and can be queried more efficiently than both, opening up new avenues for accessible and efficient AI assistance in clinical research.
Lost in Translation: The Algorithmic Gap Between LMs and the Brain
Tommaso Tosato
Pascal Junior Tikeng Notsawo
Saskia Helbling
Language Models (LMs) have achieved impressive performance on various linguistic tasks, but their relationship to human language processing in the brain remains unclear. This paper examines the gaps and overlaps between LMs and the brain at different levels of analysis, emphasizing the importance of looking beyond input-output behavior to examine and compare the internal processes of these systems. We discuss how insights from neuroscience, such as sparsity, modularity, internal states, and interactive learning, can inform the development of more biologically plausible language models. Furthermore, we explore the role of scaling laws in bridging the gap between LMs and human cognition, highlighting the need for efficiency constraints analogous to those in biological systems. By developing LMs that more closely mimic brain function, we aim to advance both artificial intelligence and our understanding of human cognition.
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold
Lazar Atanackovic
Xi Zhang
Brandon Amos
Leo J Lee
Alexander Tong
Kirill Neklyudov
Numerous biological and physical processes can be modeled as systems of interacting samples evolving continuously over time, e.g., the dynamics of communicating cells or physical particles. Flow-based models allow for learning these dynamics at the population level: they model the evolution of the entire distribution of samples. However, current flow-based models are limited to a single initial population and a set of predefined conditions which describe different dynamics. We propose
Neural Ratio Estimators Meet Distributional Shift and Mode Misspecification: A Cautionary Tale from Strong Gravitational Lensing
In recent years, there has been increasing interest in the field of astrophysics in applying Neural Ratio Estimators (NREs) to large-scale inference problems where both amortization and marginalization over a large number of nuisance parameters are needed. Here, in order to assess the true potential of this method to produce unbiased inference on real data, we investigate the robustness of NREs to distribution shifts and model misspecification in the specific scientific application of the measurement of dark matter population-level parameters using strong gravitational lensing. We investigate the behaviour of a trained NRE for test data presenting distributional shifts inside the bounds of training, as well as out of distribution, both in the linear and non-linear parameters of this problem. While our results show that NREs perform well when tested perfectly in distribution, we find that they exhibit significant biases and drawbacks when confronted with slight deviations from the examples seen in the training distribution. This indicates the necessity for caution when applying NREs to real astrophysical data, where underlying distributions are not perfectly known and models do not perfectly reconstruct the true underlying distributions.
QGFN: Controllable Greediness with Action Values
Elaine Lau
Stephen Zhewen Lu
Ling Pan
Emmanuel Bengio
Generative Flow Networks (GFlowNets; GFNs) are a family of reward/energy-based generative methods for combinatorial objects, capable of generating diverse and high-utility samples. However, biasing GFNs towards producing high-utility samples is non-trivial. In this work, we leverage connections between GFNs and reinforcement learning (RL) and propose to combine the GFN policy with an action-value estimate,
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
Joao Monteiro
Pierre-Andre Noel
Étienne Marcotte
Sai Rajeswar
Valentina Zantedeschi
David Vazquez
Perouz Taslakian
Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data includes encyclopedic documents that harbor a vast amount of general knowledge (e.g., Wikipedia) but also potentially overlap with benchmark datasets used for evaluating LLMs. Consequently, evaluating models on test splits that might have leaked into the training set is prone to misleading conclusions. To foster sound evaluation of language models, we introduce a new test dataset named RepLiQA, suited for question-answering and topic retrieval tasks. RepLiQA is a collection of five splits of test sets, four of which have not been released to the internet or exposed to LLM APIs prior to this publication. Each sample in RepLiQA comprises (1) a reference document crafted by a human annotator and depicting an imaginary scenario (e.g., a news article) absent from the internet; (2) a question about the document's topic; (3) a ground-truth answer derived directly from the information in the document; and (4) the paragraph extracted from the reference document containing the answer. As such, accurate answers can only be generated if a model can find relevant content within the provided document. We run a large-scale benchmark comprising several state-of-the-art LLMs to uncover differences in performance across models of various types and sizes in a context-conditional language modeling setting. Released splits of RepLiQA can be found here:
Revisiting Successor Features for Inverse Reinforcement Learning
Arnav Kumar Jain
Harley Wiltzer
Jesse Farebrother
Sanjiban Choudhury
RGFN: Synthesizable Molecular Generation Using GFlowNets
Michał Koziarski
Andrei Rekesh
Dmytro Shevchuk
Almer M. van der Sloot
Piotr Gaiński
Cheng-Hao Liu
Mike Tyers
Robert A. Batey
On The Local Geometry of Deep Generative Manifolds
Ahmed Imtiaz Humayun
Ibtihel Amara
Candice Schumann
Mohammad Havaei
In this paper, we study theoretically inspired local geometric descriptors of the data manifolds approximated by pre-trained generative models. The descriptors, namely local scaling (ψ), local rank (ν), and local complexity (δ), characterize the uncertainty, dimensionality, and smoothness on the learned manifold, using only the network weights and architecture. We investigate and emphasize their critical role in understanding generative models. Our analysis reveals that the local geometry is intricately linked to the quality and diversity of generated outputs. Additionally, we see that the geometric properties are distinct for out-of-distribution (OOD) inputs as well as for prompts memorized by Stable Diffusion, showing the possible application of our proposed descriptors for downstream detection and assessment of pre-trained generative models.
TutteNet: Injective 3D Deformations by Composition of 2D Mesh Deformations
Bo Sun
Thibault Groueix
Chen Song
Qixing Huang
Using neural biomarkers to personalize dosing of vagus nerve stimulation
Antonin Berthon
Lorenz Wernisch
Myrta Stoukidi
Michael Thornton
Olivier Tessier-Lariviere
Pascal Fortier-Poisson
Jorin Mamen
Max Pinkney
Susannah Lee
Elvijs Sarkans
Luca Annecchino
Ben Appleton
Philip Garsed
Bret Patterson
Samuel Gonshaw
Matjaž Jakopec
Sudhakaran Shunmugam
Tristan Edwards
Aleksi Tukiainen
Joel Jennings
Emil Hewage
Oliver Armitage