Publications

Autoregressive Speech Enhancement via Acoustic Tokens

Luca Della Libera

Yusuf Cem Sübakan

Mirco Ravanelli

2025-07-16

ArXiv (preprint)

doi.org

arxiv.org

Convergence of regularized agent-state based Q-learning in POMDPs

Amit Sinha

Matthieu Geist

Aditya Mahajan

In this paper, we present a framework to understand the convergence of commonly used Q-learning reinforcement learning algorithms in practice. Two salient features of such algorithms are: (i) the Q-table is recursively updated using an agent state (such as the state of a recurrent neural network) which is not a belief state or an information state and (ii) policy regularization is often used to encourage exploration and stabilize the learning algorithm. We investigate the simplest form of such Q-learning algorithms which we call regularized agent-state based Q-learning (RASQL) and show that it converges under mild technical conditions to the fixed point of an appropriately defined regularized MDP, which depends on the stationary distribution induced by the behavioral policy. We also show that a similar analysis continues to work for a variant of RASQL that learns periodic policies. We present numerical examples to illustrate that the empirical convergence behavior matches with the proposed theoretical limit.

2025-07-16

EWRL/2025/Workshop (poster)

openreview.net

Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images

Zahra Tehrani Nasab

Hujun Ni

Amar Kumar

Tal Arbel

2025-07-16

ArXiv (preprint)

doi.org

arxiv.org

Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs

Shangmin Guo

Omar Darwiche Domingues

Raphaël Avalos

Aaron Courville

Florian Strub

Tool use in stateful environments presents unique challenges for large language models (LLMs), where existing test-time compute strategies relying on repeated trials in the environment are impractical. We propose dynamics modelling (DyMo), a method that augments LLMs with a state prediction capability alongside function calling during post-training. This enables LLMs to predict the future states of their actions through an internal environment model. On the Berkeley Function Calling Leaderboard V2, DyMo improves success rates and significantly reduces hallucinations. We further integrate the internal environment model into self-verification sampling (SVS), and show that this substantially improves pass^k over number of trials k, and allows the model to refuse unreliable outputs. Together, DyMo and SVS greatly enhance the effectiveness and reliability of LLMs for tool use. We believe this work charts a path towards scalable planning RL methods for LLM inference without repeatedly querying the oracle environment.

2025-07-16

EWRL/2025/Workshop (poster)

openreview.net

Brain Age Prediction: Deep Models Need a Hand to Generalize

Reza Rajabli

Mahdie Soltaninejad

Vladimir S. Fonov

Danilo Bzdok

D. Louis Collins

Predicting brain age from T1‐weighted MRI is a promising marker for understanding brain aging and its associated conditions. While deep learning models have shown success in reducing the mean absolute error (MAE) of predicted brain age, concerns about robust and accurate generalization in new data limit their clinical applicability. The large number of trainable parameters, combined with limited medical imaging training data, contributes to this challenge, often resulting in a generalization gap where there is a significant discrepancy between model performance on training data versus unseen data. In this study, we assess a deep model, SFCN‐reg, based on the VGG‐16 architecture, and address the generalization gap through comprehensive preprocessing, extensive data augmentation, and model regularization. Using training data from the UK Biobank, we demonstrate substantial improvements in model performance. Specifically, our approach reduces the generalization MAE by 47% (from 5.25 to 2.79 years) in the Alzheimer's Disease Neuroimaging Initiative dataset and by 12% (from 4.35 to 3.75 years) in the Australian Imaging, Biomarker and Lifestyle dataset. Furthermore, we achieve up to 13% reduction in scan‐rescan error (from 0.80 to 0.70 years) while enhancing the model's robustness to registration errors. Feature importance maps highlight anatomical regions used to predict age. These results highlight the critical role of high‐quality preprocessing and robust training techniques in improving accuracy and narrowing the generalization gap, both necessary steps toward the clinical use of brain age prediction models. Our study makes valuable contributions to neuroimaging research by offering a potential pathway to improve the clinical applicability of deep learning models.

2025-07-15

Human Brain Mapping (published)

doi.org

From Black Box to Biomarker: Sparse Autoencoders for Interpreting Speech Models of Parkinson's Disease

Peter Plantinga

Jen-Kai Chen

Roozbeh Sattari

Mirco Ravanelli

Denise Klein

Speech holds promise as a cost-effective and non-invasive biomarker for neurological conditions such as Parkinson's disease (PD). While deep learning systems trained on raw audio can find subtle signals not available from hand-crafted features, their black-box nature hinders clinical adoption. To address this, we apply sparse autoencoders (SAEs) to uncover interpretable internal representations from a speech-based PD detection system. We introduce a novel mask-based activation for adapting SAEs to small biomedical datasets, creating sparse disentangled dictionary representations. These dictionary entries are found to have strong associations with characteristic articulatory deficits in PD speech, such as reduced spectral flux and increased spectral flatness in the low-energy regions highlighted by the model attention. We further show that the spectral flux is related to volumetric measurements of the putamen from MRI scans, demonstrating the potential of SAEs to reveal clinically relevant biomarkers for disease monitoring and diagnosis.

2025-07-15

ArXiv (preprint)

doi.org

arxiv.org

Longer scans boost prediction and cut costs in brain-wide association studies

Leon Qi Rong Ooi

Csaba Orbán

Shaoshi Zhang

Thomas E. Nichols

Trevor Wei Kiat Tan

Ru Kong

Scott Marek

Nico U. F. Dosenbach

Timothy O. Laumann

Evan M. Gordon

Kwong Hsia Yap

Fang Ji

Joanna Su Xian Chong

Christopher Chen

Lijun An

Nicolai Franzmeier

Sebastian N. Roemer-Cassiano

Qingyu Hu

Jianxun Ren

Hesheng Liu

Sidhant Chopra

Carrisa V. Cocuzza

Justin T. Baker

Juan Helen Zhou

Danilo Bzdok

Simon B. Eickhoff

Avram J. Holmes

B. T. Thomas Yeo

Clifford R. Jack Jr

A pervasive dilemma in brain-wide association studies (BWAS) is whether to prioritize functional MRI (fMRI) scan time or sample size. We derive a theoretical model showing that individual-level phenotypic prediction accuracy increases with sample size and total scan duration (sample size × scan time per participant). The model explains empirical prediction accuracies extremely well across 76 phenotypes from nine resting-fMRI and task-fMRI datasets (R² = 0.89), spanning a wide range of scanners, acquisitions, racial groups, disorders and ages. For scans ≤20 mins, prediction accuracy increases linearly with the logarithm of total scan duration, suggesting interchangeability of sample size and scan time. However, sample size is ultimately more important than scan time in determining prediction accuracy. Nevertheless, when accounting for overhead costs associated with each participant (e.g., recruitment costs), to boost prediction accuracy, longer scans can yield substantial cost savings over larger sample size. To achieve high prediction performance, 10-min scans are highly cost inefficient. In most scenarios, the optimal scan time is ≥20 mins. On average, 30-min scans are the most cost-effective, yielding 22% cost savings over 10-min scans. Overshooting is cheaper than undershooting the optimal scan time, so we recommend aiming for ≥30 mins. Compared with resting-state whole-brain BWAS, the most cost-effective scan time is shorter for task-fMRI and longer for subcortical-cortical BWAS. Standard power calculations maximize sample size at the expense of scan time. Our study demonstrates that optimizing both sample size and scan time can boost prediction power while cutting costs. Our empirically informed reference is available for future study planning: WEB_APPLICATION_LINK

2025-07-15

Nature (published)

doi.org

Optimizers Qualitatively Alter Solutions And We Should Leverage This

Razvan Pascanu

Clare Lyle

Ionut-Vlad Modoranu

Naima Elosegui Borras

Dan Alistarh

Petar Veličković

A. Chandar

Soham De

James Martens

Due to the nonlinear nature of Deep Neural Networks (DNNs), one can not guarantee convergence to a unique global minimum of the loss when using optimizers relying only on local information, such as SGD. Indeed, this was a primary source of skepticism regarding the feasibility of DNNs in the early days of the field. The past decades of progress in deep learning have revealed this skepticism to be misplaced, and a large body of empirical evidence shows that sufficiently large DNNs following standard training protocols exhibit well-behaved optimization dynamics that converge to performant solutions. This success has biased the community to use convex optimization as a mental model for learning, leading to a focus on training efficiency, either in terms of required iteration, FLOPs or wall-clock time, when improving optimizers. We argue that, while this perspective has proven extremely fruitful, another perspective specific to DNNs has received considerably less attention: the optimizer not only influences the rate of convergence, but also the qualitative properties of the learned solutions. Restated, the optimizer can and will encode inductive biases and change the effective expressivity of a given class of models. Furthermore, we believe the optimizer can be an effective way of encoding desiderata in the learning process. We contend that the community should aim at understanding the biases of already existing methods, as well as aim to build new optimizers with the explicit intent of inducing certain properties of the solution, rather than solely judging them based on their convergence rates. We hope our arguments will inspire research to improve our understanding of how the learning process can impact the type of solution we converge to, and lead to a greater recognition of optimizers design as a critical lever that complements the roles of architecture and data in shaping model outcomes.

2025-07-15

ArXiv (preprint)

doi.org

arxiv.org

Aligning Protein Conformation Ensemble Generation with Physical Feedback

Jiarui Lu

Xiaoyin Chen

Stephen Z. Lu

Aurelie Lozano

Vijil Chenthamarakshan

Payel Das

Jian Tang

Protein dynamics play a crucial role in protein biological functions and properties, and their traditional study typically relies on time-consuming molecular dynamics (MD) simulations conducted in silico. Recent advances in generative modeling, particularly denoising diffusion models, have enabled efficient accurate protein structure prediction and conformation sampling by learning distributions over crystallographic structures. However, effectively integrating physical supervision into these data-driven approaches remains challenging, as standard energy-based objectives often lead to intractable optimization. In this paper, we introduce Energy-based Alignment (EBA), a method that aligns generative models with feedback from physical models, efficiently calibrating them to appropriately balance conformational states based on their energy differences. Experimental results on the MD ensemble benchmark demonstrate that EBA achieves state-of-the-art performance in generating high-quality protein ensembles. By improving the physical plausibility of generated structures, our approach enhances model predictions and holds promise for applications in structural biology and drug discovery.

2025-07-14

International Conference on Machine Learning (Accept (poster))

doi.org

proceedings.mlr.press

Exact risk curves of signSGD in High-Dimensions: quantifying preconditioning and noise-compression effects

Ke Liang Xiao

Noah Marshall

Atish Agarwala

Elliot Paquette

In recent years, signSGD has garnered interest as both a practical optimizer as well as a simple model to understand adaptive optimizers like Adam. Though there is a general consensus that signSGD acts to precondition optimization and reshapes noise, quantitatively understanding these effects in theoretically solvable settings remains difficult. We present an analysis of signSGD in a high dimensional limit, and derive a limiting SDE and ODE to describe the risk. Using this framework we quantify four effects of signSGD: effective learning rate, noise compression, diagonal preconditioning, and gradient noise reshaping. Our analysis is consistent with experimental observations but moves beyond that by quantifying the dependence of these effects on the data and noise distributions. We conclude with a conjecture on how these results might be extended to Adam.

2025-07-14

International Conference on Machine Learning (Accept (poster))

doi.org

proceedings.mlr.press

In-context learning and Occam's razor

A central goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees for generalization without further assumptions, in practice we observe that simple models which explain the training data generalize best: a principle called Occam's razor. Despite the need for simple models, most current approaches in machine learning only minimize the training error, and at best indirectly promote simplicity through regularization or architecture design. Here, we draw a connection between Occam's razor and in-context learning: an emergent ability of certain sequence models like Transformers to learn at inference time from past observations in a sequence. In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. We make our code available at https://github.com/3rdCore/PrequentialCode.

2025-07-14

International Conference on Machine Learning (Accept (poster))

doi.org

proceedings.mlr.press

Leveraging Per-Instance Privacy for Machine Unlearning

Nazanin Mohammadi Sepahvand

Anvith Thudi

Berivan Isik

Ashmita Bhattacharyya

Nicolas Papernot

Eleni Triantafillou

Daniel M. Roy