Publications

Consistent Synthetic Sequences Unlock Structural Diversity in Fully Atomistic De Novo Protein Design

Danny Reidenbach

Zhonglin Cao

Zuobai Zhang

Kieran Didi

Tomas Geffner

Guoqing Zhou

Jian Tang

Christian Dallago

Arash Vahdat

Emine Kucukbenli

Karsten Kreis

High-quality training datasets are crucial for the development of effective protein design models, but existing synthetic datasets often include unfavorable sequence-structure pairs, impairing generative model performance. We leverage ProteinMPNN, whose sequences are experimentally favorable as well as amenable to folding, together with structure prediction models to align high-quality synthetic structures with recoverable synthetic sequences. In that way, we create a new dataset designed specifically for training expressive, fully atomistic protein generators. By retraining La-Proteína, which models discrete residue type and side chain structure in a continuous latent space, on this dataset, we achieve new state-of-the-art results, with improvements of +54% in structural diversity and +27% in co-designability. To validate the broad utility of our approach, we further introduce Proteína-Atomística, a unified flow-based framework that jointly learns the distribution of protein backbone structure, discrete sequences, and atomistic side chains without latent variables. We again find that training on our new sequence-structure data dramatically boosts benchmark performance, improving Proteína-Atomística’s structural diversity by +73% and co-designability by +5%. Our work highlights the critical importance of aligned sequence-structure data for training high-performance de novo protein design models. All data will be publicly released.

2025-09-23

NeurIPS.cc/2025/Workshop/AI4Science (poster)

openreview.net

Localized-Attention-Guided Concept Erasure for Text-to-Image Diffusion Models

Zhuan Shi

Alireza Dehghanpour Farashah

Rik de Vries

Golnoosh Farnadi

2025-09-23

NeurIPS.cc/2025/Workshop/GenProCC (published)

openreview.net

Source-free cross-modality medical image synthesis with diffusion priors

Jia Chen

Xin Wang

Jun Bai

Kai Yang

Xinrong Hu

Yue Li

2025-09-23

Journal of King Saud University Computer and Information Sciences (published)

doi.org

Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs

Ziling Cheng

Meng Cao

Marc-Antoine Rondeau

Jackie CK Cheung

The widespread success of LLMs on NLP benchmarks has been accompanied by concerns that LLMs function primarily as stochastic parrots that reproduce texts similar to what they saw during pre-training, often erroneously. But what is the nature of their errors, and do these errors exhibit any regularities? In this work, we examine irrelevant context hallucinations, in which models integrate misleading contextual cues into their predictions. Through behavioral analysis, we show that these errors result from a structured yet flawed mechanism that we term _class-based (mis)generalization_, in which models combine abstract class cues with features extracted from the query or context to derive answers. Furthermore, mechanistic interpretability experiments on Llama-3, Mistral, and Pythia across 39 factual recall relation types reveal that this behavior is reflected in the model's internal computations: (i) abstract class representations are constructed in lower layers before being refined into specific answers in higher layers, (ii) feature selection is governed by two competing circuits --- one prioritizing direct query-based reasoning, the other incorporating contextual cues --- whose relative influences determine the final output. Our findings provide a more nuanced perspective on the stochastic parrot argument: through form-based training, LLMs can exhibit generalization leveraging abstractions, albeit in unreliable ways based on contextual cues --- what we term _stochastic chameleons_.

2025-09-23

colmweb.org/COLM/2025/Workshop/INTERPLAY (published)

openreview.net

Beyond Naive Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs

Arjun Ashok

Andrew Robert Williams

Vincent Zhihao Zheng

Irina Rish

Nicolas Chapados

Étienne Marcotte

Valentina Zantedeschi

Alexandre Drouin

Forecasting in real-world settings requires models to integrate not only historical data but also relevant contextual information, often available in textual form. While recent work has shown that large language models (LLMs) can be effective context-aided forecasters via naïve direct prompting, their full potential remains underexplored. We address this gap with 4 strategies, providing new insights into the zero-shot capabilities of LLMs in this setting. ReDP improves interpretability by eliciting explicit reasoning traces, allowing us to assess the model's reasoning over the context independently from its forecast accuracy. CorDP leverages LLMs solely to refine existing forecasts with context, enhancing their applicability in real-world forecasting pipelines. IC-DP proposes embedding historical examples of context-aided forecasting tasks in the prompt, substantially improving accuracy even for the largest models. Finally, RouteDP optimizes resource efficiency by using LLMs to estimate task difficulty, and routing the most challenging tasks to larger models. Evaluated on different kinds of context-aided forecasting tasks from the CiK benchmark, our strategies demonstrate distinct benefits over naïve prompting across LLMs of different sizes and families. These results open the door to further simple yet effective improvements in LLM-based context-aided forecasting.

2025-09-22

BERT2S @ Neural Information Processing Systems (poster)

doi.org

openreview.net

Conscious Data Contribution via Community-Driven Chain-of-Thought Distillation

Lena Libon

Meghana Bhange

Rushabh Solanki

Elliot Creager

Ulrich Matchi Aïvodji

2025-09-22

NeurIPS.cc/2025/Workshop/ACA (oral)

openreview.net

Context-Aware World Models for Task-Agnostic Control

Busra Tugce Gurbuz

Hafez Ghaemi

Christopher C. Pack

Shahab Bakhtiari

Eilif Benjamin Muller

2025-09-22

NeurIPS.cc/2025/Workshop/UniReps (published)

openreview.net

Crowding Out The Noise: Algorithmic Collective Action Under Differential Privacy

Rushabh Solanki

Meghana Bhange

Ulrich Matchi Aïvodji

Elliot Creager

The integration of AI into daily life has generated considerable attention and excitement, while also raising concerns about automating algorithmic harms and re-entrenching existing social inequities. While top-down solutions such as regulatory policies and improved algorithm design are common, the fact that AI trains on social data creates an opportunity for a grassroots approach, Algorithmic Collective Action, where users deliberately modify the data they share to steer a platform's learning process in their favor. This paper considers how these efforts interact with a firm's use of a differentially private model to protect user data, motivated by the growing regulatory focus on privacy and data protection. In particular, we investigate how the use of Differentially Private Stochastic Gradient Descent (DPSGD) affects the collective’s ability to influence the learning process. Our findings show that while differential privacy contributes to the protection of individual data, it introduces challenges for effective algorithmic collective action. We characterize lower bounds on the success of these actions as a function of the collective's size and the firm's privacy parameters, verifying these trends experimentally by training deep neural network classifiers across several datasets.

2025-09-22

NeurIPS.cc/2025/Workshop/ACA (poster)

openreview.net

FEval-TTC: Fair Evaluation Protocol for Test-Time Compute

Pavel Rumiantsev

Soumyasundar Pal

Yingxue Zhang

Mark J. Coates

The performance of Large Language Models (LLMs) and the associated dollar costs of API calls can fluctuate over time, potentially invalidating conclusions drawn in prior research. To address this, we propose a _**F**air **Eval**uation protocol for **T**est-**T**ime **C**ompute_ (FEval-TTC), designed to ensure consistent assessment of test-time compute (TTC) methods, regardless of such fluctuations. FEval-TTC focuses on evaluation of TTC methods that utilize underlying Chains-of-Thought (CoT). It supports evaluations across multiple LLMs on a diverse set of mathematical and commonsense reasoning datasets. The few-shot prompting and answer extraction processes are standardized across datasets, reducing both time and monetary overhead for researchers. Furthermore, we provide a cost modeling procedure that estimates both the token and dollar cost per query, facilitating equitable comparisons of prevalent TTC methods. We open-source FEval-TTC for public use at [anonymized code link](https://drive.google.com/file/d/1DUeteFA7lnx5MubuR0lh6OPN6XKfpqGC/view?usp=sharing).

2025-09-22

NeurIPS.cc/2025/Workshop/LLM_Evaluation (poster)

doi.org

openreview.net

How to Get Your LLM to Generate Challenging Problems for Evaluation

Arkil Patel

Siva Reddy

Dzmitry Bahdanau

The pace of evolution of Large Language Models (LLMs) necessitates new approaches for rigorous and comprehensive evaluation. Traditional human annotation is increasingly impracticable due to the complexities and costs involved in generating high-quality, challenging problems, particularly for tasks such as long-context reasoning. Moreover, the rapid saturation of existing human-curated benchmarks by LLMs further underscores the need to develop scalable and automatically renewable evaluation methodologies. In this work, we introduce **CHASE**, a unified framework to synthetically generate challenging problems using LLMs without human involvement. For a given task, our approach builds a hard problem in a bottom-up manner from simpler components. Moreover, since we want to generate synthetic data for evaluation, our framework decomposes the generation process into independently verifiable sub-tasks, thereby ensuring a high level of quality and correctness. We implement CHASE to create evaluation benchmarks across three diverse domains: document-based question answering, repository-level code completion, and math reasoning. The performance of state-of-the-art LLMs on these synthetic benchmarks lies in the range of 40-60% accuracy, thereby demonstrating the effectiveness of our framework at generating hard problems. Our experiments further reveal that the Gemini models significantly outperform other LLMs at long-context reasoning, and that the performance of all LLMs drastically drops by as much as 70% when we scale up the context size to 50k tokens.

2025-09-22

NeurIPS.cc/2025/Workshop/LLM_Evaluation (poster)

doi.org

openreview.net

Inferring dynamical features from neural data through joint learning of latents factors and weights

Anirudh Gururaj Jamkhandi

Ali Korojy

Olivier Codol

Guillaume Lajoie

Matthew G Perich

Behavior arises from coordinated synaptic changes in recurrent neural populations. Inferring the underlying dynamics from limited, noisy, and high-dimensional neural recordings is a major challenge, as experimental data often provide only partial access to brain states. While data-driven recurrent neural networks (dRNNs) have been effective for modeling such dynamics, they are typically limited to single-task domains and struggle to generalize across behavioral conditions. Here, we propose a hierarchical model that captures neural dynamics across multiple behavioral contexts by learning a shared embedding space over RNN weights. We demonstrate that our model captures diverse neural dynamics with a single, unified model using both simulated datasets of many tasks and neural recordings during monkey reaching. Using the learned task embeddings, we demonstrate accurate classification of dynamical regimes and generalization to unseen samples. Crucially, spectral analysis on the learnt weights provides valuable insights into network computations, highlighting the potential of joint embedding–weight learning for scalable inference of brain dynamics.

2025-09-22

NeurIPS.cc/2025/Workshop/NeurReps (poster)

openreview.net

Leveraging Parameter Space Symmetries for Reasoning Skill Transfer in LLMs

Stefan Horoi

Sangwoo Cho

Supriyo Chakraborty

Shi-Xiong Zhang

Sambit Sahu

Guy Wolf

Genta Indra Winata

2025-09-22

NeurIPS.cc/2025/Workshop/UniReps (published)

openreview.net
