Publications

Property-Driven Protein Inverse Folding with Multi-Objective Preference Alignment

Junqi Liu

Xiaoyang Hou

Xin Liu

Zhi Yang

Protein sequence design must balance designability, defined as the ability to recover a target backbone, with multiple, often competing, developability properties such as solubility, thermostability, and expression. Existing approaches address these properties through post hoc mutation, inference-time biasing, or retraining on property-specific subsets, yet they are target dependent and demand substantial domain expertise or careful hyperparameter tuning. In this paper, we introduce ProtAlign, a multi-objective preference alignment framework that fine-tunes pretrained inverse folding models to satisfy diverse developability objectives while preserving structural fidelity. ProtAlign employs a semi-online Direct Preference Optimization strategy with a flexible preference margin to mitigate conflicts among competing objectives and constructs preference pairs using in silico property predictors. Applied to the widely used ProteinMPNN backbone, the resulting model MoMPNN enhances developability without compromising designability across tasks including sequence design for CATH 4.3 crystal structures, de novo generated backbones, and real-world binder design scenarios, making it an appealing framework for practical protein sequence design.

2025-12-31

International Conference on Learning Representations (Accept (Poster))

doi.org

openreview.net

Quantifying LLM Attention-Head Stability: Implications for Circuit Universality

Karan Bali

Jack Stanley

Praneet Suresh

Danilo Bzdok

In mechanistic interpretability, recent work scrutinizes transformer "circuits": sparse, mono- or multi-layer sub-computations that may reflect human-understandable functions. Yet, these network circuits are rarely acid-tested for their stability across different instances of the same deep learning architecture. Without this, it remains unclear whether reported circuits emerge universally across labs or turn out to be idiosyncratic to a particular estimation instance, potentially limiting confidence in safety-critical settings. Here, we systematically study stability across refits in increasingly complex transformer language models of various sizes. We quantify, layer by layer, how similarly attention heads learn representations across independently initialized training runs. Our rigorous experiments show that (1) middle-layer heads are the least stable yet the most representationally distinct; (2) deeper models exhibit stronger mid-depth divergence; (3) unstable heads in deeper layers become more functionally important than their peers from the same layer; (4) applying weight decay optimization substantially improves attention-head stability across random model initializations; and (5) the residual stream is comparatively stable. Our findings establish the cross-instance robustness of circuits as an essential yet underappreciated prerequisite for scalable oversight, drawing contours around possible white-box monitorability of AI systems.

2025-12-31

arXiv (preprint)

doi.org

arxiv.org

RAEE: A Robust Retrieval-Augmented Early Exit Framework for Efficient Inference

Lianming Huang

Shangyu Wu

Yufei Cui

Ying Xiong

Haibo Hu

Xue Liu

Tei-Wei Kuo

Nan Guan

Chun Jason Xue

Deploying large language model inference remains challenging due to their high computational overhead. Early exit optimizes model inference by adaptively reducing the number of inference layers. Current methods typically train internal classifiers or use heuristic methods to determine the exit layer. However, those methods either introduce significant training overheads or lead to performance degradation. To address these limitations, this paper proposes RAEE, a robust Retrieval-Augmented Early Exit framework that not only enables early exit but also enhances model performance through corrective exit information at intermediate layers. This paper first demonstrates that the early exit problem can be effectively modeled as a distribution prediction problem, in which the distribution can be further approximated through the exit information of similar data. Subsequently, this paper introduces the process of collecting exit information of correct predictions and the steps to construct the retrieval database. Finally, leveraging the pre-constructed retrieval database, RAEE utilizes the exit information from retrieved similar data to guide the backbone model's exit. Experimental results demonstrate that RAEE can accelerate inference while achieving robust zero-shot performance across eight downstream tasks.

2025-12-31

International Conference on Learning Representations (Accept (Poster))

doi.org

openreview.net

Recall, Robustness, and Lexicographic Evaluation

Fernando Diaz

Michael D. Ekstrand

Bhaskar Mitra

2025-12-31

Trans. Recomm. Syst. (published)

doi.org

arxiv.org

RECODE: A Benchmark for Research Code DEvelopment with Interactive Human Feedback

Chunyu Miao

Henry Peng Zou

Yangning Li

Yankai Chen

Yibo Wang

Fangxin Wang

Yifan Li

Wooseong Yang

Bowei He

Xinni Zhang

Dianzhi Yu

Hanchen Yang

Hoang H Nguyen

Yue Zhou

Jie Yang

Jizhou Guo

Wenzhe Fan

Chin-Yuan Yeh

Panpan Meng

Liancheng Fang

Jinhu Qi

Wei-Chieh Huang

Zhengyao Gu

Yuwei Han

Langzhou He

Yuyao Yang

Yinghui Li

Hai-Tao Zheng

Xue Liu

Irwin King

Philip S. Yu

Large language models (LLMs) show promise in supporting scientific research implementation, yet their ability to generate correct and executable code remains limited. Existing works largely adopt one-shot settings, ignoring the iterative and feedback-driven nature of realistic workflows of scientific research development. To address this gap, we present RECODE, a benchmark of 102 tasks from research papers and repositories that evaluates LLMs through multi-turn interactions with human feedback. It includes structured instructions, unit tests, and a five-level feedback hierarchy to reflect realistic researcher–agent collaboration. We further present ReCodeAgent, a framework that integrates feedback into iterative code generation. Experiments with leading LLMs, including GPT-5, Claude-Sonnet-4, DeepSeek-V3.1, and Gemini 2.5, show substantial performance gains with richer feedback, while also highlighting ongoing challenges in the generation of complex research code. RECODE establishes a foundation for developing adaptive, feedback-driven LLM agents in scientific research implementation.

2025-12-31

International Conference on Learning Representations (Accept (Poster))

openreview.net

Reply to comment on "medication-based mortality prediction in COPD using machine learning and conventional statistical methods".

Ana Paula Pena-Gralle

Amélie Forget

Yohann Moanahere Chiu

Marc-André Legault

M. Beauchesne

Lucie Blais

2025-12-31

International Journal of Medical Informatics (published)

doi.org

Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Epsilon-Scheduling

Ola Ahmad

Frédéric Precioso

Fine-tuning pretrained models is a standard and effective workflow in modern machine learning. However, robust fine-tuning (RFT), which aims to simultaneously achieve adaptation to a downstream task and robustness to adversarial examples, remains challenging. Despite the abundance of non-robust pretrained models in open-source repositories, their potential for RFT is less understood. We address this knowledge gap by systematically examining RFT from such non-robust models. Our experiments reveal that fine-tuning non-robust models with a robust objective, even under small perturbations, can lead to poor performance, a phenomenon that we dub _suboptimal transfer_. In challenging scenarios (e.g., difficult tasks, high perturbation), the resulting performance can be so low that it may be considered a transfer failure. We find that fine-tuning using a robust objective impedes task adaptation at the beginning of training and eventually prevents optimal transfer. However, we propose a novel heuristic, _Epsilon-Scheduling_, a schedule over perturbation strength used during training that promotes optimal transfer. Additionally, we introduce _expected robustness_, a metric that captures performance across a range of perturbations, providing a more comprehensive evaluation of the accuracy-robustness trade-off of diverse models at test-time. Extensive experiments on a wide range of configurations (six pretrained models and five datasets) show that _Epsilon-Scheduling_ successfully prevents _suboptimal transfer_ and consistently improves expected robustness.

2025-12-31

International Conference on Learning Representations (Accept (Poster))

doi.org

openreview.net

Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute

Kieran Didi

Zuobai Zhang

Guoqing Zhou

Danny Reidenbach

Zhonglin Cao

Sooyoung Cha

Tomas Geffner

Christian Dallago

Jian Tang

Michael Bronstein

Martin Steinegger

Emine Kucukbenli

Arash Vahdat

Karsten Kreis

Protein interaction modeling is central to protein design, which has been transformed by machine learning with broad applications in drug discovery and beyond. In this landscape, structure-based de novo binder design is most often cast as either conditional generative modeling or sequence optimization via structure predictors ("hallucination"). We argue that this is a false dichotomy and propose Complexa, a novel fully atomistic binder generation method unifying both paradigms. We extend recent flow-based latent protein generation architecture and leverage the domain-domain interactions of monomeric computationally predicted protein structures to construct Teddymer, a new large-scale dataset of synthetic binder-target pairs for pretraining. Combined with high-quality experimental multimers, this enables training a strong base model. We then perform inference-time optimization with this generative prior, unifying the strengths of previously distinct generative and hallucination methods. Complexa sets a new state of the art in computational binder design benchmarks: it delivers markedly higher in-silico success rates than existing generative approaches, and our novel test-time optimization strategies greatly outperform previous hallucination methods under normalized compute budgets. We further demonstrate explicit interface hydrogen bond optimization, fold class-guided binder generation, and extensions to small molecule targets and enzyme design tasks, again surpassing prior methods. Code, models and new data will be publicly released.

2025-12-31

International Conference on Learning Representations (Accept (Oral))

openreview.net

SelvaBox: A high‑resolution dataset for tropical tree crown detection

Hugo Baudchon

Arthur Ouaknine

Martin Weiss

Mélisande Teng

Thomas Walla

Antoine Caron-Guay

Christopher Pal

Étienne Laliberté

Detecting individual tree crowns in tropical forests is essential to study these complex and crucial ecosystems impacted by human interventions and climate change. However, tropical crowns vary widely in size, structure, and pattern and are largely overlapping and intertwined, requiring advanced remote sensing methods applied to high-resolution imagery. Despite growing interest in tropical tree crown detection, annotated datasets remain scarce, hindering robust model development. We introduce SelvaBox, the largest open‑access dataset for tropical tree crown detection in high-resolution drone imagery. It spans three countries and contains more than

2025-12-31

International Conference on Learning Representations (Accept (Poster))

openreview.net

Set Representation Auxiliary Learning with Adversarial Encoding Perturbation and Optimization

Yankai Chen

Xinni Zhang

Henry Peng Zou

Bowei He

Yangning Li

Philip S. Yu

Irwin King

Xue Liu

Sets are a fundamental data structure, and learning their vectorized representations is crucial for many computational problems. Existing methods typically focus on intra-set properties such as permutation invariance and cardinality independence. While effective at preserving basic intra-set semantics, these approaches may be insufficient in explicitly modeling inter-set correlations, which are critical for tasks requiring fine-grained comparisons between sets. In this work, we propose SRAL, a Set Representation Auxiliary Learning framework for capturing inter-set correlations that is compatible with various downstream tasks. SRAL conceptualizes sets as high-dimensional distributions and leverages the 2-Sliced-Wasserstein distance to derive their distributional discrepancies into set representation encoding. More importantly, we introduce a novel adversarial auxiliary learning scheme. Instead of manipulating the input data, our method perturbs the set encoding process itself and compels the model to be robust against worst-case perturbations through a min-max optimization. Our theoretical analysis shows that this objective, in expectation, directly optimizes for the set-wise Wasserstein distances, forcing the model to learn highly discriminative representations. Comprehensive evaluations across four downstream tasks examine SRAL’s performance relative to baseline methods, showing consistent effectiveness in both inter-set relation-sensitive retrieval and intra-set information-oriented processing tasks.

2025-12-31

International Conference on Learning Representations (Accept (Poster))

openreview.net

SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration

Kaustubh Mani

Yann Pequignot

Vincent Mai

Liam Paull

Safe exploration is a prerequisite for deploying reinforcement learning (RL) agents in safety-critical domains. In this paper, we approach safe exploration through the lens of epistemic uncertainty, where the actor’s sensitivity to parameter perturbations serves as a practical proxy for regions of high uncertainty. We propose Sharpness-Aware Policy Optimization (SHAPO), a sharpness-aware policy update rule that evaluates gradients at perturbed parameters, making policy updates pessimistic with respect to the actor’s epistemic uncertainty. Analytically, we show that this adjustment implicitly reweighs policy gradients, amplifying the influence of rare unsafe actions while tempering contributions from already safe ones, thereby biasing learning toward conservative behavior in under-explored regions. Across several continuous-control tasks, our method consistently improves both safety and task performance over existing baselines, significantly expanding their Pareto frontiers.

2025-12-31

International Conference on Learning Representations (Accept (Poster))

openreview.net

Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation

Arina Kharlamova

Bowei He

Chen Ma

Xue Liu

Online services rely on CAPTCHAs as a first line of defense against automated abuse, yet recent advances in multi-modal large language models (MLLMs) have eroded the effectiveness of conventional designs that focus on text recognition or 2D image understanding. To address this challenge, we present **Spatial CAPTCHA**, a novel human-verification framework that leverages fundamental differences in spatial reasoning between humans and MLLMs. Unlike existing CAPTCHAs that rely on low-level perception tasks vulnerable to modern AI, Spatial CAPTCHA generates dynamic questions requiring geometric reasoning, perspective-taking, occlusion handling, and mental rotation, skills that are intuitive for humans but difficult for current AI systems. The system employs a procedural generation pipeline with constraint-based difficulty control, automated correctness verification, and human-in-the-loop validation to ensure scalability, robustness, and adaptability. Evaluation on a corresponding benchmark, **Spatial-CAPTCHA-Bench**, demonstrates that humans vastly outperform 10 state-of-the-art MLLMs, with the best model achieving only 31.0% Pass@1 accuracy. Result comparison with Google reCAPTCHA further confirms the effectiveness of Spatial CAPTCHA as both a security mechanism and a diagnostic tool for spatial reasoning in AI.

2025-12-31

International Conference on Learning Representations (Accept (Poster))

doi.org

openreview.net
