Publications

RAEE: A Robust Retrieval-Augmented Early Exit Framework for Efficient Inference
Lianming Huang
Shangyu Wu
Yufei Cui
Ying Xiong
Haibo Hu
Xue Liu
Tei-Wei Kuo
Nan Guan
Chun Jason Xue
Deploying large language model inference remains challenging due to its high computational overhead. Early exit optimizes model inference by adaptively reducing the number of inference layers. Current methods typically train internal classifiers or use heuristic methods to determine the exit layer. However, those methods either introduce significant training overheads or lead to performance degradation. To address these limitations, this paper proposes RAEE, a robust Retrieval-Augmented Early Exit framework that not only enables early exit but also enhances model performance through corrective exit information at intermediate layers. This paper first demonstrates that the early exit problem can be effectively modeled as a distribution prediction problem, in which the distribution can be further approximated through the exit information of similar data. Subsequently, this paper introduces the process of collecting exit information of correct predictions and the steps to construct the retrieval database. Finally, leveraging the pre-constructed retrieval database, RAEE utilizes the exit information from retrieved similar data to guide the backbone model's exit. Experimental results demonstrate that RAEE can not only accelerate inference but also achieve robust zero-shot performance across eight downstream tasks.
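The retrieval step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `predict_exit_layer`, the cosine-similarity weighting, and the fallback to the last layer are all assumptions made for illustration. The only idea taken from the abstract is that the exit-layer distribution is approximated from the exit information of retrieved similar examples.

```python
import numpy as np

def predict_exit_layer(query, db_embeddings, db_exit_layers, num_layers, k=3):
    """Hypothetical helper: estimate an exit-layer distribution from neighbours.

    query          : (d,) embedding of the incoming example
    db_embeddings  : (n, d) embeddings stored in the retrieval database
    db_exit_layers : list of n lists; layers at which each stored example
                     was predicted correctly (assumed database format)
    """
    # Cosine similarity between the query and every database entry.
    q = query / np.linalg.norm(query)
    d = db_embeddings / np.linalg.norm(db_embeddings, axis=1, keepdims=True)
    sims = d @ q

    # Indices of the k most similar stored examples.
    top = np.argsort(-sims)[:k]

    # Accumulate similarity-weighted votes for each candidate exit layer.
    probs = np.zeros(num_layers)
    for idx in top:
        for layer in db_exit_layers[idx]:
            probs[layer] += max(sims[idx], 0.0)

    # Assumed fallback: run the full model if no neighbour gives evidence.
    if probs.sum() == 0.0:
        return num_layers - 1, probs

    probs /= probs.sum()
    return int(np.argmax(probs)), probs
```

In this sketch the backbone would stop at the returned layer. How RAEE actually weights neighbours and selects the layer is not specified in the abstract.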
Recall, Robustness, and Lexicographic Evaluation
Michael D. Ekstrand
Bhaskar Mitra
RECODE: A Benchmark for Research Code DEvelopment with Interactive Human Feedback
Chunyu Miao
Henry Peng Zou
Yangning Li
Yankai Chen
Yibo Wang
Fangxin Wang
Yifan Li
Wooseong Yang
Bowei He
Xinni Zhang
Dianzhi Yu
Hanchen Yang
Hoang H Nguyen
Yue Zhou
Jie Yang
Jizhou Guo
Wenzhe Fan
Chin-Yuan Yeh
Panpan Meng
Liancheng Fang
Jinhu Qi
Wei-Chieh Huang
Zhengyao Gu
Yuwei Han
Langzhou He
Yuyao Yang
Yinghui Li
Hai-Tao Zheng
Xue Liu
Irwin King
Philip S. Yu
Large language models (LLMs) show promise in supporting scientific research implementation, yet their ability to generate correct and executable code remains limited. Existing works largely adopt one-shot settings, ignoring the iterative and feedback-driven nature of realistic workflows of scientific research development. To address this gap, we present RECODE, a benchmark of 102 tasks from research papers and repositories that evaluates LLMs through multi-turn interactions with human feedback. It includes structured instructions, unit tests, and a five-level feedback hierarchy to reflect realistic researcher–agent collaboration. We further present ReCodeAgent, a framework that integrates feedback into iterative code generation. Experiments with leading LLMs, including GPT-5, Claude-Sonnet-4, DeepSeek-V3.1, and Gemini 2.5, show substantial performance gains with richer feedback, while also highlighting ongoing challenges in the generation of complex research code. RECODE establishes a foundation for developing adaptive, feedback-driven LLM agents in scientific research implementation.
Reply to comment on "medication-based mortality prediction in COPD using machine learning and conventional statistical methods".
Ana Paula Pena-Gralle
Amélie Forget
Yohann Moanahere Chiu
M. Beauchesne
Lucie Blais
RetINaBox: A Hands-On Learning Tool for Experimental Neuroscience
Brune Bettler
Flavia Arias Armas
Vanessa Bordonaro
Megan Q. Liu
Mingyu Wan
Aude Villemain
Blake A. Richards
Stuart Trenholm
An exciting aspect of neuroscience is developing and testing hypotheses via experimentation. However, due to logistical and financial hurdles, the experiment and discovery component of neuroscience is generally lacking in classroom and outreach settings. To address this issue, here we introduce RetINaBox: a low-cost open-source electronic visual system simulator that provides users with a hands-on tool to discover how the visual system builds feature detectors. RetINaBox includes an LED array for generating visual stimuli and photodiodes that act as an array of model photoreceptors. Custom software on a Raspberry Pi computer reads out responses from model photoreceptors and allows users to control the polarity and delay of the signal transfer from model photoreceptors to model retinal ganglion cells. Interactive lesson plans are provided, guiding users to discover different types of visual feature detectors (including ON/OFF, center-surround, orientation-selective, and direction-selective receptive fields) as well as their underlying circuit computations.
Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Epsilon-Scheduling
Fine-tuning pretrained models is a standard and effective workflow in modern machine learning. However, robust fine-tuning (RFT), which aims to simultaneously achieve adaptation to a downstream task and robustness to adversarial examples, remains challenging. Despite the abundance of non-robust pretrained models in open-source repositories, their potential for RFT is less understood. We address this knowledge gap by systematically examining RFT from such non-robust models. Our experiments reveal that fine-tuning non-robust models with a robust objective, even under small perturbations, can lead to poor performance, a phenomenon that we dub _suboptimal transfer_. In challenging scenarios (e.g., difficult tasks, high perturbation), the resulting performance can be so low that it may be considered a transfer failure. We find that fine-tuning using a robust objective impedes task adaptation at the beginning of training and eventually prevents optimal transfer. However, we propose a novel heuristic, _Epsilon-Scheduling_, a schedule over perturbation strength used during training that promotes optimal transfer. Additionally, we introduce _expected robustness_, a metric that captures performance across a range of perturbations, providing a more comprehensive evaluation of the accuracy-robustness trade-off of diverse models at test-time. Extensive experiments on a wide range of configurations (six pretrained models and five datasets) show that _Epsilon-Scheduling_ successfully prevents _suboptimal transfer_ and consistently improves expected robustness.
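To make the idea of a schedule over perturbation strength concrete, here is a small sketch under stated assumptions. The abstract does not give the schedule's shape, so the linear warm-up, the `warmup_frac` parameter, and the uniform averaging in `expected_robustness` are illustrative choices, not the paper's definitions.

```python
def epsilon_schedule(step, total_steps, eps_max, warmup_frac=0.5):
    """Assumed linear warm-up: perturbation strength grows from 0 to eps_max.

    Starting near zero lets early training focus on task adaptation,
    the phase the abstract reports is impeded by a robust objective.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    warmup_steps = warmup_frac * total_steps
    if warmup_steps <= 0:
        return eps_max
    return eps_max * min(1.0, step / warmup_steps)


def expected_robustness(accuracy_by_eps):
    """Assumed form: uniform mean of accuracy over the evaluated strengths.

    accuracy_by_eps maps a perturbation strength to measured accuracy.
    """
    values = list(accuracy_by_eps.values())
    return sum(values) / len(values)
```

In an adversarial training loop, `epsilon_schedule(step, total_steps, eps_max)` would supply the attack budget at each step in place of a fixed value.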
Scalable Tree Ensemble Proximities in Python
Kevin R. Moon
Jake S. Rhodes
Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute
Kieran Didi
Guoqing Zhou
Danny Reidenbach
Zhonglin Cao
Sooyoung Cha
Tomas Geffner
Christian Dallago
Michael Bronstein
Martin Steinegger
Emine Kucukbenli
Arash Vahdat
Karsten Kreis
Protein interaction modeling is central to protein design, which has been transformed by machine learning with broad applications in drug discovery and beyond. In this landscape, structure-based de novo binder design is most often cast as either conditional generative modeling or sequence optimization via structure predictors ("hallucination"). We argue that this is a false dichotomy and propose Complexa, a novel fully atomistic binder generation method unifying both paradigms. We extend a recent flow-based latent protein generation architecture and leverage the domain-domain interactions of monomeric computationally predicted protein structures to construct Teddymer, a new large-scale dataset of synthetic binder-target pairs for pretraining. Combined with high-quality experimental multimers, this enables training a strong base model. We then perform inference-time optimization with this generative prior, unifying the strengths of previously distinct generative and hallucination methods. Complexa sets a new state of the art in computational binder design benchmarks: it delivers markedly higher in-silico success rates than existing generative approaches, and our novel test-time optimization strategies greatly outperform previous hallucination methods under normalized compute budgets. We further demonstrate explicit interface hydrogen bond optimization, fold class-guided binder generation, and extensions to small molecule targets and enzyme design tasks, again surpassing prior methods. Code, models and new data will be publicly released.
SelvaBox: A high‑resolution dataset for tropical tree crown detection
Detecting individual tree crowns in tropical forests is essential to study these complex and crucial ecosystems impacted by human interventions and climate change. However, tropical crowns vary widely in size, structure, and pattern and are largely overlapping and intertwined, requiring advanced remote sensing methods applied to high-resolution imagery. Despite growing interest in tropical tree crown detection, annotated datasets remain scarce, hindering robust model development. We introduce SelvaBox, the largest open‑access dataset for tropical tree crown detection in high-resolution drone imagery. It spans three countries and contains more than
Set Representation Auxiliary Learning with Adversarial Encoding Perturbation and Optimization
Yankai Chen
Xinni Zhang
Henry Peng Zou
Bowei He
Yangning Li
Philip S. Yu
Irwin King
Xue Liu
Sets are a fundamental data structure, and learning their vectorized representations is crucial for many computational problems. Existing methods typically focus on intra-set properties such as permutation invariance and cardinality independence. While effective at preserving basic intra-set semantics, these approaches may be insufficient in explicitly modeling inter-set correlations, which are critical for tasks requiring fine-grained comparisons between sets. In this work, we propose SRAL, a Set Representation Auxiliary Learning framework for capturing inter-set correlations that is compatible with various downstream tasks. SRAL conceptualizes sets as high-dimensional distributions and leverages the 2-Sliced-Wasserstein distance to derive their distributional discrepancies into set representation encoding. More importantly, we introduce a novel adversarial auxiliary learning scheme. Instead of manipulating the input data, our method perturbs the set encoding process itself and compels the model to be robust against worst-case perturbations through a min-max optimization. Our theoretical analysis shows that this objective, in expectation, directly optimizes for the set-wise Wasserstein distances, forcing the model to learn highly discriminative representations. Comprehensive evaluations across four downstream tasks examine SRAL’s performance relative to baseline methods, showing consistent effectiveness in both inter-set relation-sensitive retrieval and intra-set information-oriented processing tasks.
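For readers unfamiliar with the 2-Sliced-Wasserstein distance mentioned above, the following is a generic Monte Carlo estimator between two point sets, treated as empirical distributions. It is a standard textbook-style sketch, not SRAL's encoder or training objective. The function name, the quantile grid, and the default parameters are assumptions chosen so that sets of different cardinality can be compared.

```python
import numpy as np

def sliced_wasserstein_2(x, y, num_projections=128, num_quantiles=100, seed=0):
    """Estimate the 2-Sliced-Wasserstein distance between sets x (n, d) and y (m, d)."""
    rng = np.random.default_rng(seed)
    dim = x.shape[1]

    # Random unit directions on the sphere, one per projection.
    directions = rng.normal(size=(num_projections, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Project both sets onto every direction: shapes (n, P) and (m, P).
    px = x @ directions.T
    py = y @ directions.T

    # In 1-D, the Wasserstein-2 distance compares quantile functions,
    # so evaluate both on a shared grid of quantile levels.
    levels = (np.arange(num_quantiles) + 0.5) / num_quantiles
    qx = np.quantile(px, levels, axis=0)
    qy = np.quantile(py, levels, axis=0)

    # Average squared quantile gaps over levels and projections.
    return float(np.sqrt(np.mean((qx - qy) ** 2)))
```

Two identical sets give a distance of zero, and in one dimension a set shifted by a constant `c` gives a distance of `|c|`.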
Sex Classification Based on the Functional Connectivity Patterns of the Language Network: A Resting State fMRI Study
Xanthy Lajoie
C. DeRoy
C. Bedetti
Bérengère Houzé
N. Clarke
Sébastien Hétu
M.‐È. Picard
S. M. Brambati
Research on sex differences in the brain is essential for a better understanding of how the brain develops and ages, and how neurological and psychiatric conditions can impact men and women differently. While numerous studies have focused on sex differences in brain structures, few have examined the characteristics of functional networks, particularly the language network. Although previous research suggests similar overall language performance across sexes, differences may still exist in the brain networks that underlie language processing. In addition, prior studies on sex differences in language have predominantly relied on task‐based fMRI, which may fail to capture subtle differences in underlying functional activity. In this study, we applied a machine learning approach to classify participants' sex based on resting‐state functional connectivity patterns of the language network in healthy young adults (270 men and 288 women; age: 22–36 years), and to identify the most predictive functional connectivity features. The classifier achieved 91.3% accuracy, with key discriminant features anchored to the left opercular part of the inferior frontal gyrus, the left planum temporale, and the left anterior middle temporal gyrus. These regions show distinctive connectivity patterns with heteromodal association cortices, including the occipital poles, angular gyrus, posterior cingulate gyrus, and intraparietal sulcus. Although there was an overlap between men and women, men displayed stronger functional connectivity values in these regions. These findings highlight sex‐related differences in functional connectivity patterns of the language network at rest, underscoring the importance of considering sex as a variable in future research on language and brain function.
SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
Safe exploration is a prerequisite for deploying reinforcement learning (RL) agents in safety-critical domains. In this paper, we approach safe exploration through the lens of epistemic uncertainty, where the actor’s sensitivity to parameter perturbations serves as a practical proxy for regions of high uncertainty. We propose Sharpness-Aware Policy Optimization (SHAPO), a sharpness-aware policy update rule that evaluates gradients at perturbed parameters, making policy updates pessimistic with respect to the actor’s epistemic uncertainty. Analytically, we show that this adjustment implicitly reweighs policy gradients, amplifying the influence of rare unsafe actions while tempering contributions from already safe ones, thereby biasing learning toward conservative behavior in under-explored regions. Across several continuous-control tasks, our method consistently improves both safety and task performance over existing baselines, significantly expanding their Pareto frontiers.
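The phrase "evaluates gradients at perturbed parameters" can be illustrated with a generic sharpness-aware step on an arbitrary differentiable loss. This is a sketch of the general sharpness-aware minimization pattern, not SHAPO's policy-gradient objective. The function name, `grad_fn`, `rho`, and `lr` are hypothetical names introduced for illustration.

```python
import numpy as np

def sharpness_aware_step(params, grad_fn, lr=0.1, rho=0.05):
    """One generic sharpness-aware update on a loss to be minimized.

    grad_fn(params) returns the loss gradient at params (assumed interface).
    """
    grad = grad_fn(params)
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        # Already at a stationary point: nothing to perturb toward.
        return params.copy()

    # Move a distance rho along the gradient, toward locally worse loss.
    perturbed = params + rho * grad / norm

    # Update the original parameters using the gradient taken there.
    return params - lr * grad_fn(perturbed)
```

For the toy loss `0.5 * ||w||^2`, whose gradient is `w`, calling `sharpness_aware_step(np.array([3.0, 4.0]), lambda w: w, lr=0.1, rho=0.5)` perturbs to `[3.3, 4.4]` and returns `[2.67, 3.56]`.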