Publications

Logarithmic-time Schedules for Scaling Language Models with Momentum
In practice, the hyperparameters …
Mining Generalizable Activation Functions
Alex Vitvitskyi
Michael Boratko
Matej Grcic
Deep Shah
Position: Capability Control Should be a Separate Goal From Alignment
Shoaib Ahmed Siddiqui
Eleni Triantafillou
David Krueger
Adrian Weller
Foundation models are trained on broad data distributions, yielding generalist capabilities that enable many downstream applications but also expand the space of potential misuse and failures. This position paper argues that capability control -- imposing restrictions on permissible model behavior -- should be treated as a distinct goal from alignment. While alignment is often context and preference-driven, capability control aims to impose hard operational limits on permissible behaviors, including under adversarial elicitation. We organize capability control mechanisms across the model lifecycle into three layers: (i) data-based control of the training distribution, (ii) learning-based control via weight- or representation-level interventions, and (iii) system-based control via post-deployment guardrails over inputs, outputs, and actions. Because each layer has characteristic failure modes when used in isolation, we advocate for a defense-in-depth approach that composes complementary controls across the full stack. We further outline key open challenges in achieving such control, including the dual-use nature of knowledge and compositional generalization.
GENERator: A Long-Context Generative Genomic Foundation Model
Q. Li
Wei Wu
Yong Zhang
Rui Chen
Mingyang Li
Kun Fu
Junyan Qi
Yongzhou Bao
Chao Wang
Yiheng Zhu
Zhiyun Zhang
Fuli Feng
Jieping Ye
Liu Yuwen
Hui Xiong
Zheng Wang
Yuanyuan Zhang
Ruipu Chen … (2 more)
Chao Wang
Jian Tang
QMAP: A Benchmark for Standardized Evaluation of Antimicrobial Peptide MIC and Hemolytic Activity Regression
Anthony Lavertu
Pascal Germain
Antimicrobial peptides (AMPs) are promising alternatives to conventional antibiotics, but progress in computational AMP discovery has been difficult to quantify due to inconsistent datasets and evaluation protocols. We introduce QMAP, a domain-specific benchmark for predicting AMP antimicrobial potency (MIC) and hemolytic toxicity (HC50) with homology-aware, predefined test sets. QMAP enforces strict sequence homology constraints between training and test data, ensuring that model performance reflects true generalization rather than overfitting. Applying QMAP, we reassess existing MIC models and establish baselines for MIC and HC50 regression. Results show limited progress over six years, poor performance for high-potency MIC regression, and low predictability for hemolytic activity, emphasizing the need for standardized evaluation and improved modeling approaches for highly potent peptides. We release a Python package facilitating practical adoption, with a Rust-accelerated engine enabling efficient data manipulation, installable with pip install qmap-benchmark.
Reporting and Reviewing LLM-Integrated Systems in HCI: Challenges and Considerations
What should HCI scholars consider when reporting and reviewing papers that involve LLM-integrated systems? We interview 18 authors of LLM-integrated system papers on their authoring and reviewing experiences. We find that norms of trust-building between authors and reviewers appear to be eroded by the uncertainty of LLM behavior and hyperbolic rhetoric surrounding AI. Authors perceive that reviewers apply uniquely skeptical and inconsistent standards towards papers that report LLM-integrated systems, and mitigate mistrust by adding technical evaluations, justifying usage, and de-emphasizing LLM presence. Authors' views challenge blanket directives to report all prompts and use open models, arguing that prompt reporting is context-dependent and justifying proprietary model usage despite ethical concerns. Finally, some tensions in peer review appear to stem from clashes between the norms and values of HCI and ML/NLP communities, particularly around what constitutes a contribution and an appropriate level of technical rigor. Based on our findings and additional feedback from six expert HCI researchers, we present a set of guidelines and considerations for authors, reviewers, and HCI communities around reporting and reviewing papers that involve LLM-integrated systems.
Synthesizable Molecular Generation via Soft-constrained GFlowNets with Rich Chemical Priors
D. Biton
Louis Vaillancourt
Yves V. Brun
Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent
Shagun Gupta
Youssef Briki
Parameswaran Raman
Hao-Jun Michael Shi
To maximize hardware utilization, modern machine learning systems typically employ large constant or manually tuned batch size schedules, relying on heuristics that are brittle and costly to tune. Existing adaptive strategies based on gradient noise scale (GNS) offer a principled alternative. However, their assumption of SGD's Euclidean geometry creates a fundamental mismatch with popular optimizers based on generalized norms, such as signSGD / Signum (
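The paper extends the gradient noise scale beyond Euclidean geometry; the non-Euclidean variants are its contribution and are not reproduced here. As background, a minimal sketch of the classic Euclidean GNS, B_simple = tr(Σ) / ||g||², estimated from per-example gradients (the per-example gradient matrix here is synthetic stand-in data, not from a real model):

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic per-example gradients: 256 examples, 10 parameters
G = rng.standard_normal((256, 10)) + 0.5
g = G.mean(axis=0)                         # full-batch gradient estimate
trace_sigma = G.var(axis=0, ddof=1).sum()  # tr(Sigma): total per-example gradient variance
b_simple = trace_sigma / (g @ g)           # Euclidean gradient noise scale
```

Larger b_simple (noisier gradients relative to their magnitude) suggests a larger useful batch size; the adaptive schedules discussed above track this quantity during training.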
AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
Jianhao Ruan
Zhihao Xu
Yiran Peng
Fashen Ren
Zhaoyang Yu
Xinbing Liang
Jinyu Xiang
Yongru Chen
Chenglin Wu
Yuyu Luo
Jiayi Zhang
Language agents have shown strong promise for task automation. Realizing this promise for increasingly complex, long-horizon tasks has driven the rise of a sub-agent-as-tools paradigm for multi-turn task solving. However, existing designs still lack a dynamic abstraction view of sub-agents, thereby hurting adaptability. We address this challenge with a unified, framework-agnostic agent abstraction that models any agent as a tuple (Instruction, Context, Tools, Model). This tuple acts as a compositional recipe for capabilities, enabling the system to spawn specialized executors for each task on demand. Building on this abstraction, we introduce an agentic system AOrchestra, where the central orchestrator concretizes the tuple at each step: it curates task-relevant context, selects tools and models, and delegates execution via on-the-fly automatic agent creation. This design reduces human engineering effort and remains framework-agnostic, with plug-and-play support for diverse agents as task executors. It also enables a controllable performance-cost trade-off, allowing the system to approach Pareto efficiency. Across three challenging benchmarks (GAIA, SWE-Bench, Terminal-Bench), AOrchestra achieves a 16.28% relative improvement over the strongest baseline when paired with Gemini-3-Flash. The code is available at: https://github.com/FoundationAgents/AOrchestra
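The four-field agent abstraction lends itself to a simple record type. A minimal sketch of what such a tuple could look like as a dataclass; the field names, defaults, and tool strings are illustrative assumptions, not AOrchestra's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    """Hypothetical (Instruction, Context, Tools, Model) tuple for a sub-agent."""
    instruction: str                                  # task directive for the executor
    context: list = field(default_factory=list)       # curated task-relevant snippets
    tools: list = field(default_factory=list)         # tool names granted to this agent
    model: str = "gemini-3-flash"                     # placeholder backing-model id

# an orchestrator would concretize one of these per step, then spawn an executor
spec = AgentSpec(instruction="Diagnose the failing test",
                 tools=["read_file", "run_tests"])
```

Because the recipe is plain data, the orchestrator can swap the model or tool set per step to trade performance against cost, which is the trade-off the abstract describes.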
Happiness as a Measure of Fairness
Georg Pichler
Marco Romanelli
In this paper, we propose a novel fairness framework grounded in the concept of _happiness_, a measure of the utility each group gains from decision outcomes. By capturing fairness through this intuitive lens, we not only offer a more human-centered approach, but also one that is mathematically rigorous: In order to compute the optimal, fair post-processing strategy, only a linear program needs to be solved. This makes our method both efficient and scalable with existing optimization tools. Furthermore, it unifies and extends several well-known fairness definitions, and our empirical results highlight its practical strengths across diverse scenarios.
An Indicator of Membership Inference Security in Post-Training Quantized Models
Quantizing machine learning models has demonstrated its effectiveness in lowering memory and inference costs while maintaining performance l… (voir plus)evels comparable to those of the original models. In this work, we investigate the impact of quantization procedures on privacy in data-driven models, focusing on their vulnerability to membership inference attacks. Membership Inference Security (MIS) has recently been proposed to characterize the privacy of machine learning models against the most powerful (and possibly unknown) attacks. However, quantifying MIS appears to be computationally very difficult. In this paper, we propose a new MIS indicator for post-training quantization procedures of machine learning models that minimize an empirical loss. This new indicator is a byproduct of a theoretical asymptotic analysis of the MIS in this context. We also present a methodology for empirically estimating our MIS indicator. Using synthetic datasets and real-world data (in the context of drug discovery), we demonstrate the effectiveness of our approach in assessing and ranking the MIS of different quantizers.
KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity
Damien Lesens
Beheshteh T. Rakhshan
The Key–Value (KV) cache is central to the efficiency of transformer-based large language models (LLMs), storing previously computed vectors to accelerate inference. Yet, as sequence length and batch size grow, the cache becomes a major memory bottleneck. Prior compression methods typically apply low-rank decomposition to keys alone or attempt to jointly embed queries and keys, but both approaches neglect that attention fundamentally depends on their inner products. In this work, we prove that such strategies are sub-optimal for approximating the attention matrix. We introduce KQ-SVD, a simple and computationally efficient method that directly performs an optimal low-rank decomposition of the attention matrix via a closed-form solution. By targeting the true source of redundancy, KQ-SVD preserves attention outputs with higher fidelity under compression. Extensive evaluations on LLaMA and Mistral models demonstrate that our approach consistently delivers superior projection quality.
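The intuition behind factoring the product rather than the keys can be shown with standard linear algebra; this is a hedged illustration of the Eckart–Young argument on random matrices, not KQ-SVD's actual algorithm: a truncated SVD of the score matrix QKᵀ is the optimal rank-r approximation, so it can never do worse than compressing K alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 128, 64, 16
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

S = Q @ K.T  # pre-softmax score matrix whose low-rank structure is targeted

# optimal rank-r approximation of S (Eckart-Young theorem)
U, s, Vt = np.linalg.svd(S, full_matrices=False)
S_r = (U[:, :r] * s[:r]) @ Vt[:r]
err_product = np.linalg.norm(S - S_r)

# baseline: compress the keys alone to rank r, then form the scores
Uk, sk, Vkt = np.linalg.svd(K, full_matrices=False)
K_r = (Uk[:, :r] * sk[:r]) @ Vkt[:r]
err_keys = np.linalg.norm(S - Q @ K_r.T)
```

Since Q @ K_r.T also has rank at most r, err_product <= err_keys is guaranteed, matching the abstract's claim that key-only compression is sub-optimal for the attention scores.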