Publications

GENERator: A Long-Context Generative Genomic Foundation Model

Q. Li

Wei Wu

Yong Zhang

Zhihao Zhan

Rui Chen

Mingyang Li

Kun Fu

Junyan Qi

Yongzhou Bao

Chao Wang

Yiheng Zhu

Zhiyun Zhang

Jian Tang

Fuli Feng

Jieping Ye

Liu Yuwen

Hui Xiong

Zheng Wang

Zhang, Yuanyuan

Chen, Ruipu …

Wang, Chao

Tang, Jian

2026-02-03

Research Square (accepted)

Knowing When to Answer: Adaptive Confidence Refinement for Reliable Audio-Visual Question Answering

Dinh Phu Tran

Jihoon Jeong

Saad Wazir

Seongah Kim

Thao Do

Cem Subakan

Daeyoung Kim

We present a formal problem formulation for Reliable Audio-Visual Question Answering (…

2026-02-03

Open MIND (preprint)

Privileged Information Distillation for Language Models

Emiliano Penaloza

Dheeraj Vattikonda

Nicolas Gontier

Alexandre Lacoste

Laurent Charlin

Massimo Caccia

Training-time privileged information (PI) can enable language models to succeed on tasks they would otherwise fail, making it a powerful tool for reinforcement learning in hard, long-horizon settings. However, transferring capabilities learned with PI to policies that must act without it at inference time remains a fundamental challenge. We study this problem in the context of distilling frontier models for multi-turn agentic environments; such models typically hide their internal reasoning and expose only action trajectories. This breaks standard distillation pipelines, since successful behavior is observable but the reasoning process is not. To address this, we introduce π-Distill, a joint teacher-student objective that uses the same model to train a PI-conditioned teacher and an unconditioned student simultaneously. We also introduce On-Policy Self-Distillation (OPSD), an alternative approach that trains with reinforcement learning (RL) using a reverse-KL penalty between the student and the PI-conditioned teacher. We show that both algorithms effectively distill frontier agents using action-only PI. Specifically, we find that π-Distill and, in some cases, OPSD outperform industry-standard practice (supervised fine-tuning followed by RL), which assumes access to full chain-of-thought supervision, across multiple agentic benchmarks, models, and forms of PI. We complement our results with an extensive analysis of the factors that enable effective learning with PI, focusing primarily on π-Distill and characterizing when OPSD is competitive.

2026-02-03

arXiv (preprint)
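The OPSD entry above describes an RL objective with a reverse-KL penalty between the student and the PI-conditioned teacher. The following is a minimal sketch of that penalty for a single next-token distribution, assuming both models expose full categorical distributions over a shared vocabulary. The names reverse_kl and penalized_reward, the coefficient beta, and the "reward minus beta times KL" shaping are hypothetical illustrations, not the authors' implementation.

```python
import math

def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(student || teacher) for two categorical distributions over one vocabulary.

    "Reverse" means the expectation is taken under the student, which makes the
    penalty mode-seeking: the student is pushed away from tokens the
    PI-conditioned teacher considers unlikely.
    """
    if len(student_probs) != len(teacher_probs):
        raise ValueError("distributions must share a vocabulary")
    total = 0.0
    for p_s, p_t in zip(student_probs, teacher_probs):
        if p_s > 0.0:  # terms with zero student probability contribute nothing
            total += p_s * (math.log(p_s) - math.log(max(p_t, eps)))
    return total

def penalized_reward(task_reward, student_probs, teacher_probs, beta=0.1):
    """Hypothetical per-step RL signal: task reward minus beta * reverse KL."""
    return task_reward - beta * reverse_kl(student_probs, teacher_probs)
```

In a real agentic setting the penalty would be accumulated over the tokens of each sampled trajectory; the single-distribution version only shows the direction of the KL and how it enters the reward.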

QMAP: A Benchmark for Standardized Evaluation of Antimicrobial Peptide MIC and Hemolytic Activity Regression

Anthony Lavertu

Jacques Corbeil

Pascal Germain

Antimicrobial peptides (AMPs) are promising alternatives to conventional antibiotics, but progress in computational AMP discovery has been difficult to quantify due to inconsistent datasets and evaluation protocols. We introduce QMAP, a domain-specific benchmark for predicting AMP antimicrobial potency (MIC) and hemolytic toxicity (HC50) with homology-aware, predefined test sets. QMAP enforces strict sequence homology constraints between training and test data, ensuring that model performance reflects true generalization rather than overfitting. Applying QMAP, we reassess existing MIC models and establish baselines for MIC and HC50 regression. Results show limited progress over six years, poor performance for high-potency MIC regression, and low predictability of hemolytic activity, emphasizing the need for standardized evaluation and improved modeling approaches for highly potent peptides. We release a Python package, installable with pip install qmap-benchmark, that facilitates practical adoption and includes a Rust-accelerated engine for efficient data manipulation.

2026-02-03

bioRxiv (preprint)
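The QMAP entry above relies on homology constraints between training and test peptides. As an illustration of the idea only, here is a sketch of a homology-aware filter that drops any training sequence too similar to a test sequence. It assumes a crude identity measure (1 minus Levenshtein distance divided by the longer length) and a hypothetical threshold of 0.6; QMAP's actual similarity criterion and thresholds are not reproduced here, and the qmap-benchmark package's API is not used.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def identity(a: str, b: str) -> float:
    """Assumed identity measure: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def homology_filtered_train(train, test, max_identity=0.6):
    """Keep only training peptides whose identity to every test peptide is below max_identity."""
    return [s for s in train if all(identity(s, t) < max_identity for t in test)]
```

For example, with a test set containing "KWKLFKKV", a near-duplicate training peptide such as "KWKLFKKI" (identity 0.875) is removed, so a model cannot score well simply by memorizing it.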

Reporting and Reviewing LLM-Integrated Systems in HCI: Challenges and Considerations

Karla Felix Navarro

Eugene Syriani

Ian Arawjo

What should HCI scholars consider when reporting and reviewing papers that involve LLM-integrated systems? We interview 18 authors of LLM-integrated system papers about their authoring and reviewing experiences. We find that norms of trust-building between authors and reviewers appear to be eroded by the uncertainty of LLM behavior and the hyperbolic rhetoric surrounding AI. Authors perceive that reviewers apply uniquely skeptical and inconsistent standards to papers that report LLM-integrated systems, and they mitigate mistrust by adding technical evaluations, justifying LLM usage, and de-emphasizing the LLM's presence. Authors' views challenge blanket directives to report all prompts and use open models: they argue that prompt reporting is context-dependent and justify proprietary model usage despite ethical concerns. Finally, some tensions in peer review appear to stem from clashes between the norms and values of the HCI and ML/NLP communities, particularly around what constitutes a contribution and an appropriate level of technical rigor. Based on our findings and additional feedback from six expert HCI researchers, we present a set of guidelines and considerations for authors, reviewers, and HCI communities around reporting and reviewing papers that involve LLM-integrated systems.

2026-02-03

arXiv (preprint)

Synthesizable Molecular Generation via Soft-constrained GFlowNets with Rich Chemical Priors

Louis Vaillancourt

Yves V. Brun

Yoshua Bengio

Alex Hernández-García

2026-02-03

arXiv (preprint)

Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent

Hiroki Naganuma

Shagun Gupta

Youssef Briki

Ioannis Mitliagkas

Irina Rish

Parameswaran Raman

Hao-Jun Michael Shi

To maximize hardware utilization, modern machine learning systems typically employ large constant batch sizes or manually tuned batch-size schedules, relying on heuristics that are brittle and costly to tune. Existing adaptive strategies based on the gradient noise scale (GNS) offer a principled alternative. However, their assumption of SGD's Euclidean geometry creates a fundamental mismatch with popular optimizers based on generalized norms, such as signSGD / Signum (

2026-02-02

arXiv (preprint)
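The entry above contrasts its method with existing adaptive strategies built on the Euclidean gradient noise scale. For reference, here is a sketch of that Euclidean baseline, the "simple" noise scale B = tr(Σ) / |G|², estimated from per-example gradients. It assumes gradients are plain lists of floats and uses the squared norm of the mean gradient as a biased plug-in estimate of |G|². This is the geometry the abstract says is mismatched with signSGD / Signum; the paper's non-Euclidean variant is not reproduced.

```python
def simple_noise_scale(per_example_grads):
    """Euclidean simple gradient noise scale B = tr(Sigma) / |G|^2.

    per_example_grads: list of equal-length gradient vectors, one per example.
    tr(Sigma) is the sum of per-coordinate unbiased sample variances; |G|^2 is
    the squared Euclidean norm of the mean gradient (a biased plug-in estimate).
    """
    n = len(per_example_grads)
    if n < 2:
        raise ValueError("need at least two per-example gradients")
    dim = len(per_example_grads[0])
    mean = [sum(g[k] for g in per_example_grads) / n for k in range(dim)]
    trace_cov = sum(
        sum((g[k] - mean[k]) ** 2 for g in per_example_grads) / (n - 1)
        for k in range(dim)
    )
    sq_norm = sum(m * m for m in mean)
    if sq_norm == 0.0:
        return float("inf")  # no mean signal: noise dominates at any batch size
    return trace_cov / sq_norm
```

An adaptive scheme in this family would grow the batch size roughly in proportion to B as training progresses; under sign or spectral updates, the Euclidean norms in this ratio no longer match the optimizer's geometry, which is the mismatch the abstract points to.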

AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration

Jianhao Ruan

Zhihao Xu

Yiran Peng

Fashen Ren

Zhaoyang Yu

Xinbing Liang

Jinyu Xiang

Yongru Chen

Bang Liu

Chenglin Wu

Yuyu Luo

Jiayi Zhang

Language agents have shown strong promise for task automation. Realizing this promise for increasingly complex, long-horizon tasks has driven the rise of a sub-agent-as-tools paradigm for multi-turn task solving. However, existing designs still lack a dynamic abstraction view of sub-agents, which hurts adaptability. We address this challenge with a unified, framework-agnostic agent abstraction that models any agent as a tuple (Instruction, Context, Tools, Model). This tuple acts as a compositional recipe for capabilities, enabling the system to spawn specialized executors for each task on demand. Building on this abstraction, we introduce AOrchestra, an agentic system in which the central orchestrator concretizes the tuple at each step: it curates task-relevant context, selects tools and models, and delegates execution via on-the-fly automatic agent creation. This design reduces human engineering effort and remains framework-agnostic, with plug-and-play support for diverse agents as task executors. It also enables a controllable performance-cost trade-off, allowing the system to approach Pareto efficiency. Across three challenging benchmarks (GAIA, SWE-Bench, Terminal-Bench), AOrchestra achieves a 16.28% relative improvement over the strongest baseline when paired with Gemini-3-Flash. The code is available at: https://github.com/FoundationAgents/AOrchestra

2026-02-02

arXiv (preprint)
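The AOrchestra abstract models every agent as an (Instruction, Context, Tools, Model) tuple that the orchestrator concretizes per step. The following is a toy sketch of that abstraction, assuming keyword overlap as the context-curation and tool-selection rule and a caller-supplied model picker. AgentSpec, spawn_sub_agent, and the model names are hypothetical and do not come from the AOrchestra repository.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical encoding of the (Instruction, Context, Tools, Model) tuple."""
    instruction: str
    context: List[str]
    tools: List[str]
    model: str

def spawn_sub_agent(task: str, memory: List[str], tool_catalog: Dict[str, List[str]],
                    pick_model: Callable[[str], str]) -> AgentSpec:
    """Concretize the tuple for one sub-task (a toy orchestrator step)."""
    words = set(task.lower().split())
    # Curate context: keep memory entries that share at least one word with the task.
    context = [m for m in memory if words & set(m.lower().split())]
    # Select tools whose trigger keywords appear in the task.
    tools = [name for name, keys in tool_catalog.items() if words & set(keys)]
    return AgentSpec(instruction=task, context=context, tools=tools, model=pick_model(task))
```

Passing the model choice in as a function is one way to expose the performance-cost trade-off the abstract mentions: a cheaper model can be routed to simple sub-tasks and a stronger one to harder ones.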

Happiness as a Measure of Fairness

Georg Pichler

Marco Romanelli

Pablo Piantanida

In this paper, we propose a novel fairness framework grounded in the concept of happiness, a measure of the utility each group gains from decision outcomes. By capturing fairness through this intuitive lens, we offer an approach that is not only more human-centered but also mathematically rigorous: computing the optimal fair post-processing strategy requires solving only a linear program. This makes our method both efficient and scalable with existing optimization tools. Furthermore, it unifies and extends several well-known fairness definitions, and our empirical results highlight its practical strengths across diverse scenarios.

2026-02-02

Artificial Intelligence and Statistics (poster)
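To make "the utility each group gains from decision outcomes" concrete, here is a sketch that evaluates per-group happiness as the mean utility of decisions, assuming a hypothetical utility table keyed by (true label, decision). The paper's exact definition and its linear-program post-processing step are not reproduced; this only computes the quantity such a program would constrain, plus a simple max-min gap between groups.

```python
def group_happiness(labels, decisions, groups, utility):
    """Mean utility per group; utility[(y, y_hat)] is the gain of decision y_hat when the truth is y."""
    totals, counts = {}, {}
    for y, y_hat, g in zip(labels, decisions, groups):
        totals[g] = totals.get(g, 0.0) + utility[(y, y_hat)]
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}

def happiness_gap(happiness):
    """Difference between the happiest and least happy group (0 means equal happiness)."""
    return max(happiness.values()) - min(happiness.values())
```

With the assumed table {(1, 1): 1.0, (0, 0): 0.5, (1, 0): 0.0, (0, 1): -0.5}, a group whose members all receive correct decisions ends up happier than one receiving a missed positive and a false positive; a fair post-processing would adjust decisions to shrink that gap.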

An Indicator of Membership Inference Security in Post-Training Quantized Models

Eric Aubinais

Philippe Formont

Pablo Piantanida

Elisabeth Gassiat

Quantizing machine learning models has proven effective at lowering memory and inference costs while maintaining performance levels comparable to those of the original models. In this work, we investigate the impact of quantization procedures on privacy in data-driven models, focusing on their vulnerability to membership inference attacks. Membership Inference Security (MIS) has recently been proposed to characterize the privacy of machine learning models against the most powerful (and possibly unknown) attacks. However, quantifying MIS appears to be computationally very difficult. In this paper, we propose a new MIS indicator for post-training quantization procedures of machine learning models that minimize an empirical loss. This new indicator is a byproduct of a theoretical asymptotic analysis of MIS in this context. We also present a methodology for empirically estimating our MIS indicator. Using synthetic datasets and real-world data (in the context of drug discovery), we demonstrate the effectiveness of our approach in assessing and ranking the MIS of different quantizers.

2026-02-02

International Conference on Artificial Intelligence and Statistics (poster)
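The entry above ranks the MIS of different post-training quantizers. For readers unfamiliar with what such a quantizer does, here is a sketch of one common choice, symmetric per-tensor round-to-nearest quantization. Whether this particular quantizer appears among those studied is an assumption, and the MIS indicator itself is not reproduced.

```python
def quantize_symmetric(weights, bits=8):
    """Round-to-nearest symmetric per-tensor quantization.

    Returns integer codes and a scale; dequantized weights are code * scale.
    """
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Clamp guards against floating-point overshoot at the extremes.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate real-valued weights."""
    return [c * scale for c in codes]
```

Each dequantized weight differs from the original by at most half a quantization step (scale / 2), so fewer bits means a coarser model; the paper's question is how such coarsening changes vulnerability to membership inference.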

KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity

Damien Lesens

Beheshteh T. Rakhshan

Guillaume Rabusseau

The Key–Value (KV) cache is central to the efficiency of transformer-based large language models (LLMs), storing previously computed vectors to accelerate inference. Yet, as sequence length and batch size grow, the cache becomes a major memory bottleneck. Prior compression methods typically apply low-rank decomposition to keys alone or attempt to jointly embed queries and keys, but both approaches neglect that attention fundamentally depends on their inner products. In this work, we prove that such strategies are sub-optimal for approximating the attention matrix. We introduce KQ-SVD, a simple and computationally efficient method that directly performs an optimal low-rank decomposition of the attention matrix via a closed-form solution. By targeting the true source of redundancy, KQ-SVD preserves attention outputs with higher fidelity under compression. Extensive evaluations on LLaMA and Mistral models demonstrate that our approach consistently delivers superior projection quality.

2026-02-02

Artificial Intelligence and Statistics (poster)
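The KQ-SVD abstract argues that compressing keys alone is sub-optimal because attention depends on query-key inner products. The following sketch illustrates that argument with NumPy under simplifying assumptions: unscaled pre-softmax scores S = K Qᵀ and random anisotropic queries. By the Eckart-Young theorem, the truncated SVD of S is the best rank-r approximation in Frobenius norm, so any keys-only rank-r projection can do no better. This is not KQ-SVD itself. Materializing S is only for illustration, and how the method factors the optimum into a compressed cache is not reproduced.

```python
import numpy as np

def best_rank_r(S, r):
    """Eckart-Young: truncated SVD is the best rank-r approximation in Frobenius norm."""
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def keys_only_rank_r(K, Q, r):
    """Baseline: project keys onto their own top-r right singular vectors, ignoring queries."""
    _, _, Vt = np.linalg.svd(K, full_matrices=False)
    P = Vt[:r].T @ Vt[:r]            # d x d projector onto the top-r key subspace
    return (K @ P) @ Q.T

rng = np.random.default_rng(0)
n, m, d, r = 64, 32, 16, 4
K = rng.standard_normal((n, d))
Q = rng.standard_normal((m, d)) @ np.diag(np.linspace(3.0, 0.1, d))  # anisotropic queries
S = K @ Q.T                          # pre-softmax attention scores (unscaled)
err_opt = np.linalg.norm(S - best_rank_r(S, r))
err_keys = np.linalg.norm(S - keys_only_rank_r(K, Q, r))
print(f"optimal rank-{r} error: {err_opt:.3f}, keys-only error: {err_keys:.3f}")
```

Because the keys-only reconstruction also has rank at most r, its error is always at least the optimal one. The gap grows when the directions that matter most to the queries differ from the keys' dominant directions, as with the anisotropic queries assumed here.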