Publications

h-MINT: Modeling Pocket-Ligand Binding with Hierarchical Molecular Interaction Network
Yanru Qu
Wenjuan Tan
Xiangzhe Kong
Xiangxin Zhou
Chaoran Cheng
Jiaxuan You
Ge Liu
Accurate molecular representations are critical for drug discovery, and a central challenge lies in capturing the chemical environment of molecular fragments, as key interactions, such as hydrogen bonding and π-stacking, occur only under specific local conditions. Most existing approaches represent molecules as atom-level graphs; however, individual atoms cannot express stereochemistry, lone pairs, conjugation, and other complex features. Fragment-based methods (e.g., principal subgraph or functional group libraries) fail to preserve essential information such as chirality, aromatic bond integrity, and ionic states. This work addresses these limitations from two aspects. (i) **OverlapBPE tokenization**. We propose a novel data-driven molecule tokenization method. Unlike existing approaches, our method allows overlapping fragments, reflecting the inherently fuzzy boundaries of small-molecule substructures; together with enriched chemical information at the token level, this preserves a more complete chemical context. (ii) **h-MINT model**. We develop a hierarchical molecular interaction network capable of jointly modeling drug–target interactions at both atom and fragment levels. By supporting fragment overlaps, the model naturally accommodates the many-to-many atom–fragment mappings introduced by the OverlapBPE scheme. Extensive evaluation against state-of-the-art methods shows our method improves binding affinity prediction by 2-4% Pearson/Spearman correlation on PDBBind and LBA, enhances virtual screening by 1-3% in key metrics on DUD-E and LIT-PCBA, and achieves the best overall HTS performance on PubChem assays. Further analysis demonstrates that our method effectively captures interactive information while maintaining good generalization.
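The overlap idea in (i) can be illustrated with a toy sketch (hypothetical, not the paper's OverlapBPE implementation): when fragments are allowed to share atoms rather than partition the molecule, the resulting atom-to-fragment mapping is many-to-many, which is exactly what the hierarchical model in (ii) must accommodate.

```python
# Toy illustration of overlap-aware fragment tokenization. Atoms are plain
# indices and the fragment vocabulary is a list of atom-index sets; this is a
# hypothetical sketch, not the paper's OverlapBPE algorithm.
from collections import defaultdict

def tokenize_with_overlaps(atoms, vocab):
    """Assign every matching vocabulary fragment to the molecule,
    without forcing a disjoint partition of the atoms."""
    fragments = [frag for frag in vocab if frag <= set(atoms)]
    # Build the many-to-many atom -> fragment-index mapping.
    atom_to_frags = defaultdict(list)
    for i, frag in enumerate(fragments):
        for a in frag:
            atom_to_frags[a].append(i)
    return fragments, dict(atom_to_frags)

atoms = [0, 1, 2, 3, 4]              # toy molecule: five atoms
vocab = [{0, 1, 2}, {2, 3}, {3, 4}]  # toy fragment vocabulary
frags, mapping = tokenize_with_overlaps(atoms, vocab)
# Atom 2 is claimed by two fragments, so the mapping is many-to-many.
```

Because atom 2 sits on the fuzzy boundary between two substructures, both fragments keep it, rather than one fragment winning it in a hard partition.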
How AI Is Reshaping Pricing Litigation
Maxime C. Cohen
Impact of an LLM-based Review Assistant in Practice: A Mixed Open-/Closed-source Case Study
Doriane Olewicki
Leuson Da Silva
Oussama Ben Sghaier
Suhaib Mujahid
Arezou Amini
Benjamin Mah
Marco Castelluccio
Sarra Habchi
Bram Adams
In-Context Reinforcement Learning through Bayesian Fusion of Context and Value Prior
In-context reinforcement learning (ICRL) promises fast adaptation to unseen environments without parameter updates, but current methods either cannot improve beyond the training distribution or require near-optimal data, limiting practical adoption. We introduce SPICE, a Bayesian ICRL method that learns a prior over Q-values via a deep ensemble and updates this prior at test time using in-context information through Bayesian updates. To recover from poor priors resulting from training on sub-optimal data, our online inference follows an Upper-Confidence Bound rule that favours exploration and adaptation. We prove that SPICE achieves regret-optimal behaviour in both stochastic bandits and finite-horizon MDPs, even when pretrained only on suboptimal trajectories. We validate these findings empirically across bandit and control benchmarks: SPICE makes near-optimal decisions on unseen tasks, substantially reduces regret compared to prior ICRL and meta-RL approaches, adapts rapidly, and remains robust under distribution shift.
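The fusion-plus-UCB recipe the abstract describes can be sketched in a Gaussian bandit (a hypothetical minimal sketch: fixed numbers stand in for the deep-ensemble prior, and the conjugate normal update stands in for the paper's Bayesian fusion):

```python
# Fuse a prior over arm values with in-context observations via a conjugate
# normal update, then act by an Upper-Confidence Bound rule. All names and
# parameters (noise_var, beta) are illustrative assumptions.
import math

def posterior(prior_mu, prior_var, rewards, noise_var=1.0):
    """Conjugate normal update of a scalar value estimate."""
    n = len(rewards)
    if n == 0:
        return prior_mu, prior_var
    precision = 1.0 / prior_var + n / noise_var
    mu = (prior_mu / prior_var + sum(rewards) / noise_var) / precision
    return mu, 1.0 / precision

def ucb_action(prior_mus, prior_vars, history, beta=2.0):
    """Pick the arm maximising posterior mean + beta * posterior std."""
    scores = []
    for arm in range(len(prior_mus)):
        mu, var = posterior(prior_mus[arm], prior_vars[arm], history[arm])
        scores.append(mu + beta * math.sqrt(var))
    return max(range(len(scores)), key=scores.__getitem__)

# A misleadingly optimistic prior on arm 0 is pulled down by observed rewards,
# while the unexplored arm 1 keeps a wide posterior, so UCB explores it next.
prior_mus, prior_vars = [1.0, 0.0], [0.1, 1.0]
history = {0: [0.1, 0.0, 0.2], 1: []}
a = ucb_action(prior_mus, prior_vars, history)
```

The bonus term is what lets the agent recover from a poor prior: an arm the prior undervalues but that has never been tried retains high posterior variance, and therefore a high UCB score.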
Inference-time Physics Alignment of Video Generative Models with Latent World Models
Jianhao Yuan
Felix Friedrich
Nicolas Beltran-Velez
Melissa Hall
Reyhane Askari-Hemmat
Xiaochuang Han
Adriana Romero-Soriano
State-of-the-art video generative models produce promising visual content yet often violate basic physics principles, limiting their utility. While some attribute this deficiency to insufficient physics understanding from pre-training, we find that the shortfall in physical plausibility also stems from suboptimal inference strategies. We therefore introduce WMReward and treat improving the physics plausibility of video generation as an inference-time alignment problem. In particular, we leverage the strong physics prior of a latent world model (here, VJEPA-2) as a reward to search over and steer multiple candidate denoising trajectories, enabling test-time compute scaling for better generation performance. Empirically, our approach substantially improves physics plausibility across image-conditioned, multiframe-conditioned, and text-conditioned generation settings, with validation from a human preference study. Notably, on the challenging PhysicsIQ benchmark we achieve a final score of 62.00%, outperforming the previous state of the art by 6.78%. Our work demonstrates the viability of using latent world models to improve the physical plausibility of video generation, beyond this specific instantiation or parameterization.
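The candidate-search idea reduces, in its simplest form, to reward-guided best-of-N selection. A hypothetical sketch (random vectors stand in for denoising trajectories, and `world_model_reward` is a toy placeholder, not the paper's VJEPA-2-based reward):

```python
# Best-of-N inference-time alignment: sample several candidates, score each
# with a reward model, keep the highest-scoring one. Names are illustrative.
import random

def world_model_reward(candidate):
    """Placeholder plausibility score; higher is better (toy: mean near 0)."""
    return -abs(sum(candidate) / len(candidate))

random.seed(0)
# Eight candidate "trajectories", each a 16-dim vector standing in for a video.
candidates = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]
best = max(candidates, key=world_model_reward)
```

Spending more test-time compute here just means raising N: more candidates are scored, and the selected sample's reward can only improve.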
Integrating Generative and Experimental Platforms for Biomolecular Design
Soojung Yang
Sidney Lisanza
Jacob Gershon
Lauren Hong
Pranam Chatterjee
Biomolecular design, through artificial engineering of proteins, ligands, nucleic acids, and cells, holds immense promise in addressing pressing medical, industrial, and environmental challenges. While generative machine learning has shown significant potential in this area, a disconnect exists with experimental biology: many ML research efforts prioritize static benchmark performance, potentially sidelining impactful biological applications. This workshop seeks to bridge this gap by bringing computationalists and experimentalists together, catalyzing a deeper interdisciplinary discourse. Together, we will explore the strengths and challenges of generative ML in biology, experimental integration of generative ML, and biological problems ready for ML. To attract high-quality and diverse research, we partnered with Nature Biotechnology for a special collection, and we created dedicated tracks for in-silico ML research and hybrid ML-experimental biology research. Our lineup features emerging leaders as speakers and renowned scientists as panelists, encapsulating a spectrum from high-throughput experimentation and computational biology to generative ML. To catalyze new collaborations, we will host a seed-grant competition for pairs of experimentalists and computationalists proposing fresh joint projects. To connect dry and wet lab practice, a wet-lab challenge sponsored by Adaptyv Bio will empirically evaluate protein design models. With a diverse organizing team and backed by industry sponsors, we dedicate the workshop to pushing the boundaries of ML's role in biology. This will be the third edition of this workshop, following the editions we organized at ICLR 2024 and 2025.
Investigating the Multilingual Calibration Effects of Language Model Instruction-Tuning
Peng Lu
Qiuhao Zeng
Yusuke Iwasawa
Yutaka Matsuo
A. Chandar
Edison Marrese-Taylor
Irene Li
Ensuring that deep learning models are well-calibrated in terms of their predictive uncertainty is essential to maintaining their trustworthiness and reliability, yet despite increasing advances in foundation model research, the relationship between large language models (LLMs) and their calibration remains an open area of research. In this work, we examine a critical gap in the calibration of LLMs in multilingual settings, aiming to better understand how data scarcity can lead to different calibration effects and how commonly used techniques apply in these settings. Our analysis on two multilingual benchmarks, covering 29 and 42 languages respectively, reveals that even in low-resource languages, model confidence can increase significantly after instruction-tuning on high-resource-language SFT datasets. However, improvements in accuracy are marginal or non-existent, resulting in mis-calibration and highlighting a critical shortcoming of standard SFT in multilingual settings. Furthermore, we find label smoothing to be a reasonable method to alleviate this concern, again without any need for low-resource SFT data, maintaining better calibration across all languages. Overall, this highlights the importance of multilingual considerations in both training and tuning LLMs to improve their reliability and fairness in downstream use.
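Label smoothing, the remedy the abstract reports, mixes the one-hot target with a uniform distribution so that over-confident predictions are penalised. A minimal self-contained sketch (illustrative only, not the paper's training setup):

```python
# Cross-entropy against a label-smoothed target over k classes: each class
# receives eps/k probability mass, and the true class gets the remainder.
import math

def log_softmax(logits):
    m = max(logits)
    z = math.log(sum(math.exp(x - m) for x in logits)) + m
    return [x - z for x in logits]

def smoothed_cross_entropy(log_probs, target, eps=0.1):
    """Cross-entropy against a label-smoothed target distribution."""
    k = len(log_probs)
    smooth = [eps / k + (1.0 - eps) * (1.0 if i == target else 0.0)
              for i in range(k)]
    return -sum(p * lp for p, lp in zip(smooth, log_probs))

confident = log_softmax([10.0, 0.0, 0.0])  # near-certain prediction
hedged = log_softmax([2.0, 0.0, 0.0])      # moderately confident prediction
loss_conf = smoothed_cross_entropy(confident, target=0)
loss_hedged = smoothed_cross_entropy(hedged, target=0)
```

With eps=0 the near-certain prediction achieves the lower loss, but with eps=0.1 the ordering flips: the hedged prediction is preferred, which is precisely the pressure against the post-SFT over-confidence the abstract describes.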
Large language models for electronic health records in pediatric and surgical care: a systematic review.
Waseem Abu-Ashour
Elena Guadagno
Leveraging Diversity for Privileged Multi-Teacher Knowledge Distillation for Facial Expression Recognition
Muhammad Haseeb Aslam
Alessandro L. Koerich
Eric Granger
LogicXGNN: Grounded Logical Rules for Explaining Graph Neural Networks
Ziyu Zhao
Zhaoyue Wang
Haolin Ye
Yuhe Jiang
Existing rule-based explanations for Graph Neural Networks (GNNs) provide global interpretability but often optimize and assess fidelity in an intermediate, uninterpretable concept space, overlooking the grounding quality of the final subgraph explanations for end users. This gap yields explanations that may appear faithful yet be unreliable in practice. To address this, we propose LogicXGNN, a post hoc framework that constructs logical rules over reliable predicates explicitly designed to capture the GNN's message-passing structure, thereby ensuring effective grounding. We further introduce data-grounded fidelity.
Do machine learning methods make better predictions than conventional ones in pharmacoepidemiology? A systematic review, meta-analysis, and network meta-analysis.
Ana Paula Bruno Pena-Gralle
Mireille E. Schnitzer
Sofia-Nada Boureguaa
Félix Morin
Caroline Sirois
Alice Dragomir
Lucie Blais
MANSION: Multi-floor lANguage-to-3D Scene generatIOn for loNg-horizon tasks
Lirong Che
Shuo Wen
Shan Huang
Chuang Wang
Yuzhe Yang
Xueqian Wang
Jian Su
Real-world robotic tasks are long-horizon and often span multiple floors, requiring complex spatial reasoning. Existing embodied benchmarks, however, are largely confined to single-floor homes, failing to evaluate agents on realistic, building-scale tasks. We introduce MANSION, a language-driven framework for generating building-scale, multi-floor 3D environments for long-horizon tasks. Using this framework, we release MansionWorld, a large-scale dataset featuring over 1,000 diverse, non-residential buildings. These environments support cross-floor skills and long-horizon task generation on reusable building layouts. Experiments show that current methods degrade sharply on our multi-floor tasks, highlighting both the challenge and the value of this setting for advancing embodied AI.