Publications

Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models
Xiangming Gu
Soham De
Larisa Markeeva
Large Reasoning Models (LRMs) have shown remarkable performance on challenging questions, such as math and coding. However, to obtain a high-quality solution, one may need to sample more than once. In principle, there are two sampling strategies that can be composed to form more complex processes: sequential sampling and parallel sampling. In this paper, we first compare these two approaches rigorously and observe, in line with previous work, that parallel sampling seems to outperform sequential sampling even though the latter should have more representational power. To understand the underlying reasons, we formulate three hypotheses for this behavior: (i) parallel sampling outperforms due to the aggregator operator; (ii) sequential sampling is harmed by needing to use longer contexts; (iii) sequential sampling leads to less exploration due to conditioning on previous answers. The empirical evidence on various model families and sizes (Qwen3, DeepSeek-R1 distilled models, Gemini 2.5) and question domains (math and coding) suggests that aggregation and context length do not seem to be the main culprits behind the performance gap. In contrast, the lack of exploration seems to play a considerably larger role, and we argue that it is one main cause of the performance gap.
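To make the two strategies and hypothesis (iii) concrete, here is a toy simulation, not the paper's setup. The "model" is a hypothetical random answer generator (`toy_model`, `ANSWERS` and `stickiness` are all invented for illustration): parallel sampling draws k independent answers, sequential sampling conditions each draw on the previous answer, and `stickiness` is an assumed knob for how strongly that conditioning suppresses exploration. Majority voting stands in for the aggregator operator of hypothesis (i).

```python
import random
from collections import Counter

ANSWERS = ["A", "B", "C", "D"]  # hypothetical set of possible final answers


def toy_model(rng, previous=None, stickiness=0.0):
    # Hypothetical stand-in for one LRM call. With probability `stickiness`
    # it repeats the answer it is conditioned on (reduced exploration);
    # otherwise it draws an answer independently.
    if previous is not None and rng.random() < stickiness:
        return previous
    return rng.choice(ANSWERS)


def parallel_sampling(rng, k):
    # k independent samples that never see each other's output
    return [toy_model(rng) for _ in range(k)]


def sequential_sampling(rng, k, stickiness):
    # k samples, each conditioned on the previous answer
    answers, prev = [], None
    for _ in range(k):
        prev = toy_model(rng, previous=prev, stickiness=stickiness)
        answers.append(prev)
    return answers


def majority_vote(answers):
    # Aggregator: most frequent answer; ties go to the answer seen first
    return Counter(answers).most_common(1)[0][0]


def mean_distinct(sampler, trials, seed=0):
    # Average number of distinct answers explored per run
    rng = random.Random(seed)
    return sum(len(set(sampler(rng))) for _ in range(trials)) / trials


if __name__ == "__main__":
    par = mean_distinct(lambda r: parallel_sampling(r, 4), 2000)
    seq = mean_distinct(lambda r: sequential_sampling(r, 4, stickiness=0.8), 2000)
    print(f"parallel: {par:.2f} distinct answers, sequential: {seq:.2f}")
```

With a high `stickiness`, the sequential runs cover far fewer distinct answers than the parallel runs, which is the kind of exploration deficit the abstract points to; how real LRMs behave is what the paper measures empirically.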
Unraveling microplastic retention distribution in porous media: A unified framework coupling flow conditions and particle properties
Haiyu Yuan
Guangqiu Jin
Qihao Jiang
Shuo Wang
Hao Lin
Zhongtian Zhang
Xiangfei Qi
YoNER: A New Yorùbá Multi-domain Named Entity Recognition Dataset
Peace Busola Falola
Jesujoba O. Alabi
Solomon O. Akinola
Folashade T. Ogunajo
Emmanuel Oluwadunsin Alabi
Named Entity Recognition (NER) is a foundational NLP task, yet research in Yorùbá has been constrained by limited and domain-specific resources. Existing resources, such as MasakhaNER (a manually annotated news-domain corpus) and WikiAnn (automatically created from Wikipedia), are valuable but restricted in domain coverage. To address this gap, we present YoNER, a new multidomain Yorùbá NER dataset that extends entity coverage beyond news and Wikipedia. The dataset comprises about 5,000 sentences and 100,000 tokens collected from five domains including Bible, Blogs, Movies, Radio broadcast and Wikipedia, and annotated with three entity types: Person (PER), Organization (ORG) and Location (LOC), following CoNLL-style guidelines. Annotation was conducted manually by three native Yorùbá speakers, with an inter-annotator agreement of over 0.70, ensuring high quality and consistency. We benchmark several transformer encoder models using cross-domain experiments with MasakhaNER 2.0, and we also assess the effect of few-shot in-domain data using YoNER and cross-lingual setups with English datasets. Our results show that African-centric models outperform general multilingual models for Yorùbá, but cross-domain performance drops substantially, particularly for blogs and movie domains. Furthermore, we observed that closely related formal domains, such as news and Wikipedia, transfer more effectively. In addition, we introduce a new Yorùbá-specific language model (OyoBERT) that outperforms multilingual models in in-domain evaluation. We publicly release the YoNER dataset and pretrained OyoBERT models to support future research on Yorùbá natural language processing.
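CoNLL-style NER data is usually stored as one tag per token and scored by exact-match entity F1. The abstract does not state the tagging scheme or the metric, so the sketch below assumes IOB2 (BIO) tags and corpus-level exact-match entity F1 in the spirit of conlleval; the function names and the example sentence tags are hypothetical.

```python
def bio_to_spans(tags):
    """Convert one sentence of IOB2 tags into (label, start, end) spans, end exclusive."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        # An entity starts at B-X, or at an I-X that does not continue the open entity
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or starts_new:
            if label is not None:
                spans.append((label, start, i))
            start, label = (i, tag[2:]) if starts_new else (None, None)
    return spans


def entity_f1(gold_sents, pred_sents):
    """Corpus-level exact-match entity F1 over parallel lists of tag sequences."""
    gold = {(s, *sp) for s, tags in enumerate(gold_sents) for sp in bio_to_spans(tags)}
    pred = {(s, *sp) for s, tags in enumerate(pred_sents) for sp in bio_to_spans(tags)}
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "B-ORG"]]
    print(bio_to_spans(gold[0]))  # [('PER', 0, 2), ('LOC', 3, 4)]
    print(entity_f1(gold, pred))  # 0.5
```

A span counts only if label and boundaries both match, so the mislabeled LOC/ORG entity above costs both precision and recall.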
Discovering Failure Modes in Vision-Language Models using RL
Vision-language Models (VLMs), despite achieving strong performance on multimodal benchmarks, often misinterpret straightforward visual concepts that humans identify effortlessly, such as counting, spatial reasoning, and viewpoint understanding. Previous studies manually identified these weaknesses and found that they often stem from deficits in specific skills. However, such manual efforts are costly, unscalable, and subject to human bias, which often overlooks subtle details in favor of salient objects, resulting in an incomplete understanding of a model's vulnerabilities. To address these limitations, we propose a Reinforcement Learning (RL)-based framework to automatically discover the failure modes or blind spots of any candidate VLM on a given data distribution without human intervention. Our framework trains a questioner agent that adaptively generates queries based on the candidate VLM's responses to elicit incorrect answers. Our approach increases question complexity by focusing on fine-grained visual details and distinct skill compositions as training progresses, consequently identifying 36 novel failure modes in which VLMs struggle. We demonstrate the broad applicability of our framework by showcasing its generalizability across various model combinations.
From Use to Oversight: How Mental Models Influence User Behavior and Output in AI Writing Assistants
AI-based writing assistants are ubiquitous, yet little is known about how users' mental models shape their use. We examine two types of mental models (functional, related to what the system does, and structural, related to how the system works) and how they affect control behavior (how users request, accept, or edit AI suggestions as they write) and writing outcomes. We primed participants (
General Multimodal Protein Design Enables DNA-Encoding of Chemistry
Théophile Lambert
Daniel Roth
Yueming Long
Zi-Qi Li
Xi Zhang
Miruna Cretu
Francesca-Zhoufan Li
Tanvi Ganapathy
Emily Jin
Avishek Joey Bose
Jason Yang
Kirill Neklyudov
Frances H. Arnold
Cheng-Hao Liu
Evolution is an extraordinary engine for enzymatic diversity, yet the chemistry it has explored remains a narrow slice of what DNA can encode. Deep generative models can design new proteins that bind ligands, but none have created enzymes without pre-specifying catalytic residues. We introduce DISCO (DIffusion for Sequence-structure CO-design), a multimodal model that co-designs protein sequence and 3D structure around arbitrary biomolecules, as well as inference-time scaling methods that optimize objectives across both modalities. Conditioned solely on reactive intermediates, DISCO designs diverse heme enzymes with novel active-site geometries. These enzymes catalyze new-to-nature carbene-transfer reactions, including alkene cyclopropanation, spirocyclopropanation, B-H, and C(sp
An international mega-analysis of psychedelic drug effects on brain circuit function
Manesh Girn
Manoj K. Doss
Leor Roseman
Katrin H. Preller
Fernanda Palhano-Fontes
Lorenzo Pasquini
Frederick S. Barrett
Pablo Mallaroni
Natasha L. Mason
Christopher Timmermann
Drummond E. McCulloch
Patrick M. Fisher
Brian S. Winston
Flora Moujaes
Felix Muller
Matthias E. Liechti
Franz X. Vollenweider
Johannes G. Ramaekers
Kim Kuypers
Draulio B. Araujo
… (7 more authors)
Olaf Sporns
Joshua Siegel
Nico Dosenbach
David J. Nutt
Robin L. Carhart-Harris
Emmanuel A. Stamatakis
Psychedelic drugs are re-emerging as promising scientific and clinical tools. However, despite a rapidly expanding literature on their therapeutic value, the neural mechanisms underlying psychedelic effects remain unclear. Resting-state functional magnetic resonance imaging studies of acute psychedelic effects, conducted independently by several research groups, have so far yielded fragmented and sometimes inconsistent findings. Here, to help facilitate greater convergence, we conducted a 'mega-analysis' integrating 11 independent resting-state functional magnetic resonance imaging datasets across five psychedelic drugs (psilocybin, lysergic acid diethylamide, mescaline, N,N-dimethyltryptamine and ayahuasca) from research groups spanning three continents and five countries. By applying a uniform preprocessing pipeline and a Bayesian hierarchical modeling framework, we discovered several common features in the induced alterations to brain function across drugs and sites. Most prominently, we identified a core signature of increased functional connectivity between transmodal (default, frontoparietal and limbic) and unimodal networks (visual and somatomotor), with subnetwork specificity. Furthermore, key subcortical regions (thalamus, caudate and putamen) and the cerebellum exhibited altered coupling with sensorimotor networks. In contrast to several single-site reports, Bayesian modeling revealed weak-to-moderate and selective reductions in within-network functional connectivity, with substantial variability across drugs and networks. Together, these findings extend past work by demonstrating that psychedelics reconfigure large-scale cortical organization while selectively engaging subcortical circuitry. This study provides the most comprehensive synthesis of psychedelic brain action to date, helping resolve inconsistencies and offering a probabilistic map of how psychedelics alter large-scale brain organization.
We hereby provide a cornerstone for benchmarking and guiding future psychedelic neuroimaging research.
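The abstract reports within- and between-network functional connectivity but does not spell out the estimator, preprocessing, or Bayesian model. A common choice, assumed here and not taken from the paper, is Pearson correlation between regional BOLD time series followed by a Fisher z-transform, then averaging region pairs by network. The function names, network labels and synthetic signals below are hypothetical.

```python
import numpy as np


def functional_connectivity(ts):
    """Fisher z-transformed Pearson correlations between regions.

    ts: array of shape (timepoints, regions) holding preprocessed BOLD signals.
    """
    r = np.corrcoef(ts, rowvar=False)
    np.fill_diagonal(r, 0.0)  # ignore self-connections
    return np.arctanh(np.clip(r, -0.999999, 0.999999))


def network_fc(z, labels, net_a, net_b):
    """Mean z-scored connectivity within one network or between two networks."""
    labels = np.asarray(labels)
    a = np.flatnonzero(labels == net_a)
    b = np.flatnonzero(labels == net_b)
    block = z[np.ix_(a, b)]
    if net_a == net_b:
        # within-network: average only the off-diagonal region pairs (needs >= 2 regions)
        return float(block.sum() / (len(a) * (len(a) - 1)))
    return float(block.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shared = rng.standard_normal(500)  # synthetic signal shared by two "default" regions
    ts = np.column_stack([
        shared + 0.1 * rng.standard_normal(500),
        shared + 0.1 * rng.standard_normal(500),
        rng.standard_normal(500),  # independent "visual" region
    ])
    z = functional_connectivity(ts)
    labels = ["default", "default", "visual"]
    print(network_fc(z, labels, "default", "default"))  # large: shared signal
    print(network_fc(z, labels, "default", "visual"))   # close to zero
```

A drug effect such as the reported transmodal-unimodal increase would then be a shift in these between-network values relative to placebo; the mega-analysis estimates such shifts with its hierarchical model across drugs and sites rather than with plain averages.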
A Survey on Over-smoothing and Over-squashing: Unified Propagation Perspectives on Graph Neural Networks and Transformers
Álvaro Arroyo
Federico Barbero
Hugh Blayney
Michael Bronstein
Xiaowen Dong
Pietro Lio
Pierre Vandergheynst
Decoder-Transformers have achieved remarkable success and have laid the groundwork for the development of Large Language Models (LLMs). At the core of these models is the self-attention matrix, which allows different tokens to interact with each other. This process is remarkably similar to the message-passing mechanism used in Graph Neural Networks (GNNs), and as such decoder-Transformers suffer from many of the optimization difficulties studied extensively in the GNN literature. In this paper, we present a unified graph perspective that bridges the theoretical understanding of decoder-Transformers and GNNs. We systematically examine how well-known phenomena in GNNs, such as over-smoothing and over-squashing, directly manifest as analogous issues like rank collapse and representational collapse in deep Transformer architectures. By interpreting Transformers' self-attention as a learned adjacency operator, we reveal shared underlying principles governing signal propagation and demonstrate how insights from one field can illuminate challenges and solutions in the other. We analyze the role of architectural components like residual connections, normalization, and causal masking in these issues. We aim to provide a framework for understanding how information flows through deep learning models that perform sequence mixing through an adjacency operator, to highlight areas for cross-pollination of research, and to provide a comprehensive reference for researchers interested in the underpinnings of these architectures.
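Reading self-attention as an adjacency operator gives a quick intuition for rank collapse. A softmax attention matrix is row-stochastic, and repeatedly applying one positive row-stochastic matrix averages token vectors until they coincide, which is the over-smoothing picture from GNNs. The toy below assumes a single fixed random attention matrix shared by all layers, with no residual connections, normalization or learned projections; these simplifications are ours, and the survey analyzes exactly how such components change the picture.

```python
import numpy as np


def softmax_rows(scores):
    # Row-wise softmax: every row is a probability distribution (row-stochastic)
    shifted = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def token_spread(x):
    # Frobenius distance of token representations from their mean;
    # 0 means every token carries the same vector (rank collapse)
    return float(np.linalg.norm(x - x.mean(axis=0, keepdims=True)))


def propagate(x, adjacency, layers):
    # Apply the same attention "adjacency" repeatedly, with no residual,
    # normalization or value/output projections
    spreads = [token_spread(x)]
    for _ in range(layers):
        x = adjacency @ x
        spreads.append(token_spread(x))
    return x, spreads


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens, dim = 8, 4
    attn = softmax_rows(rng.standard_normal((n_tokens, n_tokens)))
    x0 = rng.standard_normal((n_tokens, dim))
    _, spreads = propagate(x0, attn, layers=60)
    print(f"spread at layer 0: {spreads[0]:.3f}, at layer 60: {spreads[-1]:.2e}")
```

The spread shrinks geometrically toward zero, so all token rows converge to one shared vector. Adding back the identity path of a residual connection, for instance, slows this averaging, which is one reason the survey treats residuals as part of the story.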
Understanding When Poisson Log-Normal Models Outperform Penalized Poisson Regression for Microbiome Count Data
Daniel Agyapong
Julien Chiquet
Jane C. Marks
Multivariate count models are often justified by their ability to capture latent dependence, but researchers receive little guidance on when this added structure improves on simpler penalized marginal Poisson regression. We study this question using real microbiome data under a unified held-out evaluation framework. For count prediction, we compare PLN and GLMNet(Poisson) on 20 datasets spanning 32 to 18,270 samples and 24 to 257 taxa, using held-out Poisson deviance under leave-one-taxon-out prediction with 3-fold sample cross-validation rather than synthetic or in-sample criteria. For network inference, we compare PLNNetwork and GLMNet(Poisson) neighborhood selection on five publicly available datasets with experimentally validated microbial interaction truth. PLN outperforms GLMNet(Poisson) on most count-prediction datasets, with gains up to 38 percent. The primary predictor of the winner is the sample-to-taxon ratio, with mean absolute correlation as the strongest secondary signal and overdispersion as an additional predictor. PLNNetwork performs best on broad undirected interaction benchmarks, whereas GLMNet(Poisson) is better aligned with local or directional effects. Taken together, these results provide guidance for choosing between latent multivariate count models and penalized Poisson regression in biological count prediction and interaction recovery.
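The evaluation criterion named above, held-out Poisson deviance, is the standard quantity $D = 2\sum_i \left[y_i \log(y_i/\mu_i) - (y_i - \mu_i)\right]$, with the $y_i \log(y_i/\mu_i)$ term taken as 0 when $y_i = 0$. Below is a minimal sketch of that formula; the cross-validation loop and the PLN and GLMNet fits are not reproduced, and the counts in the usage lines are made-up toy numbers.

```python
import numpy as np


def poisson_deviance(y, mu):
    """Total Poisson deviance 2 * sum[y*log(y/mu) - (y - mu)] of held-out counts y
    under predicted means mu; the y*log(y/mu) term is taken as 0 when y == 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    if np.any(mu <= 0):
        raise ValueError("predicted means must be strictly positive")
    safe_y = np.where(y > 0, y, 1.0)  # avoid log(0); masked out below
    log_term = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return float(2.0 * np.sum(log_term - (y - mu)))


if __name__ == "__main__":
    held_out = [0, 3, 12, 1]           # toy held-out counts for one taxon
    model_a = [0.5, 2.8, 10.0, 1.2]    # toy predicted means from two hypothetical models
    model_b = [2.0, 2.0, 5.0, 2.0]
    # Lower held-out deviance means better count prediction
    print(poisson_deviance(held_out, model_a), poisson_deviance(held_out, model_b))
```

In a leave-one-taxon-out scheme, this deviance would be computed for the held-out taxon on each held-out fold of samples and then summed or averaged before comparing the two models.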
Co-Evolution of Policy and Internal Reward for Language Agents
Xinyu Wang
Hanwei Wu
Jingwei Song
Jiayi Zhang
Fanqi Kong
Tung Sum Thomas Kwok
Xiao-Wen Chang
Yuyu Luo
Chenglin Wu
Large language model (LLM) agents learn by interacting with environments, but long-horizon training remains fundamentally bottlenecked by sparse and delayed rewards. Existing methods typically address this challenge through post-hoc credit assignment or external reward models, which provide limited guidance at inference time and often separate reward improvement from policy improvement. We propose Self-Guide, a self-generated internal reward for language agents that supports both inference-time guidance and training-time supervision. Specifically, the agent uses Self-Guide as a short self-guidance signal to steer the next action during inference, and converts the same signal into step-level internal reward for denser policy optimization during training. This creates a co-evolving loop: better policy produces better guidance, and better guidance further improves policy as internal reward. Across three agent benchmarks, inference-time self-guidance already yields clear gains, while jointly evolving policy and internal reward with GRPO brings further improvements (8%) over baselines trained solely with environment reward. Overall, our results suggest that language agents can improve not only by collecting more experience, but also by learning to generate and refine their own internal reward during acting and learning.
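The abstract names GRPO as the optimizer but does not give the exact way step-level internal rewards enter it. The sketch below shows GRPO's group-relative advantage (each sampled trajectory's reward normalized by the mean and standard deviation of its group) combined with a hypothetical shaping rule, `trajectory_reward`, that adds a `beta`-weighted sum of per-step internal rewards to the environment reward. Both the rule and `beta` are illustrative assumptions, not Self-Guide's formulation; step-level GRPO variants also exist.

```python
import numpy as np


def trajectory_reward(env_reward, step_internal_rewards, beta=0.1):
    # Hypothetical shaping: environment reward plus a weighted sum of per-step
    # internal rewards. `beta` and the plain sum are illustrative assumptions.
    return env_reward + beta * float(np.sum(step_internal_rewards))


def group_relative_advantages(rewards, eps=1e-8):
    # GRPO-style advantage: normalize each sampled trajectory's reward by the
    # mean and standard deviation of its group (trajectories for the same task)
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


if __name__ == "__main__":
    # One group of four sampled trajectories for the same task (toy numbers)
    group = [
        trajectory_reward(1.0, [0.2, 0.5]),
        trajectory_reward(0.0, [0.1, -0.3]),
        trajectory_reward(0.0, [0.4, 0.4]),
        trajectory_reward(1.0, [0.0, 0.1]),
    ]
    print(group_relative_advantages(group))
```

Because advantages are relative within a group, the internal reward only matters when it separates trajectories that the sparse environment reward scores identically, which is where denser feedback is supposed to help.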
An Exact Framework for Solving the Space-Time Dependent TSP
Isaac Rudich
Manuel López-Ibáñez
Michael Romer
Louis-Martin Rousseau
Many real-world scenarios involve solving bilevel optimization problems in which there is an outer discrete optimization problem and an inner problem involving expensive or black box computation. This arises in space-time–dependent variants of the traveling salesman problem, such as when planning space missions that visit multiple astronomical objects. Planning these missions presents significant challenges due to the constant relative motion of the objects involved. There is an outer combinatorial problem of finding the optimal order to visit the objects and an inner optimization problem that requires finding the optimal departure time and trajectory to travel between each pair of objects. The constant motion of the objects complicates the inner problem, making it computationally expensive. This paper introduces a novel framework utilizing decision diagrams (DDs) and a DD-based branch-and-bound technique, peel-and-bound, to achieve exact solutions for such bilevel optimization problems, assuming sufficient inner problem optimizer quality. The framework leverages problem-specific knowledge to expedite search processes and minimize the number of expensive evaluations required. As a case study, we apply this framework to the asteroid routing problem, a benchmark problem in global trajectory optimization. Experimental results demonstrate the framework’s scalability and ability to generate robust heuristic solutions for tested instances. Many of these solutions are exact, contingent on the assumed quality of the inner problem’s optimizer. History: Accepted by Andrea Lodi, Area Editor for Design & Analysis of Algorithms–Discrete. Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information ( https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2024.0866 ) as well as from the IJOC GitHub software repository ( https://github.com/INFORMSJoC/2024.0866 ).
The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/ .
Lipidomics Identifies HFpEF Phenogroups and a High-Risk Metabolic Signature - The BElgian and CAnadian MEtabolomics in HFpEF (BECAME-HF) project.
Nassiba Menghoum
Anik Forest
Pamela Mehanna
Olivier Tastet
Julie Legault
Isabelle Robillard Frayne
Sibille Lejeune
David Vancraeynest
Clotilde Roy
Galadriel Briere
Gabrielle Boucher
L Bertrand
Sandrine Horman
David Rhainds
J.‐C. Tardif
Christophe Beauloye
Anne-Catherine Pouleur
Christine Des Rosiers
Rationale: Heart failure with preserved ejection fraction (HFpEF) is a heterogeneous syndrome with substantial unmet diagnostic and therapeutic needs. Circulating lipid metabolism is increasingly implicated in HFpEF pathophysiology but has not been systematically leveraged for molecular stratification. Objective: To determine whether plasma lipidomics can identify molecular phenogroups of HFpEF associated with distinct clinical characteristics and outcomes. Methods and Results: Untargeted plasma lipidomics was performed in non-HF subjects and HFpEF patients from a primary Belgian cohort and an independent Canadian cohort (n=177 in each cohort). In the Belgian cohort, 235 unique lipids spanning 19 subclasses were annotated, including 96 significantly associated with HFpEF (q<0.02). Unsupervised analyses revealed marked lipidomic heterogeneity, with a distinct HFpEF subgroup separable from non-HF subjects. Hierarchical clustering identified three phenogroups with divergent lipid profiles and clinical features. One phenogroup exhibited severe atrial dysfunction, congestion-related biomarkers, elevated indices of cardiac and liver fibrosis, and markedly reduced survival, a second was characterized by prominent metabolic syndrome features, and a third by preserved renal function. Cross-cohort comparison using a supervised classifier trained on 158 shared lipids confirmed analogous lower-risk phenogroups in the Canadian cohort, while the high-risk phenotype was underrepresented. A signature of 10 lipids across six subclasses, including long-chain acylcarnitines, ether phosphatidylcholines, and oxidized sphingomyelins, discriminated the high-risk group and correlated with markers of disease severity. Conclusion: Our findings demonstrate that HFpEF comprises metabolically distinct patient subgroups across cohorts, revealing specific lipidomic dysfunctions that deepen our understanding of HFpEF heterogeneity and underlying pathophysiology.
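The phenogroups above come from hierarchical clustering of lipid profiles, but the abstract does not state the preprocessing, distance or linkage. The sketch below assumes per-lipid z-scoring, Euclidean distance and average linkage, cut at three clusters, and runs on synthetic data. It is a naive pure-NumPy version written for readability; in practice one would more likely call a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` followed by `fcluster`.

```python
import numpy as np


def zscore(X):
    # Standardize each lipid (column) to mean 0, SD 1; constant columns are left at 0
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=0)) / sd


def average_linkage_clusters(X, k):
    # Naive agglomerative clustering: start with singletons and repeatedly merge
    # the two clusters with the smallest mean pairwise Euclidean distance
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best = (np.inf, -1, -1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    labels = np.empty(len(X), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic: 3 groups x 10 patients x 5 lipids, group means shifted by 6 units
    X = np.vstack([rng.normal(6.0 * g, 1.0, size=(10, 5)) for g in range(3)])
    print(average_linkage_clusters(zscore(X), k=3))
```

With real lipidomics data the number of clusters, the linkage and any log-transformation of abundances are modeling choices that shape which phenogroups appear.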