Publications

SCOPE: Selective Cross-modal Orchestration of Visual Perception Experts
Chao Wang
Juan A. Rodriguez
Xiangru Jian
Vision-language models (VLMs) benefit from multiple vision encoders, but naively stacking them yields diminishing returns while multiplying inference costs. We propose SCOPE, a Mixture-of-Encoders (MoEnc) framework that dynamically selects one specialized encoder per image-text pair via instance-level routing, unlike token-level routing in traditional MoE. SCOPE maintains a shared encoder and a pool of routed encoders. A lightweight router uses cross-attention between text prompts and shared visual features to select the optimal encoder from the routed encoders. To train this router, we introduce dual entropy regularization with auxiliary losses to balance dataset-level load distribution with instance-level routing confidence. Remarkably, SCOPE with one shared plus one routed encoder outperforms models using all four extra encoders simultaneously, while reducing compute by 24-49%. This demonstrates that intelligent encoder selection beats brute-force aggregation, challenging the prevailing paradigm in multi-encoder VLMs.
Spatial distribution of spinal cord fMRI activity with electrocutaneous stimulation
Merve Kaptan
Teresa Indriolo
Dario Pfyffer
Lindsay Lee
John K Ratliff
Serena S. Hu
Suzanne Tharin
Zachary A. Smith
Gary Glover
Sean C Mackey
Kenneth A. Weber
Christine SW Law
Sensory organization at the spinal segment level is commonly inferred from dermatomal maps that assume a fixed correspondence between cutaneous regions and spinal segments. However, based on the complexities of spinal neuroanatomy and neurophysiology, the distribution of sensory signals within the cord may be broader and less segment-specific than dermatomal maps suggest, leaving the segment-level localization of sensory-evoked activity in humans uncertain. Spinal cord functional magnetic resonance imaging (fMRI) is currently the only technique capable of noninvasively mapping sensory activity with high spatial resolution in the human spinal cord. However, its application remains technically challenging and is limited by the uncertainty in segmental localization. In this study, we leveraged recent advancements in spinal cord fMRI, including spinal nerve rootlet-based spatial normalization, to investigate how sensory information is represented and distributed within the human spinal cord during electrocutaneous stimulation of the third digit of the right hand (i.e., C7 dermatome). Forty healthy adults were scanned with electrocutaneous stimulation at four individualized intensities across multiple runs to quantify (i) the rostrocaudal distribution of sensory-evoked activity, (ii) intensity-dependent changes in detectability and localization, and (iii) the effect of normalization strategy on segmental localization. Across participants, stimulation produced activation localized in the lower cervical cord (e.g., C6-C8), with the most consistent segmental localization near C7. Stronger stimulation increased detectability and produced more consistent segmental localization across participants. Importantly, normalization that incorporated nerve rootlet landmarks sharpened localization and improved sensitivity relative to conventional intervertebral disc-based alignment. 
This highlights the value of functionally relevant anatomical landmarks for group inference in the spinal cord. Responses were strongest in the initial run and attenuated with repetition, suggesting habituation or adaptation that can bias multi-run paradigms if unmodeled. Together, our results define practical acquisition and analysis conditions (e.g., stimulation strength, anatomical alignment strategy, and run structure) under which segment-level spinal sensory responses can be detected, thereby supporting more reliable future basic and translational studies of the human spinal cord, including studies of pain mechanisms, sensory function, and spinal injury.
Temporal Representations for Exploration: Learning Complex Exploratory Behavior without Extrinsic Rewards
Catherine Ji
Benjamin Eysenbach
Effective exploration in reinforcement learning requires not only tracking where an agent has been, but also understanding how the agent perceives and represents the world. To learn powerful representations, an agent should actively explore states that contribute to its knowledge of the environment. Temporal representations can capture the information necessary to solve a wide range of potential tasks while avoiding the computational cost associated with full state reconstruction. In this paper, we propose an exploration method that leverages temporal contrastive representations to guide exploration, prioritizing states with unpredictable future outcomes. We demonstrate that such representations can enable the learning of complex exploratory behavior in locomotion, manipulation, and embodied-AI tasks, revealing capabilities and behaviors that traditionally require extrinsic rewards. Unlike approaches that rely on explicit distance learning or episodic memory mechanisms (e.g., quasimetric-based methods), our method builds directly on temporal similarities, yielding a simpler yet effective strategy for exploration.
The Geometry of Spectral Gradient Descent: Layerwise Criteria for SignSGD vs SpecSGD
Optimization in deep learning has expanded beyond Euclidean methods to include entrywise sign updates (SignSGD) and spectral sign updates (SpecGD/Muon). While both can be viewed as steepest descent under non-Euclidean geometries (
Titanium nanotube arrays promote the activity of anastomotic healing-related cells by increasing fibronectin adsorption and activating the RGD–integrin pathway
Pengyu Chen
Yijia Li
Yahui Hu
Weihua Fu
The smooth titanium staples of stapling devices cannot reduce the incidence of gastrointestinal anastomotic leakage due to their bioinert nature and lack of active wound-healing promotion capability. This study aims to investigate whether titanium nanotube arrays (TNTs) can enhance the activity of cells involved in gastrointestinal anastomotic healing and further explore the potential mechanisms. TNTs were fabricated on pure titanium sheets via anodic oxidation, and characterized using scanning electron microscopy, roughness analysis, contact angle measurement, and X-ray photoelectron spectroscopy. Cell adhesion, proliferation, spreading, collagen secretion, and integrin expression were evaluated using methods such as CCK-8, immunofluorescence, qPCR, enzyme-linked immunosorbent assay (ELISA), and Western blot. Fibronectin (FN) adsorption and Arg-Gly-Asp tripeptide sequence (RGD domain) exposure were detected via bicinchoninic acid assay, fluorescent staining, and ELISA. The role of the RGD-integrin pathway was further investigated by supplementing serum-reduced medium with exogenous FN and using RGD-specific antagonists. The results showed that TNTs increased the roughness, hydrophilicity, and surface free energy of titanium surfaces. Compared with smooth pure titanium, TNTs promoted the adhesion, proliferation, spreading, and integrin expression of gastric mucosal epithelial cells and fibroblasts, while enhancing the collagen secretion capacity of fibroblasts. Moreover, TNTs adsorbed more FN and exposed more RGD domains, thereby upregulating integrin α5β1 expression. The RGD antagonist could reverse these enhanced cellular responses, confirming the pivotal role of the FN–RGD–integrin pathway. 
In conclusion, TNTs enhance the adhesion, proliferation, and functional activity of gastrointestinal anastomosis-related cells by promoting FN adsorption and activating the RGD–integrin pathway, demonstrating that TNT-modified titanium materials hold significant potential for developing bioactive anastomotic devices and promoting tissue healing.
Towards Practical World Model-based Reinforcement Learning for Vision-Language-Action Models
Zhilong Zhang
Haoxiang Ren
Yifei Sheng
Haonan Wang
Haoxin Lin
Zhichao Wu
Yang Yu
Vision-Language-Action (VLA) models show strong generalization for robotic control, but finetuning them with reinforcement learning (RL) is constrained by the high cost and safety risks of real-world interaction. Training VLA models in interactive world models avoids these issues but introduces several challenges, including pixel-level world modeling, multi-view consistency, and compounding errors under sparse rewards. Building on recent advances across multimodal models and model-based RL, we propose **VLA-MBPO**, a practical world model-based RL framework to tackle these problems in VLA finetuning. Our approach is guided by three key design choices: (i) adapting *unified multimodal models (UMMs)* to VLA settings, leveraging rich multimodal priors to enable world modeling with limited data; (ii) introducing an *interleaved view decoding* mechanism to enforce consistency across views; and (iii) employing *chunk-level branched rollout* to limit rollout horizons and mitigate error compounding during policy optimization. Our theoretical analysis shows a reduction in value gap of VLA-MBPO, and experiments in both simulated and real-world tasks demonstrate that our method effectively improves policy performance and sample efficiency for VLA finetuning.
Chromatin landscape and enhancer-gene interaction differences between three cardiac cell types
Yan Zhu
Jean‐Christophe Grenier
Raphaël Poujol
Olivier Tastet
Caroline Lee
Svenja Koslowski
Marouane Benzaki
Talal Fawaz
Roger Foo
Chukwuemeka George Anene-Nzelu
Matthew Ackers-Johnson
A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
Usman Anwar
Julianna Piskorz
David D. Baek
David Africa
Jim Weatherall
Max Tegmark
Christian Schroeder de Witt
Mihaela van der Schaar
David Krueger
Large language models are beginning to show steganographic capabilities. Such capabilities could allow misaligned models to evade oversight mechanisms. Yet principled methods to detect and quantify such behaviours are lacking. Classical definitions of steganography, and detection methods based on them, require a known reference distribution of non-steganographic signals. For the case of steganographic reasoning in LLMs, knowing such a reference distribution is not feasible; this renders these approaches inapplicable. We propose an alternative, **decision-theoretic view of steganography**. Our central insight is that steganography creates an asymmetry in usable information between agents who can and cannot decode the hidden content (present within a steganographic signal), and this otherwise latent asymmetry can be inferred from the agents’ observable actions. To formalise this perspective, we introduce generalised
Is Depth Heterogeneity a Barrier to Model Merging?
Model merging offers a way to combine the capabilities of several networks at test time without retraining or additional finetuning, but most merging methods assume identical architectures. Depth differences are commonly viewed as a major obstacle because they remove clear layer correspondences. We test this assumption by merging residual networks that differ only in depth, using a simple training-free pipeline based on identity expansion and permutation alignment. Across both same-task and multitask image classification experiments, heterogeneous merges closely match homogeneous ones. The results suggest that, for residual networks, depth mismatch is not the main barrier to effective model merging, and that the main difficulty in model merging comes from aligning independently trained weights in a homogeneous setting.
From Large-Scale Winds to Urban Decision Making: A Cross-Scale Framework for Wind-Aware UAV Navigation
Fuyuan Lyu
Di Zhou
Xue Liu
Xiongye Xiao
Anima Anandkumar
Liangzhu Leon Wang
Large-scale weather and climate models provide reliable wind information at regional scales, yet their outputs are typically too coarse for direct UAV decision making in geometrically complex urban environments. This paper investigates how large-scale atmospheric information can be transformed into city-scale wind representations and utilized for downstream navigation decisions. We propose a cross-scale prediction and decision framework that takes background wind conditions from existing weather or climate models and combines them with detailed 3D urban geometry to predict time-averaged urban wind fields using a 3D neural operator. The predicted wind fields are then incorporated into a wind-aware UAV trajectory optimization problem to minimize energy consumption under kinematic feasibility and safety constraints. By comparing trajectories planned against a wind-agnostic baseline, we demonstrate significant efficiency gains enabled by AI-predicted wind, specifically 10.3% savings in tailwinds, 7.7% in headwinds, and 3.9% in crosswind conditions. These results indicate that learning decision-relevant urban wind representations offers a practical pathway for bridging large-scale atmospheric information and fine-scale urban decision making.
Machine learning–based prediction of Metabolic Syndrome risk in the Quebec population
Shayan Nejadshamsi
Stella S. Daskalopoulou
Samira Abbasgholizadeh Rahimi
Objective: This study evaluates multiple machine learning approaches to predict metabolic syndrome (MetS) risk in the Quebec, Canada population. We further perform explainability analysis to interpret model predictions and identify key features driving risk classification. Methods and analysis: This study followed the Minimum Information about Clinical Artificial Intelligence Modeling (MI-CLAIM) guideline for reporting. We used cross-sectional data from the Canadian Community Health Survey (2015–2018) for the population living in the province of Quebec, which includes 42,279 participants. Partial sampling was used to obtain a balanced dataset for model development. We evaluated seven machine learning models for the defined classification task, including Logistic Regression, XGBoost, LightGBM, TabNet, NODE, 1D-CNN and Regularisation Cocktails. Performance was assessed using accuracy, precision, recall, F1-score, AUROC, and AUPRC, and interpretability was examined using SHAP to identify key predictors of MetS risk. Results: After partial sampling, 7,866 participants (4,856 high-risk and 3,010 low-risk MetS cases) were included in the machine learning analysis. XGBoost and NODE showed the strongest performance. XGBoost achieved the highest accuracy (80.4%) and AUROC (84.1%), while NODE achieved the highest precision (80.1%) and AUPRC (86.0%). Explainability analysis identified age, perceived health, and sex as the most important features contributing to MetS risk predictions. Conclusion: This study shows that machine learning can accurately predict MetS risk using self-reported health survey data from the Quebec population. Comparison of classical and deep learning approaches identified the optimal predictive model, and explainability analyses identified the most important features contributing to the risk predictions, which align with established clinical evidence. 
These results support a machine learning–driven initial screening framework for population-level early identification of high-risk individuals, enabling targeted interventions and efficient allocation of healthcare resources.
Objective Misalignment in LLM-based Multi Agent Social Deception Game
Large language model–based multi-agent systems have attracted increasing attention for their strong performance in collaborative tasks and social simulations. However, these interactive settings also introduce vulnerabilities, as a single agent's hidden goals and misaligned behavior can propagate misleading or malicious information throughout the system. In this work, we study these risks in the context of social deception games. We focus on the Werewolf Game, which requires agents to reason, communicate, and collaborate under asymmetric and incomplete information. We modify the individual objectives of some agents to induce benevolent, individualistic, and malevolent strategies that can make agents depart from the objectives of their own team. We evaluate how objective divergence affects game outcomes, collaboration, and goal satisfaction. Misaligned agents often succeed in achieving their own objectives, with effects amplified by role-based power asymmetries. Qualitative analyses further show that agents remain coherent and adaptive, strategically adjusting their reasoning, communication, voting behavior, and influence on group dynamics. These results indicate that risks in LLM-based multi-agent systems extend beyond collaborative task settings and persist even in environments where competition is structurally expected.