Publications
SCOPE: Selective Cross-modal Orchestration of Visual Perception Experts
Vision-language models (VLMs) benefit from multiple vision encoders, but naively stacking them yields diminishing returns while multiplying inference costs. We propose SCOPE, a Mixture-of-Encoders (MoEnc) framework that dynamically selects one specialized encoder per image-text pair via instance-level routing, unlike token-level routing in traditional MoE. SCOPE maintains a shared encoder and a pool of routed encoders. A lightweight router uses cross-attention between text prompts and shared visual features to select the optimal encoder from the routed encoders. To train this router, we introduce dual entropy regularization with auxiliary losses to balance dataset-level load distribution with instance-level routing confidence. Remarkably, SCOPE with one shared plus one routed encoder outperforms models using all four extra encoders simultaneously, while reducing compute by 24-49%. This demonstrates that intelligent encoder selection beats brute-force aggregation, challenging the prevailing paradigm in multi-encoder VLMs.
2026-03-01
MM_Intelligence @ International Conference on Learning Representations (poster)
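The dual entropy regularization mentioned in the SCOPE abstract can be illustrated with a minimal sketch. The abstract does not state the loss, so the form below is an assumption: it rewards low entropy for each instance's routing distribution (confident choices) and high entropy for the batch-averaged distribution (balanced load). The names `entropy` and `dual_entropy_loss` are hypothetical and are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def dual_entropy_loss(routing_probs):
    """Assumed regularizer: mean per-instance entropy minus batch-level entropy.

    routing_probs holds one distribution over the routed encoders per instance.
    Lower values mean confident per-instance routing with balanced overall load.
    """
    n = len(routing_probs)
    k = len(routing_probs[0])
    # Instance-level term: average uncertainty of each routing decision.
    instance_term = sum(entropy(p) for p in routing_probs) / n
    # Dataset-level term: entropy of the average routing distribution.
    mean_probs = [sum(p[j] for p in routing_probs) / n for j in range(k)]
    dataset_term = entropy(mean_probs)
    return instance_term - dataset_term


confident_balanced = [[1.0, 0.0], [0.0, 1.0]]  # each instance picks a different encoder
collapsed = [[1.0, 0.0], [1.0, 0.0]]           # every instance picks the same encoder
undecided = [[0.5, 0.5], [0.5, 0.5]]           # router never commits

print(dual_entropy_loss(confident_balanced))
print(dual_entropy_loss(collapsed))
print(dual_entropy_loss(undecided))
```

Under this assumed form, only the confident and balanced batch gets a negative loss. Both the collapsed router and the undecided router score zero, which is why the two entropy terms are needed together.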
Sensory organization at the spinal segment level is commonly inferred from dermatomal maps that assume a fixed correspondence between cutaneous regions and spinal segments. However, based on the complexities of spinal neuroanatomy and neurophysiology, the distribution of sensory signals within the cord may be broader and less segment-specific than dermatomal maps suggest, leaving the segment-level localization of sensory-evoked activity in humans uncertain. Spinal cord functional magnetic resonance imaging (fMRI) is currently the only technique capable of noninvasively mapping sensory activity with high spatial resolution in the human spinal cord. However, its application remains technically challenging and is limited by the uncertainty in segmental localization. In this study, we leveraged recent advancements in spinal cord fMRI, including spinal nerve rootlet-based spatial normalization, to investigate how sensory information is represented and distributed within the human spinal cord during electrocutaneous stimulation of the third digit of the right hand (i.e., C7 dermatome). Forty healthy adults were scanned with electrocutaneous stimulation at four individualized intensities across multiple runs to quantify (i) the rostrocaudal distribution of sensory-evoked activity, (ii) intensity-dependent changes in detectability and localization, and (iii) the effect of normalization strategy on segmental localization. Across participants, stimulation produced activation localized in the lower cervical cord (e.g., C6-C8), with the most consistent segmental localization near C7. Stronger stimulation increased detectability and produced more consistent segmental localization across participants. Importantly, normalization that incorporated nerve rootlet landmarks sharpened localization and improved sensitivity relative to conventional intervertebral disc-based alignment. 
This highlights the value of functionally relevant anatomical landmarks for group inference in the spinal cord. Responses were strongest in the initial run and attenuated with repetition, suggesting habituation or adaptation that can bias multi-run paradigms if unmodeled. Together, our results define practical acquisition and analysis conditions (e.g., stimulation strength, anatomical alignment strategy, and run structure) under which segment-level spinal sensory responses can be detected, thereby supporting more reliable future basic and translational studies of the human spinal cord, including studies of pain mechanisms, sensory function, and spinal injury.
Effective exploration in reinforcement learning requires not only tracking where an agent has been, but also understanding how the agent perceives and represents the world. To learn powerful representations, an agent should actively explore states that contribute to its knowledge of the environment. Temporal representations can capture the information necessary to solve a wide range of potential tasks while avoiding the computational cost associated with full state reconstruction. In this paper, we propose an exploration method that leverages temporal contrastive representations to guide exploration, prioritizing states with unpredictable future outcomes. We demonstrate that such representations can enable the learning of complex exploratory behaviors in locomotion, manipulation, and embodied-AI tasks, revealing capabilities and behaviors that traditionally require extrinsic rewards. Unlike approaches that rely on explicit distance learning or episodic memory mechanisms (e.g., quasimetric-based methods), our method builds directly on temporal similarities, yielding a simpler yet effective strategy for exploration.
Optimization in deep learning has expanded beyond Euclidean methods to include entrywise sign updates (SignSGD) and spectral sign updates (SpecGD/Muon). While both can be viewed as steepest descent under non-Euclidean geometries (
2026-03-01
GRaM @ International Conference on Learning Representations (poster)
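The two update families named in the preceding abstract can be contrasted on a small gradient matrix. This is a sketch, not the paper's method: the entrywise sign takes the sign of each entry, while the spectral sign replaces a gradient G = U S V^T with U V^T. Here the spectral sign is approximated with the classical cubic Newton-Schulz iteration, which assumes a full-rank gradient. The helper names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def entrywise_sign(grad):
    """SignSGD-style direction: the sign of every entry."""
    return [[(g > 0) - (g < 0) for g in row] for row in grad]


def spectral_sign(grad, steps=30):
    """Spectral-sign direction U V^T, via X <- 1.5 X - 0.5 X X^T X.

    Dividing by the Frobenius norm first keeps all singular values in (0, 1],
    where this iteration converges for a full-rank matrix.
    """
    fro = sum(g * g for row in grad for g in row) ** 0.5
    x = [[g / fro for g in row] for row in grad]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(row_x, row_c)]
             for row_x, row_c in zip(x, xxtx)]
    return x


grad = [[2.0, 1.0], [1.0, 2.0]]  # toy symmetric positive-definite gradient
sign_dir = entrywise_sign(grad)
spec_dir = spectral_sign(grad)
print(sign_dir)
print([[round(v, 6) for v in row] for row in spec_dir])
```

For this toy gradient the entrywise sign is the all-ones matrix, while the spectral sign is approximately the identity, so the two rules move the weights in visibly different directions.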
Titanium nanotube arrays promote the activity of anastomotic healing-related cells by increasing fibronectin adsorption and activating the RGD–integrin pathway
The smooth titanium staples of stapling devices cannot reduce the incidence of gastrointestinal anastomotic leakage due to their bioinert nature and lack of active wound-healing promotion capability. This study aims to investigate whether titanium nanotube arrays (TNTs) can enhance the activity of cells involved in gastrointestinal anastomotic healing and further explore the potential mechanisms. TNTs were fabricated on pure titanium sheets via anodic oxidation, and characterized using scanning electron microscopy, roughness analysis, contact angle measurement, and x-ray photoelectron spectroscopy. Cell adhesion, proliferation, spreading, collagen secretion, and integrin expression were evaluated using methods such as CCK-8, immunofluorescence, qPCR, enzyme-linked immunosorbent assay (ELISA), and Western blot. Fibronectin (FN) adsorption and Arg-Gly-Asp tripeptide sequence (RGD domain) exposure were detected via bicinchoninic acid assay, fluorescent staining, and ELISA. The role of the RGD-integrin pathway was further investigated by supplementing serum-reduced medium with exogenous FN and using RGD-specific antagonists. The results showed that TNTs increased the roughness, hydrophilicity, and surface free energy of titanium surfaces. Compared with smooth pure titanium, TNTs promoted the adhesion, proliferation, spreading, and integrin expression of gastric mucosal epithelial cells and fibroblasts, while enhancing the collagen secretion capacity of fibroblasts. Moreover, TNTs adsorbed more FN and exposed more RGD domains, thereby upregulating integrin α5β1 expression. The RGD antagonist could reverse these enhanced cellular responses, confirming the pivotal role of the FN–RGD–integrin pathway. 
In conclusion, TNTs enhance the adhesion, proliferation, and functional activity of gastrointestinal anastomosis-related cells by promoting FN adsorption and activating the RGD–integrin pathway. TNT-modified titanium materials therefore hold significant potential for developing bioactive anastomotic devices and promoting tissue healing.
Vision-Language-Action (VLA) models show strong generalization for robotic control, but finetuning them with reinforcement learning (RL) is constrained by the high cost and safety risks of real-world interaction. Training VLA models in interactive world models avoids these issues but introduces several challenges, including pixel-level world modeling, multi-view consistency, and compounding errors under sparse rewards. Building on recent advances across multimodal models and model-based RL, we propose **VLA-MBPO**, a practical world model-based RL framework to tackle these problems in VLA finetuning. Our approach is guided by three key design choices: (i) adapting *unified multimodal models (UMMs)* to VLA settings, leveraging rich multimodal priors to enable world modeling with limited data; (ii) introducing an *interleaved view decoding* mechanism to enforce consistency across views; and (iii) employing *chunk-level branched rollout* to limit rollout horizons and mitigate error compounding during policy optimization. Our theoretical analysis shows a reduction in value gap of VLA-MBPO, and experiments in both simulated and real-world tasks demonstrate that our method effectively improves policy performance and sample efficiency for VLA finetuning.
2026-03-01
World Models @ International Conference on Learning Representations (published)
A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
Usman Anwar
Julianna Piskorz
David D. Baek
David Africa
Jim Weatherall
Max Tegmark
Christian Schroeder de Witt
Mihaela van der Schaar
David Krueger
Large language models are beginning to show steganographic capabilities. Such capabilities could allow misaligned models to evade oversight mechanisms. Yet principled methods to detect and quantify such behaviours are lacking. Classical definitions of steganography, and detection methods based on them, require a known reference distribution of non-steganographic signals. For the case of steganographic reasoning in LLMs, knowing such a reference distribution is not feasible; this renders these approaches inapplicable. We propose an alternative, **decision-theoretic view of steganography**. Our central insight is that steganography creates an asymmetry in usable information between agents who can and cannot decode the hidden content (present within a steganographic signal), and this otherwise latent asymmetry can be inferred from the agents’ observable actions. To formalise this perspective, we introduce generalised
2026-02-28
AIWILD @ International Conference on Learning Representations (published)
Model merging offers a way to combine the capabilities of several networks at test time without retraining or additional finetuning, but most merging methods assume identical architectures. Depth differences are commonly viewed as a major obstacle because they remove clear layer correspondences. We test this assumption by merging residual networks that differ only in depth, using a simple training-free pipeline based on identity expansion and permutation alignment. Across both same-task and multitask image classification experiments, heterogeneous merges closely match homogeneous ones. The results suggest that, for residual networks, depth mismatch is not the main barrier to effective model merging, and that the main difficulty in model merging comes from aligning independently trained weights in a homogeneous setting.
2026-02-28
TTU_Main_Track @ International Conference on Learning Representations (published)
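The identity expansion step named in the model-merging abstract can be illustrated on a toy residual network. The abstract gives no implementation details, so this sketch rests on one plausible reading: a residual block whose branch weights are all zero computes the identity, so such blocks can pad a shallower network to a target depth without changing its function. Appending the blocks at the end, the block form `x + W_out relu(W_in x)`, and all helper names are assumptions made for illustration.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(w, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x, w_in, w_out):
    """Toy residual block: y = x + W_out relu(W_in x)."""
    hidden = relu(matvec(w_in, x))
    return [xi + ri for xi, ri in zip(x, matvec(w_out, hidden))]


def forward(x, blocks):
    for w_in, w_out in blocks:
        x = residual_block(x, w_in, w_out)
    return x


def identity_block(width, hidden):
    """All-zero branch weights, so the block returns its input unchanged."""
    w_in = [[0.0] * width for _ in range(hidden)]
    w_out = [[0.0] * hidden for _ in range(width)]
    return (w_in, w_out)


def expand_depth(blocks, target_depth, width, hidden):
    """Pad a network with identity blocks until it has target_depth blocks."""
    extra = [identity_block(width, hidden)
             for _ in range(target_depth - len(blocks))]
    return blocks + extra


shallow = [([[1.0, -1.0], [0.5, 2.0]], [[0.3, 0.1], [-0.2, 0.4]])]
deep = expand_depth(shallow, target_depth=3, width=2, hidden=2)
x = [1.0, 2.0]
print(forward(x, shallow))
print(forward(x, deep))
```

Once two networks have the same depth, their blocks can be paired layer by layer. The permutation alignment and weight averaging that would follow are not shown here.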
Large-scale weather and climate models provide reliable wind information at regional scales, yet their outputs are typically too coarse for direct UAV decision making in geometrically complex urban environments. This paper investigates how large-scale atmospheric information can be transformed into city-scale wind representations and utilized for downstream navigation decisions. We propose a cross-scale prediction and decision framework that takes background wind conditions from existing weather or climate models and combines them with detailed 3D urban geometry to predict time-averaged urban wind fields using a 3D neural operator. The predicted wind fields are then incorporated into a wind-aware UAV trajectory optimization problem to minimize energy consumption under kinematic feasibility and safety constraints. By comparing trajectories planned against a wind-agnostic baseline, we demonstrate significant efficiency gains enabled by AI-predicted wind, specifically 10.3% savings in tailwinds, 7.7% in headwinds, and 3.9% in crosswind conditions. These results indicate that learning decision-relevant urban wind representations offers a practical pathway for bridging large-scale atmospheric information and fine-scale urban decision making.
2026-02-28
AI_and_PDE @ International Conference on Learning Representations (poster)
Objective: This study evaluates multiple machine learning approaches to predict metabolic syndrome (MetS) risk in the Quebec, Canada population. We further perform explainability analysis to interpret model predictions and identify key features driving risk classification. Methods and analysis: This study followed the Minimum Information about Clinical Artificial Intelligence Modeling (MI-CLAIM) guideline for reporting. We used cross-sectional data from the Canadian Community Health Survey (2015–2018) for the population living in the province of Quebec, which includes 42,279 participants. Partial sampling was used to obtain a balanced dataset for model development. We evaluated seven machine learning models for the defined classification task, including Logistic Regression, XGBoost, LightGBM, TabNet, NODE, 1D-CNN and Regularisation Cocktails. Performance was assessed using accuracy, precision, recall, F1-score, AUROC, and AUPRC, and interpretability was examined using SHAP to identify key predictors of MetS risk. Results: After partial sampling, 7,866 participants (4,856 high-risk and 3,010 low-risk MetS cases) were included in the machine learning analysis. XGBoost and NODE showed the strongest performance. XGBoost achieved the highest accuracy (80.4%) and AUROC (84.1%), while NODE achieved the highest precision (80.1%) and AUPRC (86.0%). Explainability analysis identified age, perceived health, and sex as the most important features contributing to MetS risk predictions. Conclusion: This study shows that machine learning can accurately predict MetS risk using self-reported health survey data from the Quebec population. Comparison of classical and deep learning approaches identified the optimal predictive model, and explainability analyses identified the most important features contributing to the risk predictions, which align with established clinical evidence. 
These results support a machine learning–driven initial screening framework for population-level early identification of high-risk individuals, enabling targeted interventions and efficient allocation of healthcare resources.
Large language model–based multi-agent systems have attracted increasing attention for their strong performance in collaborative tasks and social simulations. However, these interactive settings also introduce vulnerabilities, as a single agent's hidden goals and misaligned behavior can propagate misleading or malicious information throughout the system. In this work, we study these risks in the context of social deception games. We focus on the Werewolf Game, which requires agents to reason, communicate, and collaborate under asymmetric and incomplete information. We modify the individual objectives of some agents to induce benevolent, individualistic, and malevolent strategies that can make agents depart from the objectives of their own team. We evaluate how objective divergence affects game outcomes, collaboration, and goal satisfaction. Misaligned agents often succeed in achieving their own objectives, with effects amplified by role-based power asymmetries. Qualitative analyses further show that agents remain coherent and adaptive, strategically adjusting their reasoning, communication, voting behavior, and influence on group dynamics. These results indicate that risks in LLM-based multi-agent systems extend beyond collaborative task settings and persist even in environments where competition is structurally expected.
2026-02-28
AIWILD @ International Conference on Learning Representations (published)