Publications

Risk-seeking conservative policy iteration with agent-state based policies for Dec-POMDPs with guaranteed convergence

Matthieu Geist

Optimally solving decentralized decision-making problems modeled as Dec-POMDPs is known to be NEXP-complete. These optimal solutions are policies based on the entire history of observations and actions of an agent. However, some applications may require more compact policies because of limited compute capabilities, which can be modeled by considering a limited number of memory states (or agent states). While such an agent-state based policy class may not contain the optimal solution, it is still of practical interest to find the best agent-state policy within the class. We focus on an iterated best response style algorithm which guarantees monotonic improvements and convergence to a local optimum in polynomial runtime in the Dec-POMDP model size. In order to obtain a better local optimum, we use a modified objective which incentivizes risk-seeking alongside a conservative policy iteration update. Our empirical results show that our approach performs as well as state-of-the-art approaches on several benchmark Dec-POMDPs, achieving near-optimal performance while having polynomial runtime despite the limited memory. We also show that using more agent states (a larger memory) leads to greater performance. Our approach provides a novel way of incorporating memory constraints on the agents in the Dec-POMDP problem.

2026-04-09

arXiv (preprint)

doi.org

arxiv.org

Multi-Modal Learning meets Genetic Programming: Analyzing Alignment in Latent Space Optimization

Benjamin Léger

Kazem Meidani

Christian Gagné

Symbolic regression (SR) aims to discover mathematical expressions from data, a task traditionally tackled using Genetic Programming (GP) through combinatorial search over symbolic structures. Latent Space Optimization (LSO) methods use neural encoders to map symbolic expressions into continuous spaces, transforming the combinatorial search into continuous optimization. SNIP (Meidani et al., 2024), a contrastive pre-training model inspired by CLIP, advances LSO by introducing a multi-modal approach: aligning symbolic and numeric encoders in a shared latent space to learn the phenotype-genotype mapping, enabling optimization in the numeric space to implicitly guide symbolic search. However, this relies on fine-grained cross-modal alignment, whereas literature on similar models like CLIP reveals that such an alignment is typically coarse-grained. In this paper, we investigate whether SNIP delivers on its promise of effective bi-modal optimization for SR. Our experiments show that: (1) cross-modal alignment does not improve during optimization, even as fitness increases, and (2) the alignment learned by SNIP is too coarse to efficiently conduct principled search in the symbolic space. These findings reveal that while multi-modal LSO holds significant potential for SR, effective alignment-guided optimization remains unrealized in practice, highlighting fine-grained alignment as a critical direction for future work.

2026-04-08

arXiv (preprint)

doi.org

arxiv.org

Toward Hardware-Agnostic Quadrupedal World Models via Morphology Conditioning

Chenhao Li

Marco Hutter

World models promise a paradigm shift in robotics, where an agent learns the underlying physics of its environment once to enable efficient planning and behavior learning. However, current world models are often hardware-locked specialists: a model trained on a Boston Dynamics Spot robot fails catastrophically on a Unitree Go1 due to the mismatch in kinematic and dynamic properties, as the model overfits to specific embodiment constraints rather than capturing the universal locomotion dynamics. Consequently, a slight change in actuator dynamics or limb length necessitates training a new model from scratch. In this work, we take a step towards a framework for training a generalizable Quadrupedal World Model (QWM) that disentangles environmental dynamics from robot morphology. We address the limitations of implicit system identification, where treating static physical properties (like mass or limb length) as latent variables to be inferred from motion history creates an adaptation lag that can compromise zero-shot safety and efficiency. Instead, we explicitly condition the generative dynamics on the robot's engineering specifications. By integrating a physical morphology encoder and a reward normalizer, we enable the model to serve as a neural simulator capable of generalizing across morphologies. This capability unlocks zero-shot control across a range of embodiments. We introduce, for the first time, a world model that enables zero-shot generalization to new morphologies for locomotion. While we carefully study the limitations of our method (QWM operates as a distribution-bounded interpolator within the quadrupedal morphology family rather than a universal physics engine), this work represents a significant step toward morphology-conditioned world models for legged locomotion.

2026-04-08

arXiv (preprint)

doi.org

arxiv.org

Epistemic Robust Offline Reinforcement Learning

Abhilash Reddy Chenreddy

Erick Delage

Offline reinforcement learning learns policies from fixed datasets without further environment interaction. A key challenge in this setting is epistemic uncertainty, arising from limited or biased data coverage, particularly when the behavior policy systematically avoids certain actions. This can lead to inaccurate value estimates and unreliable generalization. Ensemble-based methods like SAC-N mitigate this by conservatively estimating Q-values using the ensemble minimum, but they require large ensembles and often conflate epistemic with aleatoric uncertainty. To address these limitations, we propose a unified and generalizable framework that replaces discrete ensembles with compact uncertainty sets over Q-values. We further introduce an Epinet-based model that directly shapes the uncertainty sets to optimize the cumulative reward under the robust Bellman objective without relying on ensembles. We also introduce a benchmark for evaluating offline RL algorithms under risk-sensitive behavior policies, and demonstrate that our method achieves improved robustness and generalization over ensemble-based baselines across both tabular and continuous state domains.
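The ensemble-minimum idea attributed to SAC-N-style baselines in the abstract above can be illustrated with a minimal sketch. This shows only the baseline's conservative estimate, not the uncertainty-set framework the paper proposes. The function names, the data layout (`ensemble_q[i][a]`), and the numbers are all hypothetical.

```python
from typing import List, Sequence


def conservative_q(ensemble_q: Sequence[Sequence[float]]) -> List[float]:
    """Per-action minimum over an ensemble of Q-value estimates.

    Assumed layout: ensemble_q[i][a] is member i's estimate for action a.
    """
    if not ensemble_q:
        raise ValueError("ensemble must contain at least one member")
    n_actions = len(ensemble_q[0])
    return [min(member[a] for member in ensemble_q) for a in range(n_actions)]


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Three hypothetical ensemble members, three actions, one state.
ensemble = [
    [1.0, 2.5, 0.3],
    [1.2, 0.4, 0.5],
    [0.9, 2.0, 0.1],
]
q_min = conservative_q(ensemble)
best = greedy_action(q_min)
```

In this toy example the members disagree strongly about the second action, so the minimum penalizes it and the greedy choice falls on the first action, where the estimates agree.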

2026-04-07

arXiv (preprint)

doi.org

arxiv.org

LSST Strong Lensing Systems Dark Matter Sensitivity Analysis with Neural Ratio Estimators

Andreas Filipp

Yashar Hezaveh

Laurence Perreault-Levasseur

Daniel Gilman

LSST Dark Energy Science Collaboration

Strong gravitational lensing offers a unique probe of dark matter (DM) on sub-galactic scales, where the abundance and distribution of low-mass halos are highly sensitive to the underlying properties of DM particles. In this work, we forecast LSST's sensitivity to DM substructure in galaxy-galaxy strong lenses using simulated samples and neural ratio estimators (NREs). Our simulations include both subhalos within the main deflector and line-of-sight (LOS) halos, with halo masses down to

2026-04-07

arXiv (preprint)

doi.org

arxiv.org

Robust Mendelian Randomization Estimation using Weighted Quantile Regression

Julien St-Pierre

Archer Y. Yang

Mireille E. Schnitzer

Marc-André Legault

In Mendelian randomization (MR) studies, genetic variants are used as instrumental variables (IVs) to investigate causal relationships between exposures and outcomes based on observational data. However, numerous genetic studies have shown the pervasive pleiotropy of genetic variants, meaning that many, if not most, variants are associated with multiple traits, potentially violating the core assumptions of IV estimation. Uncorrelated pleiotropy occurs when genetic variants have a direct effect on the outcome that is not mediated by the exposure, while correlated pleiotropy occurs when genetic variants affect the exposure and outcome via shared heritable confounders. In this work, we propose a novel MR method, called MR-Quantile, based on weighted quantile regression (WQR) that is robust to both correlated and uncorrelated pleiotropy. We propose a procedure for selecting the optimal quantile of the ratio estimates through a likelihood-based formulation of WQR using the asymmetric Laplace distribution. Monte Carlo simulations demonstrate the empirical performance of the proposed method, especially in settings with many invalid IVs with weak pleiotropic effects. Finally, we apply our method to study the causal effect of resting heart rate on atrial fibrillation. Genetic variants associated with heart rate were identified in a genome-wide association study of 425,748 individuals from the VA Million Veteran Program, and used as instruments in a two-sample MR analysis with summary statistics from a genetic meta-analysis of 228,926 AF cases across eight studies.
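As a rough illustration of taking a weighted quantile of per-variant ratio estimates, the sketch below computes the ratios and returns a simple weighted quantile at a given level `tau`. It is not the MR-Quantile procedure: the quantile level is supplied by hand rather than selected through the asymmetric Laplace likelihood, the inverse-variance weights are an assumption, and the summary statistics are invented.

```python
from typing import List, Sequence


def ratio_estimates(beta_x: Sequence[float], beta_y: Sequence[float]) -> List[float]:
    """Per-variant ratio estimates: outcome effect divided by exposure effect."""
    return [by / bx for bx, by in zip(beta_x, beta_y)]


def weighted_quantile(values: Sequence[float], weights: Sequence[float], tau: float) -> float:
    """Smallest value whose cumulative weight reaches a fraction tau of the total."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= tau * total:
            return value
    return pairs[-1][0]  # guard against floating-point round-off


# Invented summary statistics for five variants; the fourth is an outlier.
beta_x = [0.10, 0.20, 0.15, 0.25, 0.12]   # variant-exposure effects
beta_y = [0.05, 0.10, 0.075, 0.30, 0.06]  # variant-outcome effects
se_y = [0.01, 0.01, 0.01, 0.01, 0.01]     # standard errors of beta_y

ratios = ratio_estimates(beta_x, beta_y)
# Assumed first-order inverse-variance weights: beta_x^2 / se_y^2.
weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_x, se_y)]
estimate = weighted_quantile(ratios, weights, tau=0.5)
```

With `tau=0.5` this reduces to a weighted median, which stays near the ratio shared by the four consistent variants even though the outlying variant carries the largest single weight.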

2026-04-07

arXiv (preprint)

doi.org

arxiv.org

The Illusion of Stochasticity in LLMs

Xiangming Gu

Soham De

Michalis K. Titsias

Larisa Markeeva

Petar Veličković

Razvan Pascanu

In this work, we demonstrate that reliable stochastic sampling is a fundamental yet unfulfilled requirement for Large Language Models (LLMs) operating as agents. Agentic systems are frequently required to sample from distributions, often inferred from observed data, a process which needs to be emulated by the LLM. This leads to a distinct failure point: while standard RL agents rely on external sampling mechanisms, LLMs fail to map their internal probability estimates to their stochastic outputs. Through rigorous empirical analysis across multiple model families, model sizes, prompting styles, and distributions, we demonstrate the extent of this failure. Crucially, we show that while powerful frontier models can convert provided random seeds to target distributions, their ability to sample directly from specific distributions is fundamentally flawed.

2026-04-07

arXiv (preprint)

doi.org

arxiv.org

Uncertainty Assessment in Deep Learning-based Plant Trait Retrievals from Hyperspectral data

Eya Cherif

Teja Kattenborn

Luke A. Brown

Michael Ewald

Katja Berger

Phuong D. Dao

Tobias B. Hank

Étienne Laliberté

Bing Lu

Hannes Feilhauer

Large-scale mapping of plant biophysical and biochemical traits is essential for ecological and environmental applications. Given their finer spectral resolution and unprecedented data availability, hyperspectral data, in concert with machine and particularly deep learning models, have emerged as a promising, non-destructive tool for accurately retrieving these traits. However, when deploying these methods on a large scale, reliably quantifying the associated uncertainty remains a critical challenge, especially when models encounter out-of-domain (OOD) data, i.e., samples that differ substantially from those of the training data, such as unseen geographical regions, species, biomes, data acquisition modalities, or scene components (e.g., clouds and water bodies). Traditional uncertainty quantification methods for deep learning models, including deep ensembles (deterministic and probabilistic) and Monte Carlo dropout, rely on the variance of predictions but often fail to capture uncertainty in OOD scenarios, leading to overly optimistic and possibly misleading uncertainty estimates. To address this limitation, we propose a distance-based uncertainty estimation method (Dis_UN) that quantifies prediction uncertainty by measuring the dissimilarity in the predictor space (spectral inputs) and embedding space (features learned by the deep model) between the training and test data. Dis_UN leverages residuals as a proxy for uncertainty and employs dissimilarity indices in data manifolds to estimate worst-case errors via 95-quantile regression. We evaluate Dis_UN using a pretrained deep learning model to predict multiple plant traits from hyperspectral images, analyzing its performance across OOD data, such as pixels containing spectral variations from urban surfaces, bare ground, water, clouds, or open surface waters.
In this study, we target six leaf and canopy traits: leaf mass per area, chlorophylls, carotenoids, nitrogen content, equivalent water thickness, and leaf area index. Compared to scaled variance-based methods, Dis_UN provides (1) a superior estimation of uncertainty in OOD scenarios, achieving 36 % higher contrast (KS distances: 0.648 vs. 0.475) between non-vegetation pixels, particularly under mixed-pixel conditions at medium resolution (30 m); (2) uncertainty quantification without requiring normality or symmetry assumptions, accommodating asymmetric error patterns; (3) enhanced interpretability of uncertainty sources, as uncertainty is directly linked to sample dissimilarity from the training data; and (4) computational efficiency at inference (2.6–7.7× faster), requiring only a single forward pass compared to multiple passes for ensemble-based methods. Challenges remain for traits that are affected by spectral saturation. These findings highlight the advantages of distance-aware uncertainty quantification methods and underscore the necessity of diverse training datasets to minimize sampling biases and enhance model robustness. The proposed framework improves the reliability of uncertainty estimation in vegetation monitoring and offers a promising approach for broader applications.

2026-04-07

Biogeosciences (published)

doi.org

Evaluation of data-driven kinematic models for autonomous control of continuum robotic in-situ bioprinters

Swen A.T. Groen

David Brenken

Samuel Smocot

James Richard Forbes

Luc Mongeau

Audrey Sedal

Minimally invasive in-situ bioprinting involves the direct deposition of hydrogels within the body to reconstruct tissue defects. These bioprinters use soft robotic printheads to extrude hydrogels through a hollow channel and nozzle. The accurate control of the nozzle tip position is critical for safety and shape fidelity. Due to a lack of sensing integration, existing control strategies are limited to feedforward models and are design-specific, thereby increasing development cost and complexity. The present study systematically compared three different data-driven modeling strategies for autonomous control of a cable-driven continuum in-situ bioprinter: 1) polynomial regression, 2) Gaussian process regression, and 3) a neural network. Submillimeter accuracy was achieved for both the Gaussian process and the neural network in static measurements. The polynomial regression model had an accuracy of 1.67 mm. Dynamic trajectory tracking indicated that the performance of the neural network was comparable to that of the polynomial regression model and lower than that of the Gaussian process. Printing of different shaped constructs yielded minor visual deviations across all models from the target shapes. These results support the feasibility of real-time autonomous control for minimally invasive in-situ bioprinters and indicate the advantages of the different models during the design process of novel printing strategies.

2026-04-06

IEEE International Conference on Soft Robotics (published)

doi.org

Observability-Informed Optimal Sensor Placement for Soft Robots

Samuel Smocot

James R. Forbes

Audrey Sedal

This paper presents the application and experimental evaluation of a systematic method for optimal sensor placement in soft robots. Existing methods either lack generalizability across different soft robot morphologies or do not account for system dynamics. The applied method uses convex optimization to find the optimal sensor configuration that maximizes an observability Gramian-based metric. The framework is experimentally evaluated using position and strain measurements on a soft continuum arm. Kalman filter state estimates using optimal sensor placements yield lower reconstruction error than a baseline across all sinusoidal input trials, with improvements on the order of millimeters. This case study shows that linear control theory tools can guide optimal sensor placement in soft robots, suggesting an interpretable approach to sensor placement that may extend to other morphologies.

2026-04-06

IEEE International Conference on Soft Robotics (published)

doi.org

Programmable Membrane Shape via Localized Stiffening of a Granular Suspension

Omar Khater

Karim Saliba

Audrey Sedal

Granular actuators are often implemented with as many vacuum pumps as jamming segments. For highly articulated motion, these segments increase mechanical design complexity. In this work, we propose cooperative use of pumps in a monolithic liquid-granular suspension to locally control jamming behavior. As an example of a granular soft robot, we change the shape of a buckled membrane by varying the local concentration of a particle-water suspension. We then develop and validate a numerical model based on localized stiffening to explain the underlying mechanisms behind membrane motion. We finally demonstrate that this scheme enables actuation programmed by pump sequences. This novel actuation may further enable localized jamming in arbitrary internal structures, ultimately leading to simple soft robots with programmable articulated motion.

2026-04-06

IEEE International Conference on Soft Robotics (published)

doi.org

The Illusion of Superposition? A Principled Analysis of Latent Thinking in Language Models

Michael Rizvi-Martel

Guillaume Rabusseau

Marius Mosbach

Latent reasoning via continuous chain-of-thoughts (Latent CoT) has emerged as a promising alternative to discrete CoT reasoning. Operating in continuous space increases expressivity and has been hypothesized to enable superposition: the ability to maintain multiple candidate solutions simultaneously within a single representation. Despite theoretical arguments, it remains unclear whether language models actually leverage superposition when reasoning using latent CoTs. We investigate this question across three regimes: a training-free regime that constructs latent thoughts as convex combinations of token embeddings, a fine-tuned regime where a base model is adapted to produce latent thoughts, and a from-scratch regime where a model is trained entirely with latent thoughts to solve a given task. Using Logit Lens and entity-level probing to analyze internal representations, we find that only models trained from scratch exhibit signs of using superposition. In the training-free and fine-tuned regimes, we find that the superposition either collapses or is not used at all, with models discovering shortcut solutions instead. We argue that this is due to two complementary phenomena: (i) pretraining on natural language data biases models to commit to a token in the last layers, and (ii) capacity has a huge effect on which solutions a model favors. Together, our results offer a unified explanation for when and why superposition arises in continuous chain-of-thought reasoning, and identify the conditions under which it collapses.
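The training-free construction mentioned above, a latent thought formed as a convex combination of token embeddings, can be sketched in a few lines. Using softmax probabilities over next-token logits as the mixing weights is an assumption made here for illustration; the abstract does not say how the weights are chosen. The three-token vocabulary and two-dimensional embeddings are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: non-negative weights that sum to one."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def latent_thought(logits: Sequence[float],
                   embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Convex combination of token embeddings, weighted by softmax(logits).

    Assumed layout: embeddings[t] is the embedding vector of token t.
    """
    weights = softmax(logits)
    dim = len(embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, embeddings))
        for d in range(dim)
    ]


# Hypothetical three-token vocabulary with 2-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform = latent_thought([0.0, 0.0, 0.0], embeddings)   # equal mix of all tokens
peaked = latent_thought([50.0, 0.0, 0.0], embeddings)   # close to token 0's embedding
```

Equal logits keep all candidate tokens mixed in a single vector, while a sharply peaked distribution yields a vector that is nearly the embedding of one discrete token.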

2026-04-06

arXiv (preprint)

doi.org

arxiv.org
