Publications
BlabberSeg: Semantic Perception for Reliable Open-Vocabulary UAV Safe Landing
Reliable robot autonomy requires semantic perception that remains both informative and fast enough for closed-loop safety decisions. We present BlabberSeg, an optimized CLIPSeg-based open-vocabulary segmentation pipeline for UAV emergency landing. The method targets semantic reliability under edge constraints by reusing prompt, positional, and image features and deploying floating-point 16 ONNX (TensorRT) inference. In a DOVESEI-based safe-landing workflow, BlabberSeg reaches 16.78 Hz on Jetson Orin AGX (64GB), a 927.41% speed increase over the original CLIPSeg (1.81 Hz), with limited degradation in segmentation agreement (2.1% relative area difference) and mIoU (9%). At the task level, safe-landing success is preserved (76/100, matching baseline) while mission time is substantially reduced. These results support semantic open-vocabulary perception as a practical component for reliable autonomous landing.
2026-05-26
SRRA @ IEEE International Conference on Robotics and Automation (poster)
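As a quick arithmetic check on the frame rates quoted in the BlabberSeg abstract above, the sketch below computes the ratio of the two rounded figures. The variable names are illustrative. The rounded values give a ratio of about 9.27x, which is close to the reported 927.41% when expressed as a percentage; the small gap is presumably due to rounding of the published frame rates (an assumption, since the unrounded measurements are not given here).

```python
# Frame rates as quoted in the abstract (rounded values).
baseline_hz = 1.81    # original CLIPSeg
optimized_hz = 16.78  # BlabberSeg on Jetson Orin AGX

# Ratio of optimized to baseline throughput.
ratio = optimized_hz / baseline_hz

# The same comparison expressed as a relative increase over the baseline.
increase_pct = (ratio - 1.0) * 100.0

print(f"speed ratio: {ratio:.2f}x")
print(f"relative increase over baseline: {increase_pct:.1f}%")
```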
Time series forecasting in real-world settings often depends not only on historical observations, but also on external context that must be actively discovered from noisy, heterogeneous information sources. Yet existing context-aided forecasting benchmarks typically assume that the supporting context is already provided, leaving open whether agents can identify it on their own. Therefore, we introduce Dr-CiK, a benchmark for evaluating whether agents can retrieve forecasting-relevant supporting context from a document corpus, filter out distractors, distill the retrieved context into forecast-useful evidence, and generate forecasts supported by that evidence. Through context ablations and evaluations of state-of-the-art deep research and forecasting methods paired together, we show that high-quality context substantially improves forecasting performance in Dr-CiK. However, most existing DR agents recover only a small fraction of the ground-truth supporting evidence (usually <5%), are frequently misled by distractors (>80% distractor citations), and can cause forecasters to perform worse with retrieved context than without context. Our results motivate research on foresight-driven agents that search for the right context to predict the future.
Existing theory of momentum assumes that gradients arrive at every parameter at a roughly constant rate, an assumption violated in practice by heavy-tailed data distributions and modern architectures. We theoretically analyze the dynamics of two tractable models of momentum under sparse updates: a least squares model with sparse inputs and a logistic regression model with a rare class. Both admit exact closed-form second-moment dynamics whose high-dimensional limits we characterize across three scaling exponents for sparsity, batch size, and momentum decay. The phase structure on both problems is governed by the ratio of two intrinsic timescales: a momentum retention timescale (how many active updates the buffer survives) and a learning timescale (how many active updates it takes to reduce the squared error). When learning is much slower than retention, the limit matches SGD; when learning is faster, the system is unstable; where the timescales coincide, we recover classical heavy-ball dynamics. The oscillatory dynamics occur at different momentum values for different levels of token sparsity, creating a spectral conflict for global momentum across token frequencies.
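To make the setting in the abstract above concrete, here is a minimal sketch of heavy-ball SGD on a one-dimensional, noise-free least-squares problem whose input is zero on most steps. It is not the authors' model or analysis: the function name, the Bernoulli sparsity model, and all hyperparameter values are illustrative assumptions. On inactive steps the gradient is zero, so the momentum buffer only decays, which is the "retention" effect the abstract refers to.

```python
import random


def heavy_ball_sparse(steps=5000, p_active=0.1, lr=0.05, beta=0.9, seed=0):
    """Run heavy-ball SGD on 0.5 * (w * x - w_true * x)**2 with sparse x.

    x is 1 with probability p_active and 0 otherwise (assumed sparsity model).
    Returns the final weight estimate.
    """
    rng = random.Random(seed)
    w_true = 2.0  # target weight (hypothetical value)
    w = 0.0       # initial weight
    buf = 0.0     # momentum buffer
    for _ in range(steps):
        x = 1.0 if rng.random() < p_active else 0.0
        # Gradient of the squared error; it is exactly zero when x == 0.
        grad = (w * x - w_true * x) * x
        # Heavy-ball update: the buffer decays by beta on every step,
        # including steps where no gradient arrives.
        buf = beta * buf + grad
        w -= lr * buf
    return w


w_final = heavy_ball_sparse()
print(f"final weight: {w_final:.6f}")
```

With these particular values the iterate settles near the target weight. Other combinations of step size, momentum decay, and sparsity can behave very differently, which is the regime structure the abstract describes.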
In online convex optimization (OCO), a decision-maker is confronted with an unknown environment and seeks to play an optimal sequence of decisions on a short time-scale using only past information. Recent advances in second-order OCO methods have demonstrated tighter regret bounds and improved empirical performance over traditional first-order methods. However, this performance comes at a cost: a matrix inversion is now required, which scales with the cube of the size of the problem. In this work, we propose sketching to mitigate this limitation. Specifically, we present the online sketched Newton-Raphson method (OSNR), which preserves the tight regret bounds obtained with second-order methods while presenting a strict computational improvement in terms of complexity. We discuss three application scenarios of OSNR: online root finding, unconstrained OCO, and time-varying equality-constrained OCO, and present their respective regret bounds and, for the latter, a constraint violation bound. In all three applications, OSNR achieves sublinear dynamic regret bounds. For the equality-constrained case, the extension of OSNR with equality constraints (OSNR-EC) is shown to yield sublinear cumulative constraint violation. Finally, we illustrate the performance of OSNR and OSNR-EC on two numerical examples, viz., online position tracking and optimal power flow, and observe that OSNR and OSNR-EC exhibit high performance even at low sampling rates.
Cross-view spatial reasoning remains a weak spot for vision-language models (VLMs): they often reason in language and lose the fine-grained geometry needed for the task. Thinking with images aims to address this by generating an intermediate thinking image, but recent work shows that models often ignore the visual evidence in these traces. We therefore ask how to make visual thinking matter, and what kind of visual thinking works best. We study these questions in unified multimodal models (UMMs), which natively support interleaved image-text generation. For the first question, we propose View Dropout (VDrop), a training-time intervention that hides parts of one input view from the answer span while keeping them visible to the thinking-image tokens. This encourages the model to use the thinking image when answering, instead of relying only on the input views. Once the thinking image is used for answer prediction, we study which type of visual thinking is most effective. We frame this as a learnability-informativeness tradeoff and compare three thinking-image variants: top-down, panoramic, and point-matching renderings. Trained on synthetic scenes and evaluated on five real-world out-of-domain benchmarks, panoramic visual thinking with VDrop is the only configuration that is both informative and learnable, and it achieves the best out-of-domain generalization.
Online reinforcement learning (RL) agents increasingly depend on knowledge acquired offline to achieve practical efficiency. Originally studied in offline-to-online RL, this paradigm now spans foundation model post-training and embodied intelligence, with prior types expanding from offline datasets and pre-trained policies to increasingly diverse knowledge sources such as multimodal foundation models and generative world models. Offline priors have become central to how deep RL is developed and deployed. However, this reliance introduces a challenge that the prevailing benchmark-driven paradigm cannot resolve: because prior validity varies across deployments and shifts during training, no single approach to managing it is universally optimal, and benchmark rankings offer limited guidance for real-world deployments. Rather than pursuing universal solutions, we argue that the field should shift to diagnosis-driven tension management, in which deployment-specific evidence guides how the learner relates to its priors throughout training, enabling both flexible and adaptive deployment. We support this position with a framework characterizing how priors reshape online optimization through three functional roles, controlled experiments demonstrating help-or-hurt reversals, cross-domain evidence from foundation model post-training to embodied intelligence, and engagement with five substantive counterarguments.
2026-05-24
DEMO @ International Conference on Machine Learning (poster)
With the implementation of national strategies aimed at building a leading sporting nation and promoting nationwide fitness, physical fitness assessment has gained increasing attention as a crucial metric for evaluating students' physical condition and motor abilities. Concurrently, advancements in computer vision have enabled body keypoint detection technology to gradually replace traditional manual measurement methods, demonstrating significant potential for application in automated assessment systems. Accurate recognition of keypoints serves as the fundamental support for intelligent physical fitness testing and smart sports. However, existing keypoint detection algorithms often suffer from drifting of extremity keypoints, such as those of the hands and feet, in physical fitness test scenarios, thereby compromising the accuracy of the assessment. To address this challenge, this paper proposes Channel Attention BlazePose (CA-BlazePose), a body keypoint detection algorithm based on a channel attention mechanism, specifically designed for count-based physical fitness test scenarios, namely sit-ups and pull-ups. To tackle the issue of keypoint drift in motion detection, CA-BlazePose aims to enhance keypoint detection accuracy. It employs a two-stage network architecture consisting of heatmap training and regression fine-tuning, incorporating a channel attention module. This module strengthens the feature extraction process for extremity keypoints such as hands and feet, thereby improving recognition accuracy during detection. Experimental results demonstrate that, compared to mainstream keypoint detection algorithms such as OpenPose and BlazePose, the proposed CA-BlazePose algorithm achieves improvements in the PCK on two representative motion datasets, Common Objects in Context (COCO) and Leeds Sports Pose Extended (LSPET). Specifically, it shows an approximate increase of 7% for hand and foot keypoints and 8% for overall keypoints.
Furthermore, in real-time detection tests for sit-ups and pull-ups captured from various viewing angles, CA-BlazePose demonstrates superior performance in handling frames with missing or drifting keypoints compared to existing algorithms, exhibiting more stable recognition performance under identical detection conditions.
2026-05-24
Journal of Intelligent Computing and Networking (published)
Offline reinforcement learning (RL) has traditionally focused on learning policies for direct deployment under conservative objectives, where uncertainty outside the offline dataset is treated pessimistically to ensure robustness. We argue that this formulation becomes incomplete when an offline-trained policy is subsequently updated through online interaction, as increasingly occurs in modern intelligent systems through test-time adaptation and online fine-tuning. This position paper argues that, in such settings, the objective of offline RL should extend beyond immediate deployment and instead prioritize learning *adaptive policy priors*: policies that preserve the capacity to improve during subsequent interaction through memory, exploration, and self-correction. We formalize this perspective as *adaptive offline reinforcement learning* (AORL), distinguish it from offline-to-online RL, and explain why adaptability becomes important under distributional shift, limited dataset coverage, and changing test-time conditions. We further discuss Bayesian offline RL as one principled direction for constructing adaptive policy priors by preserving epistemic uncertainty over plausible environments. Finally, we outline connections, open challenges, and research directions for treating offline RL as preparation for future experience rather than as a static deployment problem.
2026-05-24
DEMO @ International Conference on Machine Learning (poster)
Shampoo-based methods, such as KL-Shampoo and SOAP, have demonstrated strong performance in training neural networks and rely on QR decomposition. Because existing QR implementations require single-precision (FP32) arithmetic and remain computationally expensive, these methods become time- and memory-intensive when their preconditioning matrices are large. Moreover, using BFloat16 (BFP16) storage to reduce memory usage can degrade the performance of Shampoo-based methods. We propose a reparametrization of the preconditioner that supports BFP16 storage and forms a complete basis by combining updated basis vectors with unchanged ones. By updating only part of the basis through QR decomposition in a subspace, our approach reduces computational overhead while mitigating the performance degradation caused by BFP16 storage. Our approach applies broadly to Shampoo-based methods that employ QR decomposition, including KL-Shampoo, SOAP, and KL-SOAP. In particular, it improves the performance of SOAP and KL-SOAP under BFP16 storage, enabling KL-SOAP to match or exceed KL-Shampoo. Overall, our approach makes Shampoo-based methods more memory- and time-efficient.
Neuroscientific research has revealed that the brain encodes complex behaviors by leveraging structured, low-dimensional manifolds and dynamically fusing multiple sources of information through adaptive gating mechanisms. Inspired by these principles, we propose a novel reinforcement learning (RL) framework that encourages the disentanglement of dynamics-specific and reward-specific features, drawing direct parallels to how neural circuits separate and integrate information for efficient decision-making. Our approach leverages locally linear embeddings (LLEs) to capture the intrinsic, locally linear structure inherent in many environments, mirroring the local smoothness observed in neural population activity, while concurrently deriving reward-specific features through the standard RL objective. An attention mechanism, analogous to cortical gating, adaptively fuses these complementary representations on a per-state basis. Experimental results on benchmark tasks demonstrate that our method, grounded in neuroscientific principles, improves learning efficiency and overall performance compared to conventional RL approaches, highlighting the benefits of explicitly modeling local state structures and adaptive feature selection as observed in biological systems.
2026-05-24
Transactions on Machine Learning Research (accepted)
We present a functional form (that we refer to as a Unified Neural Scaling Law (UNSL)) that accurately models and extrapolates the scaling behaviors of deep neural networks as multiple dimensions all vary simultaneously (i.e., how the evaluation metric of interest varies as one simultaneously varies the number of model parameters, training dataset size, number of training steps, number of inference steps, amount of compute, and various hyperparameters) for various architectures and for each of various tasks within a varied set of upstream and downstream tasks. This set includes large-scale vision, language, math, and reinforcement learning. When compared to other functional forms for neural scaling, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set.
Model merging, the direct combination of parameters from independently fine-tuned networks, offers a way to compose task-specific capabilities without retraining or ensemble inference. Existing merge methods are often built from hand-crafted arithmetic or sparsification heuristics, leaving open whether general learned weight-space operators can be repurposed for merging directly. We study this question with NiNo, a pre-trained checkpoint-nowcasting meta-network originally designed to predict near-future training states from short checkpoint histories. We show that pre-trained NiNo can be reused as a data-free pairwise meta-merge operator for independently fine-tuned models. On an 8-task CLIP ViT-B/16 benchmark, NiNo is competitive with strong arithmetic baselines and consistently lands in the same functional region as weight averaging, Task Arithmetic, and TIES. Moreover, NiNo is best on HumanEval in a Qwen3 language extension among the compared merge methods, while extending meta-merge beyond pairs remains an open challenge. These results position learned checkpoint nowcasting as a practical starting point for data-free model merging and motivate future weight-space learners trained for merging explicitly.
2026-05-23
WSS @ International Conference on Machine Learning (poster)
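For readers unfamiliar with the arithmetic baselines named in the model-merging abstract above, the following is a minimal sketch of two of them, weight averaging and Task Arithmetic, on toy parameter dictionaries. It does not implement NiNo or TIES. The function names, the dict-of-lists parameter layout, and the `scale` value are illustrative assumptions; real checkpoints would hold tensors rather than Python lists.

```python
def weight_average(models):
    """Element-wise mean of parameters across fine-tuned models."""
    merged = {}
    for name in models[0]:
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(col) / len(models) for col in columns]
    return merged


def task_arithmetic(base, finetuned, scale=1.0):
    """Add the scaled sum of task vectors (finetuned - base) to the base."""
    merged = {}
    for name, base_vals in base.items():
        merged[name] = [
            b + scale * sum(m[name][i] - b for m in finetuned)
            for i, b in enumerate(base_vals)
        ]
    return merged


# Toy parameters: one shared base and two hypothetical fine-tuned variants.
base = {"w": [1.0, 2.0]}
model_a = {"w": [2.0, 2.0]}
model_b = {"w": [1.0, 4.0]}

averaged = weight_average([model_a, model_b])
summed = task_arithmetic(base, [model_a, model_b], scale=1.0)
halved = task_arithmetic(base, [model_a, model_b], scale=0.5)
print(averaged, summed, halved)
```

With two models, Task Arithmetic at `scale=0.5` coincides with plain weight averaging, which is one way to see how closely related these baselines are.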