Publications
Learning Implicit Feasibility Constraints for Real-World Routing and Scheduling: Application to Log Transportation
Real-world vehicle routing and scheduling problems involve complex operational rules and feasibility constraints typically formulated as mixed-integer linear programs (MILP). However, optimization tools are built around a fixed set of hard-coded constraints, while in practice this set evolves as new rules or preferences emerge, seasonally or permanently. Updating it requires modeling and operations research skills that planners rarely have, so generated plans are routinely adjusted by hand based on practical knowledge. Building on recent work that uses machine learning to recover such hidden constraints, we propose a data-driven constraint-learning approach that trains three complementary predictors, a Graph Neural Network (GNN), a decision tree, and a linear regression, on historical execution data from a log-truck routing and scheduling problem (
Modern video diffusion models generate increasingly realistic and temporally coherent videos, motivating their use as candidate world simulators. Yet it remains unclear whether these models internally encode physical structure, or merely reproduce motion patterns seen during training. We study this question by probing video diffusion models along latent trajectories corresponding to real videos with known physical plausibility. To obtain such trajectories, we approximately invert the deterministic sampling process by integrating the learned velocity field backward from a clean video latent to noise, giving access to the model's intermediate states and attention maps. Using these recovered trajectories, we show that physical plausibility is linearly decodable from diffusion transformer states across IntPhys and InfLevel, reaching around 81.27% average accuracy and outperforming dedicated representation-learning baselines such as V-JEPA and VideoMAE. Surprisingly, this signal is absent from the VAE latent input and emerges inside the denoising transformer itself, despite the model not being trained with a self-supervised predictive objective. These findings suggest that physically meaningful representations can arise as a byproduct of generative denoising.
Fuzzy deduplication is key to constructing large language model training corpora. However, classic Locality-Sensitive Hashing pipelines scale poorly as corpora grow and are ill-suited to continuous ingestion. We present FOLD (Fuzzy Online Deduplication), an online fuzzy deduplication system that delivers high recall and throughput for evolving datasets. FOLD maintains an incrementally updated HNSW index over admitted documents, retrieving a small, high-quality candidate neighborhood for each incoming document instead of repeatedly rebuilding global buckets or rescanning the accumulated corpus. To our knowledge, FOLD is the first online fuzzy deduplication system to use HNSW. However, applying Jaccard similarity out of the box causes score crowding, making graph traversal unreliable within a small number of steps. FOLD addresses this with a bitmap representation that provides a more discriminative, Jaccard-aligned signal during HNSW search. Across four LLM-scale datasets (LM1B, C4, RealNews, and Common Crawl), FOLD stays fast and accurate as the corpus grows: at the largest evaluated scales, it maintains 93-97% recall and achieves up to 2.09x higher throughput than competing alternatives, whose best recall reaches only 76%.
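As a point of reference for the Jaccard similarity mentioned in the abstract above, the following is a minimal sketch of how near-duplicate documents are commonly scored with Jaccard similarity over word shingles. It does not reproduce FOLD's bitmap representation or its HNSW index; the function names `shingles` and `jaccard` and the shingle width `k=3` are illustrative assumptions, not taken from the paper.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a text (hypothetical helper).

    Texts shorter than k words yield a single shingle holding all tokens.
    """
    tokens = text.lower().split()
    if not tokens:
        return set()
    if len(tokens) < k:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two shingle sets."""
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)


doc_a = "the quick brown fox jumps"
doc_b = "the quick brown fox leaps"
score = jaccard(shingles(doc_a), shingles(doc_b))
```

In a fuzzy deduplication setting, an incoming document would typically be dropped when its score against some already admitted document exceeds a chosen threshold; the threshold and the candidate retrieval strategy are outside the scope of this sketch.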
Simulation environments are useful for both robot policy learning and planning verification and validation. Traditionally, the process of creating a simulation was onerous. Creating a bespoke simulation environment for each individual environment that a robot would operate in was simply infeasible. In this work, we introduce PerceptTwin, a fully automatic pipeline that constructs interactive simulations directly from semantic scene representations produced by a robot's perception stack. PerceptTwin combines open-vocabulary object maps with 3D asset generation, affordance prediction, and commonsense condition checking. These interactive simulations can be used to validate and refine plans before they are executed on the robot hardware. Borrowing from the AI alignment literature, we also introduce an LLM judge that verifies plan correctness and alignment with human preferences. Experiments show that PerceptTwin feedback allows LLM planners to refine plans, enhance safety, and resist harmful black-box prompting attacks. In our suite of tasks, PerceptTwin improves plan success by an average of approximately 39% for GPT5, GPT5Mini, and GPT5Nano planners. Additionally, PerceptTwin improves human plan verification by up to 18% on average for plans that fail due to unfilled skill preconditions. Our results demonstrate the potential of open-vocabulary scene simulation from robot perception as a foundation for safer, more reliable robot planning.
Reasoning models achieve strong performance on challenging tasks by generating explicit intermediate reasoning traces before producing a final answer. Yet the internal structure of representation space when reasoning remains poorly understood: how do a model's hidden representations differ during thinking versus the embeddings of the input prompt, and can this structure be exploited to elicit stronger reasoning at inference time? We show that both input embeddings and thinking embeddings (mean-pooled last-layer hidden states over the prompt and reasoning trace, respectively) exhibit extremely high conicity, with all vectors clustering tightly around a single mean direction. Crucially, these mean input and thinking directions are non-collinear, with thinking embeddings occupying a geometrically distinct region of embedding space across many different models and benchmark tasks. This observation motivates casting the input-to-thinking transition as a rotation problem admitting a closed-form solution via orthogonal Procrustes analysis. We propose Rotate2Think, a training-free method that estimates this rotation from a small set of correctly solved examples and injects the resulting synthetic thinking vector between thinking delimiters at inference time, providing a geometric primer at the onset of the reasoning trace. Evaluated across multiple benchmarks and model families, Rotate2Think improves accuracy in 30 of 32 model-benchmark configurations across mathematics, science, and code tasks, and generalizes zero-shot to multimodal reasoning on MATH-Vision.
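For readers unfamiliar with the closed-form solution referred to above, the standard orthogonal Procrustes result can be stated as follows. The notation is an assumption made for illustration and is not taken from the paper: the rows of $X \in \mathbb{R}^{n \times d}$ are input embeddings, the rows of $Y \in \mathbb{R}^{n \times d}$ are the matching thinking embeddings, and $R$ is the sought orthogonal map. How Rotate2Think builds these matrices is not specified in the abstract.

```latex
R^{*} \;=\; \arg\min_{R \,:\, R^{\top} R = I} \; \lVert X R - Y \rVert_{F}^{2},
\qquad
X^{\top} Y \;=\; U \Sigma V^{\top} \ \text{(singular value decomposition)},
\qquad
R^{*} \;=\; U V^{\top}.
```

The result follows because minimizing the Frobenius norm is equivalent to maximizing $\operatorname{tr}(R^{\top} X^{\top} Y)$, which is largest when $V^{\top} R^{\top} U = I$.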
Classical reinforcement learning (RL) typically seeks a deterministic policy that maximizes the expected sum of a scalar reward. Yet, modern applications such as language model fine-tuning or scientific discovery demand diversity. Existing remedies such as entropy regularization or diversity bonuses often require fragile trade-offs that sacrifice performance for stochasticity or rely on heuristic metrics that can misalign policy rankings. We argue that diversity is more naturally understood as the rational response to uncertainty in the reward. When the reward function is not perfectly known--as is the case with ambiguous preferences or imperfect reward models--committing to a single action can be sub-optimal. Building on this, we propose a fundamental reformulation of the RL objective by replacing the scalar reward with a distribution over reward functions, and applying a non-linear objective over sets of actions. The result is a framework in which calibrated behavioural diversity emerges naturally, remains controllable through the reward function distribution, and is obtained without sacrificing expected reward. Focusing on the contextual bandit setting, we derive a principled gradient estimator for this objective and prove that our formulation naturally generalizes both vanilla policy gradient and more recently developed action-set approaches. Our empirical results demonstrate that this framework offers a robust and theoretically grounded alternative for complex RL tasks where the traditional formulation of the problem fails to induce the desired breadth of agent behaviour.
To test whether the mean curvature of isophotes (MCI), a geometric image transformation, can be used to improve automatic detection on chest CT of Usual Interstitial Pneumonia (UIP), a determining radiological pattern in the diagnosis of Interstitial Lung Diseases (ILD).
This retrospective study included chest CT scans from 234 patients (123 female, 111 male; mean age: 61.6 years; age range: 18-90 years) obtained at two independent institutions between 2007 and 2024.
Three different classification models were trained on the original CT images and separately on MCI-transformed CT images: (1) a previously published deep learning model for classifying fibrotic lung disease on chest CT, (2) a classification pipeline based on the EfficientNet-V2 convolutional neural network architecture, and (3) a non-deep-learning model based on the functional principal component analysis (FPCA) of density functions of voxel intensity.
All models were trained on data from the first institution and evaluated on data from the second institution with the recall-macro, precision-macro and F1-macro scores. Performance difference between classifier pairs was tested with the Stuart-Maxwell marginal homogeneity test.
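The macro-averaged scores named above weight every class equally, regardless of how many patients fall into it. The following is a minimal sketch of that computation in plain Python; the function name `macro_scores` and the three integer labels in the example are hypothetical and do not come from the study, and the choice to score an undefined per-class value as 0.0 is an assumption.

```python
def macro_scores(y_true, y_pred):
    """Return (precision-macro, recall-macro, F1-macro).

    Per-class scores are computed one-vs-rest and averaged without
    weighting. A class with an undefined score (zero denominator) is
    counted as 0.0, which is one common convention.
    """
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical 3-group example (labels 0, 1, 2 are placeholders).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
precision_macro, recall_macro, f1_macro = macro_scores(y_true, y_pred)
```

Because each class contributes equally to the average, a classifier that ignores a small class is penalized as heavily as one that ignores a large class, which is why macro averaging is often preferred for imbalanced multi-class tasks.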
For a fixed model architecture and training algorithm, MCI-transformed images yield comparable or better classification performance than the original CT images. The best performance improvement achieved with MCI compared to CT was: recall-macro 0.83 vs 0.57, precision-macro 0.81 vs 0.50, F1-macro 0.80 vs 0.49, p=4.2e-5.
MCI may be a valuable addition to existing AI systems for screening for UIP on chest CT.
Machine learning methods for identifying usual interstitial pneumonia on chest CT perform better when the input CT images are transformed via the mean curvature of isophotes (MCI), a geometric transformation method known from classical computer vision.
Three machine learning models were trained on a dataset of 158 patients from one institution and tested on another dataset of 76 patients from an independent institution to discriminate for usual interstitial pneumonia (UIP) on chest CT in a 3-group classification task.
When keeping the network architecture and parameters fixed, changing the input image domain from the original CT to MCI-transformed images improved classification performance (Stuart-Maxwell test, p < 5e-3).
MCI may be a valuable addition to existing machine learning systems for screening for UIP on chest CT, whether based on deep learning or on simpler shallow classifiers.
Cough acoustic analysis using artificial intelligence for COVID-19 detection: A comparative study of patient cohorts from Lima, Peru and Montreal, Canada