Publications
Key Issues and Future Directions in the Construction and Control of Geocentric Orbit Constellations for Gravitational Wave Detection
2573
Background:
Immune checkpoint inhibitor (ICI)-related hepatitis is a clinically significant immune-related adverse event (irAE) and a common cause of treatment interruption. It occurs in roughly 5 to 10 percent of patients receiving anti-PD-(L)1 monotherapy and in up to one third of those treated with combination ICI therapy. Despite increasing clinical recognition, the molecular mechanisms and predictive factors underlying ICI hepatitis remain poorly defined. The Montreal Immune-Related Adverse Events (MIRAE)-led hepatitis project aims to characterize the immune cell populations and underlying transcriptional programs associated with ICI-hepatitis pathogenesis.
Methods:
This translational study is conducted within the MIRAE biobank, a prospective multicenter cohort of ICI-treated patients with and without irAEs. The hepatitis cohort includes patients with longitudinal plasma samples collected at baseline, on treatment, and at irAE onset. Ongoing immune profiling efforts include plasma-based cytokine and chemokine analysis, high-throughput plasma proteomics, and single cell RNA sequencing of PBMCs. Preliminary analysis focused on plasma proteomics. Five patients with high-grade ICI-hepatitis and five ICI-treated controls without irAEs were selected and matched by age, sex, and primary tumor. Plasma samples were analyzed using the SomaScan 11K assay to identify differentially expressed proteins and enriched immune pathways.
Results:
ICI-related hepatitis was clinically severe, requiring systemic corticosteroids in all cases and additional immunosuppressive therapies in most patients. ICI-hepatitis cases showed significantly higher plasma levels of liver injury markers, including ALT and AST, compared with matched controls. Widespread alterations were observed in the circulating proteome, with strong upregulation of liver-enriched proteins and inflammatory mediators. Gene set enrichment analyses revealed enrichment of liver-associated pathways including xenobiotic and bile acid metabolism, as well as IL-12 signaling, interferon-α and γ, neutrophil-associated pathways, and liver-resident macrophage signatures. Pathway analysis of single cell data revealed enhanced cytotoxic activity of CD8 T cells during ICI hepatitis, as exemplified by upregulation of the CTL and IL-6 pathways.
Conclusions:
ICI-hepatitis was associated with a circulating immune signature characterized by liver injury markers, inflammatory mediators, and enrichment of innate immune pathways. These findings provide molecular insight into the immunopathogenesis of ICI hepatitis and inform future biomarker discovery, druggable pathways, and risk stratification.
North American Imaging in Multiple Sclerosis (NAIMS) Cooperative
The spinal cord plays a central role in the pathophysiology and clinical manifestations of multiple sclerosis (MS), yet remains under-studied compared with the brain. This review summarizes key insights from the 2025 North American Imaging in MS Spinal Cord Imaging Workshop, highlighting recent advances, ongoing challenges, and future opportunities in MS spinal cord imaging. We review pathological studies and outline the clinical relevance of spinal cord lesions and atrophy for diagnosis, prognosis, and disease monitoring, highlighting emerging biomarkers of progression independent of relapse activity. Correlations between magnetic resonance imaging, histopathology, and clinical outcomes support the validation and translational potential of advanced spinal cord imaging techniques. Finally, we discuss spinal cord–specific processing pipelines and reproducibility challenges. Collectively, these insights underscore the need to integrate advanced and quantitative spinal cord imaging into clinical trials, research studies, and, when feasible, clinical care, to fully capture the extent of MS pathology and ultimately improve patient outcomes.
Perturbation experiments are central to understanding cellular mechanisms, but remain costly and sparse, motivating prediction of gene expression responses for unobserved conditions. A promising recent direction leverages large language models (LLMs) as "virtual cell" simulators, using stepwise, knowledge-grounded mechanistic reasoning to infer differential expression, pointing toward an interpretable, knowledge-driven paradigm that transcends purely data-driven approaches. However, we find that plausibility is not prediction: despite producing biologically plausible explanations, these methods fail to capture perturbation-specific effects, systematically overestimating differential expression, often underperforming a simple gene-frequency baseline in aggregate evaluations, and collapsing to chance-level performance at the per-gene level. This reveals a reliance on intrinsic gene response tendencies rather than true perturbation reasoning. We trace this failure to how evidence is presented: existing methods evaluate perturbation-gene pairs in isolation, without exposing how related perturbations differ in their effects on the same gene. To address this limitation, we introduce CORE (Contrastive Organization of Relational Evidence), which reframes prediction as a comparison task by organizing evidence into positive and negative outcomes from related perturbations. Using a biomedical knowledge graph for evidence retrieval, CORE improves calibration and substantially boosts perturbation-specific prediction in both LLM-based and non-LLM settings: for example, on drug-perturbation data, CORE-Reasoning improves Qwen3.5-9B aggregate metrics by up to 28.6%, while on generic perturbation data, CORE-Voting raises macro-per-gene AUROC from chance to 0.703 on average across four cell lines. This highlights contrastive evidence organization as essential to reliable LLM-based perturbation reasoning.
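The gap between aggregate and per-gene evaluation described in the abstract above can be illustrated with a toy example. This is a minimal sketch, not the paper's evaluation code: the two genes, their labels, and the function names are hypothetical, the gene frequency is computed from the same toy labels for brevity, and "aggregate" is assumed to mean pooling all perturbation-gene pairs while "macro-per-gene" is assumed to mean averaging one AUROC per gene.

```python
from collections import defaultdict


def auroc(labels, scores):
    """Pairwise AUROC: P(positive scored above negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when only one class is present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def aggregate_auroc(records):
    """AUROC over all (perturbation, gene) pairs pooled together."""
    return auroc([r[2] for r in records], [r[3] for r in records])


def macro_per_gene_auroc(records):
    """Mean of per-gene AUROCs; each gene is ranked across perturbations."""
    by_gene = defaultdict(list)
    for _, gene, label, score in records:
        by_gene[gene].append((label, score))
    values = []
    for pairs in by_gene.values():
        value = auroc([y for y, _ in pairs], [s for _, s in pairs])
        if value is not None:
            values.append(value)
    return sum(values) / len(values)


# Hypothetical toy data: GENE_A responds to most perturbations, GENE_B to few.
toy_labels = {"GENE_A": [1, 1, 1, 0], "GENE_B": [0, 0, 0, 1]}

# Gene-frequency baseline: every pair is scored by how often its gene is
# differentially expressed, ignoring which perturbation was applied.
records = []
for gene, ys in toy_labels.items():
    frequency = sum(ys) / len(ys)
    for i, y in enumerate(ys):
        records.append((f"pert_{i}", gene, y, frequency))

print("aggregate AUROC:     ", aggregate_auroc(records))
print("macro-per-gene AUROC:", macro_per_gene_auroc(records))
```

Because the baseline assigns one constant score per gene, it separates frequently responding genes from rarely responding ones in the pooled ranking, yet it cannot rank perturbations within a gene, so the per-gene average sits at chance.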
Despite tremendous recent progress, current text-guided image editing methods still struggle with many aspects of editing, including instruction following, minimally editing the source image, and ensuring high visual quality. These problems are especially apparent when the requested edit is challenging, such as edits involving position, motion, viewpoint, or scale, and creative edits. To systematically test generative image editors, we propose a novel image editing benchmark, TECCI (Tricky Edits of Collected and Curated Images). TECCI consists of a completely new set of images that we are releasing. The images span 7 image categories, and both the images and the categories were curated intentionally to target weaknesses of existing methods. The edit instructions in TECCI are automatically generated by Gemini, covering 5 edit types per source image. We also curated a set of 530 images for which we created challenging, manually written edit instructions. Overall, TECCI contains 7550 pairs of images and edit instructions. We conduct human evaluations of five leading image editing models on TECCI. Humans judge outputs along three dimensions: 1) instruction following, 2) minimality of the edits, and 3) visual quality. To scale up the evaluation, we also build an auto-rater using Gemini that achieves 74.7% accuracy in matching human evaluations. Our evaluations reveal that: 1) none of the models exceeds a 22% overall success rate, demonstrating the challenging nature of TECCI, 2) Nano Banana Pro is the best performing model overall, 3) models perform significantly better at instruction following than at minimal edits and visual quality, 4) models struggle with editing architecture and nature images, which require strong understanding of spatial layout and intricate visual details, and 5) reasoning and creative edits are the most difficult, whereas color and appearance edits are the easiest.
Reliable uncertainty quantification is essential for deploying Machine Learning Interatomic Potentials (MLIPs), also known as Neural Force Fields, especially when molecular dynamics or materials simulations encounter configurations outside the training distribution. Deep ensembles remain the strongest practical baseline for MLIP uncertainty, but training and storing several copies of a modern pretrained model is often prohibitively expensive. We show that Bayesian Linear Last Layers (BLLs) provide a scalable alternative for MLIPs: a single pretrained backbone supplies atomic features, while exact Bayesian inference over the final force-prediction layer gives predictive uncertainties. BLLs are known to underestimate uncertainties. We provide an in-depth analysis that shows two sources of miscalibration and introduce a simple post-hoc recalibration to address the issue. On MPtrj and rMD17 benchmarks, including both in-distribution tests and increasingly out-of-distribution regimes, BLLs that are recalibrated on in-distribution examples produce uncertainty estimates competitive with ensembles, while using only one base model.
2026-05-29
AI4Science @ International Conference on Machine Learning (poster)
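The Bayesian linear last layer idea in the abstract above can be sketched with textbook Bayesian linear regression on fixed features. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-dimensional feature vectors stand in for frozen backbone features, the target is a scalar rather than a force vector, `alpha` (prior precision) and `beta` (noise precision) are hypothetical hyperparameters, and the scalar variance rescaling is one common post-hoc recipe that may differ from the recalibration the paper introduces.

```python
def fit_bll(features, targets, alpha=1.0, beta=25.0):
    """Exact Gaussian posterior over the weights of a 2-feature linear layer.

    Prior: w ~ N(0, I / alpha). Likelihood: y ~ N(w . phi, 1 / beta).
    Returns (mean, cov) of the posterior over w.
    """
    # Posterior precision A = alpha * I + beta * Phi^T Phi (2x2, symmetric).
    a11 = alpha + beta * sum(f[0] * f[0] for f in features)
    a12 = beta * sum(f[0] * f[1] for f in features)
    a22 = alpha + beta * sum(f[1] * f[1] for f in features)
    det = a11 * a22 - a12 * a12
    cov = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]
    # Posterior mean = beta * cov @ Phi^T y.
    b = [beta * sum(f[i] * t for f, t in zip(features, targets))
         for i in range(2)]
    mean = [cov[i][0] * b[0] + cov[i][1] * b[1] for i in range(2)]
    return mean, cov


def predict(phi, mean, cov, beta=25.0):
    """Predictive mean and variance (noise term plus weight uncertainty)."""
    mu = mean[0] * phi[0] + mean[1] * phi[1]
    var = 1.0 / beta + sum(phi[i] * cov[i][j] * phi[j]
                           for i in range(2) for j in range(2))
    return mu, var


def variance_scale(features, targets, mean, cov, beta=25.0):
    """Scalar s so that s * var matches the mean squared z-score on
    held-out in-distribution data (an assumed, generic recalibration)."""
    z2 = []
    for phi, t in zip(features, targets):
        mu, var = predict(phi, mean, cov, beta)
        z2.append((t - mu) ** 2 / var)
    return sum(z2) / len(z2)


# Hypothetical toy data: features (1, x) with targets near 0.5 + 2x.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
noise = [0.1, -0.1, 0.05, -0.05, 0.0]
train_features = [(1.0, x) for x in xs]
train_targets = [0.5 + 2.0 * x + e for x, e in zip(xs, noise)]

mean, cov = fit_bll(train_features, train_targets)
_, var_in = predict((1.0, 0.0), mean, cov)   # inside the training range
_, var_out = predict((1.0, 5.0), mean, cov)  # far outside the training range
scale = variance_scale(train_features, train_targets, mean, cov)
print(mean, var_in, var_out, scale)
```

Only the last layer is treated probabilistically, so the cost is one small matrix inverse on top of a single forward pass, and the predictive variance grows as the features move away from those seen in training. In practice the scale would be estimated on held-out in-distribution examples rather than on the training set, which is reused here only to keep the sketch short.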
Conditional generative models can have difficulty generating attribute combinations absent from training, even when each individual factor is densely covered, otherwise known as a failure to compositionally generalize. We propose a factored conditional flow matching architecture that uses a shared base velocity augmented by per-factor heads, summed at the bottleneck. We show that on the Shapes3D and MPI3D-real datasets, the factored architecture matches or beats a parameter-matched monolithic baseline under three structured zero-shot holdout strengths over a two-attribute lattice, notably lowering held-out FID by
2026-05-29
SPIGM @ International Conference on Machine Learning (poster)
The discovery of novel materials is essential for driving scientific and technological breakthroughs. Recent work has explored fine-tuning large language models (LLMs) for autoregressive crystal generation, but the ideal representation and training strategies for symmetry-based inductive biases remain unclear. We propose CrysTune, a class of LLMs fine-tuned on Wyckoff representations of crystals with two auxiliary tasks: canonicalization and template prediction. CrysTune shows competitive performance and improved stability-related metrics relative to LLMs trained on standard string-encoded representations. We further use these models as initial policies for reinforcement learning (RL) fine-tuning to optimize stability, validity, uniqueness, novelty, and diversity. RL-trained policies produce more valid and metastable crystals, while introducing novelty and diversity trade-offs. We also explore crystal system conditioning, showing that RL-trained policies produce a higher proportion of crystals matching the target condition.
2026-05-29
AI4Science @ International Conference on Machine Learning (poster)
Dominant approaches to Knowledge Base Question Answering (KBQA) fall into two categories. The first generates a formal query, which suffers from brittleness and limited explainability; the second retrieves answers directly through KB exploration, which is computationally costly and prone to hallucination. To combine the strengths of both paradigms while mitigating their respective weaknesses, we introduce DeSQ (Decomposition-based SPARQL Query Generation), a KB-agnostic framework that operates in three stages. First, it decomposes complex questions into Atomic Constraints (ACs) that mirror the relational structure of the underlying KB. Second, it generates a two-part structured output: (a) a mapping of each AC to its corresponding SPARQL fragment, using standardized variable and URI placeholders, and (b) a URI grounding block describing each placeholder. Third, it assembles these fragments into a complete SPARQL query. DeSQ surpasses state-of-the-art approaches on four out of five major benchmarks and demonstrates superior robustness to lexical variation. Beyond performance gains, our framework greatly simplifies evaluation by eliminating the need for a live KB endpoint, and its structured output enables fine-grained error analysis, allowing more targeted interventions for improvement.
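The third stage described in the abstract above, assembling fragments into a query, can be sketched as placeholder substitution. This is a hypothetical illustration rather than DeSQ's actual output format: the `[URI_n]` placeholder syntax, the `assemble_query` function, the example question, and the `example.org` URIs are all assumptions made for the sketch, and the grounding block is simplified to a direct placeholder-to-URI mapping.

```python
import re

# Assumed placeholder syntax; the real format is not given in the abstract.
PLACEHOLDER = re.compile(r"\[URI_\d+\]")


def assemble_query(fragments, grounding, answer_var="?answer"):
    """Join per-constraint SPARQL fragments into one SELECT query,
    replacing each URI placeholder with its grounded URI."""
    def resolve(match):
        key = match.group(0)
        if key not in grounding:
            raise ValueError(f"no grounding for placeholder {key}")
        return f"<{grounding[key]}>"

    body = "\n".join("  " + PLACEHOLDER.sub(resolve, frag)
                     for frag in fragments)
    return f"SELECT DISTINCT {answer_var} WHERE {{\n{body}\n}}"


# Hypothetical question: "Which films directed by person X came out after 2010?"
# One fragment per atomic constraint.
fragments = [
    "?answer [URI_1] [URI_2] .",      # AC 1: the answer was directed by X
    "?answer [URI_3] ?date .",        # AC 2: the answer has a release date
    "FILTER(YEAR(?date) > 2010)",     # AC 3: the date is after 2010
]
grounding = {
    "[URI_1]": "http://example.org/directedBy",
    "[URI_2]": "http://example.org/person/X",
    "[URI_3]": "http://example.org/releaseDate",
}

query = assemble_query(fragments, grounding)
print(query)
```

Keeping fragments and grounding separate means a wrong answer can be traced to a specific constraint or a specific URI, and the structured output can be compared against a reference without executing the query.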
Offline reinforcement learning requires improving a policy from fixed data while avoiding out-of-distribution actions with unreliable value estimates. Diffusion and flow policies handle this trade-off by modeling the behavior distribution to regularize the RL objective, but they require iterative denoising, solver integrations, and in more efficient variants, distillation or other approximations at inference. We propose DriftQL, which combines a drift-based behavioral regularizer with critic-driven policy improvement. The value signal biases the policy toward high-value regions of the data support, while attraction and repulsion together keep generated actions near the data and prevent collapse onto a single mode. DriftQL is implemented as a single network with a unified training objective and generates actions in a single forward pass. On D4RL and OGBench, DriftQL consistently outperforms diffusion and flow methods, advancing the state of the art. Under degraded data quality, where the baselines visibly struggle, DriftQL remains close to its clean-data performance, positioning it as a promising alternative to diffusion and flow-based methods while maintaining the simplicity and efficiency of deterministic approaches. Project page: https://driftql.github.io/