Publications

Iterative Amortized Inference: Unifying In-Context Learning and Learned Optimizers

2025-10-12

ArXiv (preprint)

doi.org

arxiv.org

Imagining Alternatives: Towards High-Resolution 3D Counterfactual Medical Image Generation via Language Guidance

Mohamed Mohamed

Brennan Nichyporuk

Douglas Arnold

Tal Arbel

Vision-language models have demonstrated impressive capabilities in generating 2D images under various conditions; however, the impressive performance of these models in 2D is largely enabled by extensive, readily available pretrained foundation models. Critically, comparable pretrained foundation models do not exist for 3D, significantly limiting progress in this domain. As a result, the potential of vision-language models to produce high-resolution 3D counterfactual medical images conditioned solely on natural language descriptions remains completely unexplored. Addressing this gap would enable powerful clinical and research applications, such as personalized counterfactual explanations, simulation of disease progression scenarios, and enhanced medical training by visualizing hypothetical medical conditions in realistic detail. Our work takes a meaningful step toward addressing this challenge by introducing a framework capable of generating high-resolution 3D counterfactual medical images of synthesized patients guided by free-form language prompts. We adapt state-of-the-art 3D diffusion models with enhancements from Simple Diffusion and incorporate augmented conditioning to improve text alignment and image quality. To our knowledge, this represents the first demonstration of a language-guided native-3D diffusion model applied specifically to neurological imaging data, where faithful three-dimensional modeling is essential to represent the brain's three-dimensional structure. Through results on two distinct neurological MRI datasets, our framework successfully simulates varying counterfactual lesion loads in Multiple Sclerosis (MS), and cognitive states in Alzheimer's disease, generating high-quality images while preserving subject fidelity in synthetically generated medical images. Our results lay the groundwork for prompt-driven disease progression analysis within 3D medical imaging.

2025-10-11

Lecture Notes in Computer Science (published)

doi.org

arxiv.org

CTR-LoRA: Curvature-Aware and Trust-Region Guided Low-Rank Adaptation for Large Language Models

Zhuxuanzi Wang

Mingqiao Mo

Xi Xiao

Chen Liu

Chenrui Ma

Yunbei Zhang

Xiao Wang

Smita Krishnaswamy

Tianyang Wang

Parameter-efficient fine-tuning (PEFT) has become the standard approach for adapting large language models under limited compute and memory budgets. Although previous methods improve efficiency through low-rank updates, quantization, or heuristic budget reallocation, they often decouple the allocation of capacity from the way updates evolve during training. In this work, we introduce CTR-LoRA, a framework guided by a curvature trust region that integrates rank scheduling with stability-aware optimization. CTR-LoRA allocates parameters based on marginal utility derived from lightweight second-order proxies and constrains updates using a Fisher/Hessian-metric trust region. Experiments on multiple open-source backbones (7B-13B), evaluated on both in-distribution and out-of-distribution benchmarks, show consistent improvements over strong PEFT baselines. In addition to increased accuracy, CTR-LoRA enhances training stability, reduces memory requirements, and achieves higher throughput, positioning it on the Pareto frontier of performance and efficiency. These results highlight a principled path toward more robust and deployable PEFT.

2025-10-10

ArXiv (preprint)

doi.org

arxiv.org

Permissive Information-Flow Analysis for Large Language Models

Shoaib Ahmed Siddiqui

Radhika Gaonkar

Boris Köpf

David M. Krueger

Andrew Paverd

Ahmed Salem

Shruti Tople

Lukas Wutschitz

Menglin Xia

Santiago Zanella-Beguelin

Large Language Models (LLMs) are rapidly becoming commodity components of larger software systems. This poses natural security and privacy problems: poisoned data retrieved from one component can change the model's behavior and compromise the entire system, including coercing the model to spread confidential data to untrusted components. Assuming each piece of information comes with an additional meta-label (such as low/high integrity labels), one promising approach is to tackle this problem at the system level via dynamic information flow (aka taint) tracking. Unfortunately, this approach of propagating the most restrictive input label to the output is too conservative for applications where LLMs operate on inputs retrieved from diverse sources. In this paper, we propose a novel, more permissive approach to propagate information flow labels through LLM queries. The key idea behind our approach is to propagate only the input labels that were influential in generating the model output and to eliminate the labels of unnecessary inputs. We implement and investigate the effectiveness of two variations of this approach, based on (i) prompt-based retrieval augmentation, and (ii) a

2025-10-10

TMLR (accepted)

openreview.net

GWSkyNet-Multi. II. An Updated Machine Learning Model for Rapid Classification of Gravitational-wave Events

Nayyer Raza

Man Leong Chan

Daryl Haggard

Ashish Mahabal

Jess McIver

Audrey Durand

Alexandre Larouche

Hadi Moazen

Multimessenger observations of gravitational waves and electromagnetic emission from compact object mergers offer unique insights into the structure of neutron stars, the formation of heavy elements, and the expansion rate of the Universe. With the LIGO–Virgo–KAGRA (LVK) gravitational-wave detectors currently in their fourth observing run (O4), it is an exciting time for detecting these mergers. However, assessing whether to follow up a candidate gravitational-wave event given limited telescope time and resources is challenging; the candidate can be a false alert due to detector glitches, or may not have any detectable electromagnetic counterpart even if it is real. GWSkyNet-Multi is a machine learning model developed to facilitate follow-up decisions by providing real-time classification of candidate events, using localization information released in LVK rapid public alerts. Here we introduce GWSkyNet-Multi II, an updated model targeted toward providing more robust and informative predictions during O4 and beyond. Specifically, the model now provides normalized probability scores and associated uncertainties for each of the four corresponding source categories released by the LVK: glitch, binary black hole, neutron star–black hole, and binary neutron star. Informed by explainability studies of the original model, the updated model architecture is also significantly simplified, including replacing input images with intuitive summary values that are more interpretable. For significant event alerts issued during O4a and O4b, GWSkyNet-Multi II produces a prediction that is consistent with the updated LVK classification for 93% of events. The updated model can be used by the community to help make time-critical follow-up decisions.

2025-10-09

Astrophysical Journal (published)

doi.org

Scaling Laws and Symmetry, Evidence from Neural Force Fields

Khang Ngo

Siamak Ravanbakhsh

We present an empirical study in the geometric task of learning interatomic potentials, which shows equivariance matters even more at larger scales; we show a clear power-law scaling behaviour with respect to data, parameters and compute with "architecture-dependent exponents". In particular, we observe that equivariant architectures, which leverage task symmetry, scale better than non-equivariant models. Moreover, among equivariant architectures, higher-order representations translate to better scaling exponents. Our analysis also suggests that for compute-optimal training, the data and model sizes should scale in tandem regardless of the architecture. At a high level, these results suggest that, contrary to common belief, we should not leave it to the model to discover fundamental inductive biases such as symmetry, especially as we scale, because they change the inherent difficulty of the task and its scaling laws.

2025-10-09

ArXiv (preprint)

doi.org

arxiv.org

Sound and Modular Activity Analysis for Automatic Differentiation in MLIR

Mai Jacob Peng

William S. Moses

Oleksandr Zinenko

Christophe Dubach

Computing derivatives is paramount for multiple domains ranging from training neural networks to precise climate simulations. While derivatives can be generated by Automatic Differentiation (AD) tools, they often require aggressive optimization to avoid compromising program performance. One of the central optimizations consists of identifying inactive operations that do not contribute to the partial derivatives of interest. Multiple tools provide activity analyses for a variety of input languages, though often with only informal correctness guarantees. This paper formally defines activity analysis for AD as an abstract interpretation, proves its soundness, and implements it within the MLIR compiler infrastructure. To account for MLIR’s genericity, a subset of MLIR’s internal representation amenable to AD is formalized for the first time. Furthermore, the paper proposes a sound intraprocedural approximation of the whole-program activity analysis via function summaries along with a mechanism to automatically derive these summaries from function definitions. The implementation is evaluated on a differentiation-specific benchmark suite. It achieves a 1.24 geometric mean speedup on CPU and a 1.7 geometric mean speedup on GPU in the runtime of generated programs, when compared to a baseline that does not use activity analysis. The evaluation also demonstrates that the intraprocedural analysis with function summaries proves inactive 100% of instructions proven inactive by the whole-program analysis.

2025-10-08

Proceedings of the ACM on Programming Languages (published)

doi.org

Wavefunction Flows: Efficient Quantum Simulation of Continuous Flow Models

David Layden

Ryan Sweke

Vojtěch Havlíček

Anirban Chowdhury

Kirill Neklyudov

2025-10-08

ArXiv (preprint)

doi.org

arxiv.org

Comparison of Speech Tasks in Human Expert and Machine Detection of Parkinson's Disease

Peter Plantinga

Roozbeh Sattari

Karine Marcotte

Carla Di Gironimo

Madeleine Sharp

Liziane Bouvier

Maiya Geddes

Ingrid Verduyckt

Étienne de Villers-Sidani

Mirco Ravanelli

Denise Klein

2025-10-07

ArXiv (preprint)

doi.org

arxiv.org

High-Rate Mixout: Revisiting Mixout for Robust Domain Generalization

Masih Aminbeidokhti

Heitor Rapela Medeiros

Srikanth Muralidharan

Eric Granger

Marco Pedersoli

2025-10-07

ArXiv (preprint)

doi.org

arxiv.org

Revisiting Mixout: An Overlooked Path to Robust Finetuning

Masih Aminbeidokhti

Heitor Rapela Medeiros

Eric Granger

Marco Pedersoli

Finetuning vision foundation models often improves in-domain accuracy but comes at the cost of robustness under distribution shift. We revisit Mixout, a stochastic regularizer that intermittently replaces finetuned weights with their pretrained reference, through the lens of a single-run, weight-sharing implicit ensemble. This perspective reveals three key levers that govern robustness: the masking anchor, resampling frequency, and mask sparsity. Guided by this analysis, we introduce GMixout, which (i) replaces the fixed anchor with an exponential moving-average snapshot that adapts during training, and (ii) regulates masking period via an explicit resampling-frequency hyperparameter. Our sparse-kernel implementation updates only a small fraction of parameters with no inference-time overhead, enabling training on consumer-grade GPUs. In experiments on benchmarks covering covariate shift, corruption, and class imbalance (ImageNet / ImageNet-LT, DomainNet, iWildCam, and CIFAR100-C), GMixout consistently improves in-domain accuracy beyond zero-shot performance while surpassing both Model Soups and strong parameter-efficient finetuning baselines under distribution shift.

2025-10-07

ArXiv (preprint)

doi.org

arxiv.org

TGM: a Modular and Efficient Library for Machine Learning on Temporal Graphs

Tran Gia Bao Ngo

Jure Leskovec

Michael M. Bronstein

Guillaume Rabusseau

Matthias Fey

Reihaneh Rabbany

Well-designed open-source software drives progress in Machine Learning (ML) research. While static graph ML enjoys mature frameworks like PyTorch Geometric and DGL, ML for temporal graphs (TG), networks that evolve over time, lacks comparable infrastructure. Existing TG libraries are often tailored to specific architectures, hindering support for diverse models in this rapidly evolving field. Additionally, the divide between continuous- and discrete-time dynamic graph methods (CTDG and DTDG) limits direct comparisons and idea transfer. To address these gaps, we introduce Temporal Graph Modelling (TGM), a research-oriented library for ML on temporal graphs, the first to unify CTDG and DTDG approaches. TGM offers first-class support for dynamic node features, time-granularity conversions, and native handling of link-, node-, and graph-level tasks. Empirically, TGM achieves an average 7.8x speedup across multiple models, datasets, and tasks compared to the widely used DyGLib, and an average 175x speedup on graph discretization relative to available implementations. Beyond efficiency, we show in our experiments how TGM unlocks entirely new research possibilities by enabling dynamic graph property prediction and time-driven training paradigms, opening the door to questions previously impractical to study. TGM is available at https://github.com/tgm-team/tgm

2025-10-07

ArXiv (preprint)

doi.org

arxiv.org
