Publications

A Meta-Learning Approach to Causal Inference
Dragos Cristian Manta
Predicting the effect of unseen interventions is at the heart of many scientific endeavours. While causal discovery is often used to answer these causal questions, it involves learning a full causal model, not tailored to the specific goal of predicting unseen interventions, and operates under stringent assumptions. We introduce a novel method based on meta-learning that predicts interventional effects without explicitly assuming a causal model. Our preliminary results on synthetic data show that it can provide good generalization to unseen interventions, and it even compares favorably to a causal discovery method. Our model-agnostic method opens up many avenues for future exploration, particularly for settings where causal discovery cannot be applied.
PyLO: Towards Accessible Learned Optimizers in PyTorch
Quentin Gregory Anthony
Xiaolong Huang
Learned optimizers have been an active research topic over the past decade, with increasing progress toward practical, general-purpose optimizers that can serve as drop-in replacements for widely used methods like Adam. However, recent advances -- such as VeLO, which was meta-trained for 4000 TPU-months -- remain largely inaccessible to the broader community, in part due to their reliance on JAX and the absence of user-friendly packages for applying the optimizers after meta-training. To address this gap, we introduce PyLO, a PyTorch-based library that brings learned optimizers to the broader machine learning community through familiar, widely adopted workflows. Unlike prior work focused on synthetic or convex tasks, our emphasis is on applying learned optimization to real-world large-scale pre-training tasks. Our release includes a CUDA-accelerated version of the small_fc_lopt learned optimizer architecture from Metz et al. (2022a), delivering substantial speedups -- from 39.36 to 205.59 samples/sec throughput for training ViT B/16 with batch size 32. PyLO also allows us to easily combine learned optimizers with existing optimization tools such as learning rate schedules and weight decay. When doing so, we find that learned optimizers can benefit substantially. Our code is available at https://github.com/Belilovsky-Lab/pylo
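The abstract notes that learned optimizers can be combined with standard tools such as learning rate schedules and weight decay. PyLO's actual API is not reproduced here; the sketch below is a minimal, library-free illustration of that composition, pairing an inner update step with a cosine schedule and decoupled (AdamW-style) weight decay. All names (`cosine_lr`, `step_with_decay`) and constants are illustrative, not PyLO's.

```python
import math

def cosine_lr(step, total_steps, base_lr=0.1, min_lr=0.0):
    """Cosine learning-rate schedule decaying base_lr to min_lr."""
    t = step / max(1, total_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))

def step_with_decay(w, grad, lr, weight_decay=0.01):
    """One update combining an inner step (plain SGD here, a learned
    update rule in PyLO's setting) with decoupled weight decay."""
    return w - lr * grad - lr * weight_decay * w

# Minimize f(w) = (w - 3)^2 with the schedule + decay wrapper.
w, total = 0.0, 200
for step in range(total):
    grad = 2 * (w - 3)
    w = step_with_decay(w, grad, cosine_lr(step, total))
print(round(w, 2))  # converges near 3, pulled slightly down by the decay
```

Decoupled weight decay shifts the fixed point from 3 to 6/2.01, which is the point this toy makes: schedule and decay compose with whatever inner update rule is plugged in.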
Quantized Disentanglement: A Practical Approach
Vitória Barin-Pacela
P Vincent
Revisiting the Goldilocks Zone in Inhomogeneous Networks
Zacharie Garnier Cuchet
A. Chandar
We investigate how architectural inhomogeneities—such as biases, layer normalization, and residual connections—affect the curvature of the loss landscape at initialization and its link to trainability. We focus on the Goldilocks zone, a region in parameter space with excess positive curvature, previously associated with improved optimization in homogeneous networks. To extend this analysis, we compare two scaling strategies: weight scaling and softmax temperature scaling. Our results show that in networks with biases or residual connections, both strategies identify a Goldilocks zone aligned with better training. In contrast, layer normalization leads to lower or negative curvature, yet stable optimization—revealing a disconnect between curvature and trainability. Softmax temperature scaling behaves more consistently across models, making it a more robust probe. Overall, the Goldilocks zone remains relevant in inhomogeneous networks, but its geometry and predictive power depend on architectural choices, particularly normalization.
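The excess-positive-curvature statistic behind the Goldilocks zone can be made concrete on a toy model. The sketch below is not the paper's code: it numerically estimates the Hessian of a tiny two-class softmax classifier (which is convex, unlike the deep networks the paper studies) and reports a normalized curvature measure Tr(H) / (||H||_F sqrt(n)) at several softmax temperatures. All names and numbers are illustrative.

```python
import math, random

def loss(wvec, x, y, temp):
    # 2x2 linear model, softmax cross-entropy with temperature `temp`.
    a, b, c, d = wvec
    z0 = (a * x[0] + b * x[1]) / temp
    z1 = (c * x[0] + d * x[1]) / temp
    m = max(z0, z1)
    log_z = m + math.log(math.exp(z0 - m) + math.exp(z1 - m))
    return log_z - (z0 if y == 0 else z1)

def hessian(f, w, h=1e-3):
    """Dense Hessian of f at w via central finite differences."""
    n = len(w)
    big_h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            wpp = list(w); wpp[i] += h; wpp[j] += h
            wpm = list(w); wpm[i] += h; wpm[j] -= h
            wmp = list(w); wmp[i] -= h; wmp[j] += h
            wmm = list(w); wmm[i] -= h; wmm[j] -= h
            big_h[i][j] = (f(wpp) - f(wpm) - f(wmp) + f(wmm)) / (4 * h * h)
    return big_h

def curvature_measure(H):
    # Tr(H) / (||H||_F * sqrt(n)): close to 1 when curvature is
    # uniformly positive, near 0 when positive/negative balance out.
    n = len(H)
    tr = sum(H[i][i] for i in range(n))
    frob = math.sqrt(sum(H[i][j] ** 2 for i in range(n) for j in range(n)))
    return tr / (frob * math.sqrt(n))

random.seed(0)
w0 = [random.gauss(0, 1) for _ in range(4)]
x, y = [1.0, -0.5], 0
for temp in (0.25, 1.0, 4.0):
    H = hessian(lambda w: loss(w, x, y, temp), w0)
    print(f"temperature {temp}: measure = {curvature_measure(H):+.3f}")
```

For this convex toy the measure stays positive at every temperature; the paper's point is precisely that in deep inhomogeneous networks the analogous statistic can vary with architecture and scaling in ways this sketch cannot capture.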
Spaced Scheduling for Large Language Model Training
Amine El hattami
Christopher Pal
TGM: A Modular Framework for Machine Learning on Temporal Graphs
While deep learning on static graphs has been revolutionized by standardized libraries like PyTorch Geometric and DGL, machine learning on Temporal Graphs (TG), networks that evolve over time, lacks comparable software infrastructure. Existing TG libraries are limited in scope, focusing on a single method category or specific algorithms. To address this gap, we introduce Temporal Graph Modelling (TGM), a comprehensive framework for machine learning on temporal graphs. Through its modular architecture, TGM is the first library to support both discrete- and continuous-time TG methods, and it implements a wide range of them. The TGM framework combines an intuitive front-end API with optimized backend storage, enabling reproducible research and efficient experimentation at scale. Key features include graph-level optimizations for offline training and built-in performance profiling capabilities. Through extensive benchmarking on five real-world networks, TGM is up to 6 times faster than the widely used DyGLib library on TGN and TGAT models and up to 8 times faster than the UTG framework for converting edges into coarse-grained snapshots.
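The discrete/continuous divide the abstract mentions comes down to one conversion: bucketing a continuous-time edge stream into coarse-grained snapshots (the operation benchmarked against UTG). TGM's actual API is not shown here; the following is a minimal stand-alone sketch of that conversion, with illustrative names.

```python
from collections import defaultdict

def edges_to_snapshots(edges, window):
    """Group a timestamped edge stream into coarse-grained snapshots.

    edges:  iterable of (src, dst, timestamp), in any order
    window: snapshot duration; an edge at time t lands in snapshot t // window
    Returns {snapshot_index: sorted list of unique (src, dst) edges}.
    """
    snaps = defaultdict(set)
    for src, dst, t in edges:
        snaps[t // window].add((src, dst))
    return {k: sorted(v) for k, v in sorted(snaps.items())}

# A toy continuous-time stream: 6 interactions over 30 time units.
stream = [(0, 1, 3), (1, 2, 5), (0, 2, 11), (2, 3, 14), (0, 1, 16), (3, 0, 29)]
snapshots = edges_to_snapshots(stream, window=10)
print(snapshots)  # snapshot 0 holds edges with t in [0, 10), and so on
```

Discrete-time methods consume the snapshot dictionary; continuous-time methods consume the raw stream, which is why a framework supporting both needs this conversion as a first-class, fast operation.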
Towards Fair In-Context Learning with Tabular Foundation Models
Patrik Joslin Kenfack
S Ebrahimi Kahou
Ulrich Matchi Aïvodji
Tabular foundation models have shown promising in-context learning capabilities on structured data by using training examples as context without further parameter adjustment. This emerging approach positions itself as a competitive alternative to traditional gradient-boosted tree methods. However, while biases in conventional machine learning models are well documented, it remains unclear how these biases manifest in tabular ICL. This paper investigates the fairness implications of tabular ICL and explores three preprocessing strategies—correlation removal, group-balanced demonstration selection, and uncertainty-based demonstration selection—to address bias. Comprehensive experiments indicate that uncertainty-based demonstration selection consistently enhances group fairness in the predictions. The source code for reproducing the results of this work can be found at https://anonymous.4open.science/r/Fair-TabICL-DD84.
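The paper's exact uncertainty criterion is not reproduced here; a common instantiation of uncertainty-based demonstration selection is to rank candidate demonstrations by the predictive entropy of a preliminary model and keep the most ambiguous ones. The sketch below shows that entropy-ranked selection on a toy pool; all names and probabilities are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy of a class-probability vector (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_uncertain(pool, k):
    """Keep the k pool examples whose estimated class probabilities
    have the highest predictive entropy.

    pool: list of (example_id, probs) pairs, probs from some
          preliminary model over the label classes.
    """
    ranked = sorted(pool, key=lambda item: entropy(item[1]), reverse=True)
    return [ex_id for ex_id, _ in ranked[:k]]

# Toy pool: confident predictions carry low entropy, ambiguous ones high.
pool = [
    ("a", [0.95, 0.05]),
    ("b", [0.55, 0.45]),
    ("c", [0.50, 0.50]),
    ("d", [0.80, 0.20]),
]
print(select_uncertain(pool, k=2))  # → ['c', 'b'], the two most ambiguous
```

The selected demonstrations then serve as the in-context examples; the intuition is that ambiguous examples sit near the decision boundary, where group disparities tend to concentrate.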
Two-point deterministic equivalence for SGD in random feature models
Alexander Atanasov
Blake Bordelon
Jacob A Zavatone-Veth
Cengiz Pehlevan
Ultrasound and MRI-based evaluation of relationships between morphological and mechanical properties of the lower lumbar multifidus muscle in chronic low back pain
Neda Naghdi
Sara Masi
Cleo Bertrand
Brent Rosenstein
Hassan Rivaz
Mathieu Roy
Maryse Fortin
While lumbar multifidus (MF) muscle alterations are linked to low back pain (LBP), the structure-function relationship is not fully understood. This study aims to evaluate the relationship between fatty degeneration of the lumbar MF muscle and its function in individuals with and without LBP. The study included 25 participants with chronic nonspecific LBP and 25 age- and sex-matched healthy controls. Participants underwent MRI assessment for MF fat infiltration, utilizing IDEAL fat-water images. Ultrasound measures evaluated MF function, including shear-wave elastography (SWE) for stiffness/elasticity and thickness ratio from rest to submaximal contraction. All measurements were acquired at L4/L5 and L5/S1 spinal levels, bilaterally. Bivariate and multivariable linear regression models were used to assess the relationship between morphology and function, while age, sex, body mass index (BMI), physical activity levels, and LBP status were considered as covariates. Fifty participants (26 females) were included (mean age: 39.22 ± 11.67). Greater % MF fat at L4/L5 was significantly associated with greater MF SWE ratio (p = 0.002). No significant bivariate or multivariable relationships were found between MF fat infiltration and MF thickness ratio. Participants with LBP exhibited lower contraction ratios (p = 0.017) and higher SWE during contraction (p = 0.03) at L4/L5 compared to controls. This study highlights a positive association between MF fat infiltration and SWE-based stiffness measures at L4/L5, suggesting altered muscle composition may impact MF function. However, no relationship was found between MF fat infiltration and contraction. Participants with LBP demonstrated distinct deficits in muscle activation, supporting the need for targeted rehabilitation strategies addressing these functional impairments.
Multi-Priority Scheduling for Traffic Management in Future Scalable Payloads
Zineb Garroussi
Olfa Ben Yahia
Brunilde Sansò
Jean-François Frigon
Stéphane Martel
Guillaume Mantelet
Gunes Karabulut Kurt
Through multibeam, frequency reuse, and advanced antenna technology, regenerative non-geostationary orbit (NGSO) extremely high-throughput satellites (EHTS) are expected to play a key role in future communications, delivering data rates up to terabits per second. This paper investigates a novel architecture for future regenerative and scalable payloads to satisfy users’ demands for varying quality of service (QoS). This architecture is designed based on multiple modem banks and requires a new flow assignment strategy to efficiently route traffic within the satellite. We propose a multi-commodity path flow optimization problem to manage the load with varying QoS requirements across multiple modems within an NGSO high-throughput satellite (HTS) system and beyond. The simulation results demonstrate that the proposed model consistently maintains low delays and packet losses for the highest-priority traffic and outperforms the classical first-in, first-out (FIFO) approach.
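The paper formulates a multi-commodity path flow optimization, which is not reproduced here. The sketch below only illustrates the baseline comparison it reports: why strict-priority service keeps delay low for the highest-priority class where FIFO does not, using an illustrative single-server, one-packet-per-slot model with hypothetical traffic.

```python
import heapq

def simulate(packets, policy):
    """Serve one packet per time slot from t=0; return the mean queueing
    delay per priority class (0 = highest priority).

    packets: list of (arrival_time, priority), in arrival order.
    policy:  'fifo' serves in arrival order; 'priority' always serves
             the highest-priority waiting packet first.
    """
    queue, delays, t, i = [], {}, 0, 0
    pending = sorted(enumerate(packets), key=lambda s: (s[1][0], s[0]))
    while i < len(pending) or queue:
        while i < len(pending) and pending[i][1][0] <= t:
            seq, (arr, prio) = pending[i]
            key = (prio, arr, seq) if policy == "priority" else (arr, seq)
            heapq.heappush(queue, (key, arr, prio))
            i += 1
        if queue:
            _, arr, prio = heapq.heappop(queue)
            delays.setdefault(prio, []).append(t - arr)
        t += 1
    return {p: sum(d) / len(d) for p, d in sorted(delays.items())}

# Six best-effort packets arrive at t=0; two high-priority ones at t=2.
traffic = [(0, 1)] * 6 + [(2, 0)] * 2
print("FIFO    :", simulate(traffic, "fifo"))      # high priority waits in line
print("priority:", simulate(traffic, "priority"))  # high priority jumps the queue
```

Under FIFO the high-priority packets sit behind the backlog (mean delay 4.5 slots here); under strict priority they are served as soon as they arrive (mean delay 0.5), at the cost of extra delay for best-effort traffic, which is the trade-off the paper's flow assignment manages across modem banks.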
Silent Sabotage: Injecting Backdoors into AI Agents Through Fine-Tuning
Chandra Kiran Reddy Evuru
Joshua Kazdan
Avinandan Bose
Maryam Fazel
Sai Rajeswar
Jason Stanley
Krishnamurthy Dj Dvijotham
The rise of AI agents that can use tools, browse the web, and interact with computers on behalf of a user has sparked strong interest in improving these capabilities by explicitly fine-tuning the LLMs/VLMs that power these agents. Several researchers have proposed collecting data by letting agents interact with their environment (e.g., a computer operating system, the web, or a collection of APIs exposed as tools) and improving agent performance by fine-tuning on this data. In this work, we show that such data collection can be manipulated by adversaries to insert poisoned traces. By modifying just 5% of collected traces, adversaries can embed stealthy bad behaviors into agents—like leaking confidential user information whenever a tool or webpage exposes a trigger. Our results raise important security concerns in the development of AI agents and underscore the importance of careful scrutiny of all data collection processes used to improve agentic AI.
A Self-Supervised Foundation Model for Robust and Generalizable Representation Learning in STED Microscopy
Anthony Bilodeau
Julia Chabbert
Kamylle Thériault
Andréanne Deschênes
Jean-Michel Bellavance
Koraly Lessard
Renaud Bernatchez
Paul De Koninck
Foundation Models (FMs) have dramatically increased the potential and power of deep learning algorithms through general capabilities across a variety of tasks. The performance increase they offer is obtained without elaborate domain-specific training in fields such as natural language processing and computer vision. However, their application in specialized fields like biomedical imaging and fluorescence microscopy remains difficult due to distribution shifts and the scarcity of high-quality annotated datasets. The high cost of data acquisition and the requirement for in-domain expertise further exacerbate this challenge in microscopy. To address this, we introduce STED-FM, a foundation model specifically designed for super-resolution STimulated Emission Depletion (STED) microscopy. STED-FM leverages a Vision Transformer architecture trained at scale with Masked Autoencoding on a new dataset of nearly one million STED images. STED-FM learns expressive latent representations without requiring extensive annotations, yielding robust performance across diverse downstream microscopy image analysis tasks. Unsupervised experiments demonstrate the discriminative structure of its learned latent space. These representations can be leveraged for multiple downstream applications, including fully supervised classification and segmentation with reduced annotation requirements. Moreover, STED-FM representations enhance the performance of deep learning–based image denoising and improve the quality of images generated by diffusion models, enabling latent attribute manipulation for the data-driven discovery of subtle nanostructures and phenotypes, as well as algorithmic super-resolution. Finally, its powerful structure retrieval capabilities are integrated into automated STED microscopy acquisition pipelines, paving the way for smart microscopy.
In sum, we demonstrate that STED-FM lays a robust foundation for state-of-the-art algorithms across a wide array of tasks, establishing it as a highly valuable and scalable resource for researchers in super-resolution microscopy.
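STED-FM's training code is not shown here; at the core of the Masked Autoencoding it uses is a simple step worth making concrete: randomly hiding most image patches and asking the model to reconstruct them. The sketch below shows only that masking split, with illustrative numbers (196 patches, as for a 224 px image cut into 16 px patches; 75% mask ratio).

```python
import random

def mask_patches(n_patches, mask_ratio, rng):
    """Randomly split patch indices into visible and masked sets, as in
    masked-autoencoder pre-training: the encoder sees only the visible
    patches and the decoder reconstructs the masked ones."""
    idx = list(range(n_patches))
    rng.shuffle(idx)
    n_mask = int(n_patches * mask_ratio)
    return sorted(idx[n_mask:]), sorted(idx[:n_mask])  # visible, masked

rng = random.Random(42)
visible, masked = mask_patches(n_patches=196, mask_ratio=0.75, rng=rng)
print(len(visible), len(masked))  # 49 visible patches, 147 to reconstruct
```

Because the reconstruction target comes from the image itself, no annotations are needed, which is what lets a dataset of nearly one million unlabeled STED images drive the pre-training.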