Publications

A multivariable prediction model for invasive pulmonary aspergillosis in immunocompromised patients with acute respiratory failure (IPA-GRRR-OH score).

Alice Friol

Guillaume Dumas

Frédéric Pène

Alexandre Demoule

Achille Kouatchet

Laurent Argaud

Naike Bigé

Anne-Sophie Moreau

François Barbier

Djamel Mokart

Virginie Lemiale

Elie Azoulay

2025-01-23

Intensive Care Medicine (published)

doi.org

Systemizing Multiplicity: The Curious Case of Arbitrariness in Machine Learning

Prakhar Ganesh

Afaf Taïk

Golnoosh Farnadi

Algorithmic modeling relies on limited information in data to extrapolate outcomes for unseen scenarios, often embedding an element of arbitrariness in its decisions. A perspective on this arbitrariness that has recently gained interest is multiplicity, the study of arbitrariness across a set of "good models", i.e., those likely to be deployed in practice. In this work, we systemize the literature on multiplicity by: (a) formalizing the terminology around model design choices and their contribution to arbitrariness, (b) expanding the definition of multiplicity to incorporate underrepresented forms beyond just predictions and explanations, (c) clarifying the distinction between multiplicity and other lenses of arbitrariness, i.e., uncertainty and variance, and (d) distilling the benefits and potential risks of multiplicity into overarching trends, situating it within the broader landscape of responsible AI. We conclude by identifying open research questions and highlighting emerging trends in this young but rapidly growing area of research.

2025-01-23

ArXiv (preprint)

doi.org

arxiv.org

Automatic segmentation of spinal cord lesions in MS: A robust tool for axial T2-weighted MRI scans

Enamundram Naga Karthik

Julian McGinnis

Ricarda Wurm

Sebastian Ruehling

Robert Graf

Jan Valosek

Pierre-Louis Benveniste

Markus Lauerer

Jason Talbott

Rohit Bakshi

Shahamat Tauhid

Timothy Shepherd

Achim Berthele

Claus Zimmer

Bernhard Hemmer

Daniel Rueckert

Benedikt Wiestler

Jan S. Kirschke

Julien Cohen-Adad

Mark Mühlau

Deep learning models have achieved remarkable success in segmenting brain white matter lesions in multiple sclerosis (MS), becoming integral to both research and clinical workflows. While brain lesions have gained significant attention in MS research, the involvement of spinal cord lesions in MS is relatively understudied. This is largely due to the variability in spinal cord magnetic resonance imaging (MRI) acquisition protocols, high individual anatomical differences, the complex morphology and size of spinal cord lesions, and lastly, the scarcity of labeled datasets required to develop robust segmentation tools. As a result, automatic segmentation of spinal cord MS lesions remains a significant challenge. Although some segmentation tools exist for spinal cord lesions, most have been developed using sagittal T2-weighted (T2w) sequences primarily focusing on cervical spines. With the growing importance of spinal cord imaging in MS, axial T2w scans are becoming increasingly relevant due to their superior sensitivity in detecting lesions compared to sagittal acquisition protocols. However, most existing segmentation methods struggle to effectively generalize to axial sequences due to differences in image characteristics caused by the highly anisotropic spinal cord scans. To address these challenges, we developed a robust, open-source lesion segmentation tool tailored specifically for axial T2w scans covering the whole spinal cord. We investigated key factors influencing lesion segmentation, including the impact of stitching together individually acquired spinal regions, straightening the spinal cord, and comparing the effectiveness of 2D and 3D convolutional neural networks (CNNs). Drawing on these insights, we trained a multi-center model using an extensive dataset of 582 MS patients comprising a total of 2,167 scans. We empirically evaluated the model’s segmentation performance across various spinal segments for lesions with varying sizes. Our model significantly outperforms the current state-of-the-art methods, providing consistent segmentation across cervical, thoracic and lumbar regions. To support the broader research community, we integrate our model into the widely used Spinal Cord Toolbox (v7.0 and above), making it accessible via the command sct_deepseg lesion_ms_axial_t2 -i <path-to-image.nii.gz>.

2025-01-22

medRxiv (preprint)

doi.org

Pitfalls of Evidence-Based AI Policy

Stephen Casper

David M. Krueger

Dylan Hadfield-Menell

Nations across the world are working to govern AI. However, from a technical perspective, the best way to do this is not yet clear. Meanwhile, recent debates over AI regulation have led to calls for “evidence-based AI policy” which emphasize holding regulatory action to a high evidentiary standard. Evidence is of irreplaceable value to policymaking. However, holding regulatory action to too high an evidentiary standard can lead to systematic neglect of certain risks. In historical policy debates (e.g., over tobacco ca. 1965 and fossil fuels ca. 1990) “evidence-based policy” rhetoric is also a well-precedented strategy to downplay the urgency of action, delay regulation, and protect industry interests. Here, we argue that if the goal is evidence-based AI policy, the first regulatory objective must be to actively facilitate the process of identifying, studying, and deliberating about AI risks. We discuss a set of 16 regulatory goals to facilitate this and show that the EU, UK, USA, Brazil, Canada, and China all have substantial opportunities to adopt further evidence-seeking policies.

2025-01-22

ICLR.cc/2025/BlogPosts (accepted)

openreview.net

Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

Yun Zhu

Jia-Chen Gu

Caitlin Sikora

Ho Ko

Yinxiao Liu

Chu-Cheng Lin

Lei Shu

Liangchen Luo

Lei Meng

Bang Liu

Jindong Chen

Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility by incorporating external contexts. However, the input length grows linearly in the number of retrieved documents, causing a dramatic increase in latency. In this paper, we propose a novel paradigm named Sparse RAG, which seeks to cut computation costs through sparsity. Specifically, Sparse RAG encodes retrieved documents in parallel, which eliminates latency introduced by long-range attention of retrieved documents. Then, LLMs selectively decode the output by only attending to highly relevant caches auto-regressively, which are chosen via prompting LLMs with special control tokens. It is notable that Sparse RAG combines the assessment of each individual document and the generation of the response into a single process. The designed sparse mechanism in a RAG system can facilitate the reduction of the number of documents loaded during decoding for accelerating the inference of the RAG system. Additionally, filtering out undesirable contexts enhances the model’s focus on relevant context, inherently improving its generation quality. Evaluation results on four datasets show that Sparse RAG can be used to strike an optimal balance between generation quality and computational efficiency, demonstrating its generalizability across tasks.

2025-01-21

ICLR.cc/2025/Conference (poster)

openreview.net

Accelerating neural network training: An analysis of the AlgoPerf competition

Priya Kasimbeg

Frank Schneider

Runa Eschenhagen

Juhan Bae

Chandramouli Shama Sastry

Mark Saroufim

Boyuan Feng

Less Wright

Edward Z. Yang

Zachary Nado

Sourabh Medapati

Philipp Hennig

Michael G. Rabbat

George E. Dahl

The goal of the AlgoPerf: Training Algorithms competition is to evaluate practical speed-ups in neural network training achieved solely by improving the underlying training algorithms. In the external tuning ruleset, submissions must provide workload-agnostic hyperparameter search spaces, while in the self-tuning ruleset they must be completely hyperparameter-free. In both rulesets, submissions are compared on time-to-result across multiple deep learning workloads, training on fixed hardware. This paper presents the inaugural AlgoPerf competition's results, which drew 18 diverse submissions from 10 teams. Our investigation reveals several key findings: (1) The winning submission in the external tuning ruleset, using Distributed Shampoo, demonstrates the effectiveness of non-diagonal preconditioning over popular methods like Adam, even when compared on wall-clock runtime. (2) The winning submission in the self-tuning ruleset, based on the Schedule Free AdamW algorithm, demonstrates a new level of effectiveness for completely hyperparameter-free training algorithms. (3) The top-scoring submissions were surprisingly robust to workload changes. We also discuss the engineering challenges encountered in ensuring a fair comparison between different training algorithms. These results highlight both the significant progress so far, and the considerable room for further improvements.

2025-01-21

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Accelerating Training with Neuron Interaction and Nowcasting Networks

Neural network training can be accelerated when a learnable update rule is used in lieu of classic adaptive optimizers (e.g. Adam). However, learnable update rules can be costly and unstable to train and use. Recently, Jang et al. (2023) proposed a simpler approach to accelerate training based on weight nowcaster networks (WNNs). In their approach, Adam is used for most of the optimization steps and periodically, only every few steps, a WNN nowcasts (predicts near future) parameters. We improve WNNs by proposing neuron interaction and nowcasting (NiNo) networks. In contrast to WNNs, NiNo leverages neuron connectivity and graph neural networks to more accurately nowcast parameters. We further show that in some networks, such as Transformers, modeling neuron connectivity accurately is challenging. We address this and other limitations, which allows NiNo to accelerate Adam training by up to 50% in vision and language tasks.

2025-01-21

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Action Abstractions for Amortized Sampling

Lena Nehale Ezzine

Nikolay Malkin

As trajectories sampled by policies used by reinforcement learning (RL) and generative flow networks (GFlowNets) grow longer, credit assignment and exploration become more challenging, and the long planning horizon hinders mode discovery and generalization. The challenge is particularly pronounced in entropy-seeking RL methods, such as generative flow networks, where the agent must learn to sample from a structured distribution and discover multiple high-reward states, each of which takes many steps to reach. To tackle this challenge, we propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process. Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and 'chunking' them into a single action that is added to the action space. In empirical evaluation on synthetic and real-world environments, our approach demonstrates improved sample efficiency in discovering diverse high-reward objects, especially on harder exploration problems. We also observe that the abstracted high-order actions are interpretable, capturing the latent structure of the reward landscape of the action space. This work provides a cognitively motivated approach to action abstraction in RL and is the first demonstration of hierarchical planning in amortized sequential sampling.
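The chunking step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how subsequences are extracted, so the approach below (counting fixed-length action n-grams across trajectories whose reward clears a threshold) is an assumption made purely for illustration, and the names `most_common_chunk`, `extend_action_space`, `reward_threshold`, and `chunk_len` are hypothetical rather than taken from the paper.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple

Action = Hashable


def most_common_chunk(
    trajectories: Sequence[Sequence[Action]],
    rewards: Sequence[float],
    reward_threshold: float,
    chunk_len: int = 2,
) -> Optional[Tuple[Action, ...]]:
    """Return the action subsequence of length `chunk_len` that appears in
    the largest number of high-reward trajectories (assumed heuristic)."""
    counts: Counter = Counter()
    for actions, reward in zip(trajectories, rewards):
        if reward < reward_threshold:
            continue  # only high-reward trajectories vote
        # Use a set so each trajectory counts a given subsequence once.
        seen = {
            tuple(actions[i : i + chunk_len])
            for i in range(len(actions) - chunk_len + 1)
        }
        counts.update(seen)
    if not counts:
        return None
    # Highest count wins; ties are broken by the textual form of the chunk.
    best = min(counts.items(), key=lambda kv: (-kv[1], repr(kv[0])))
    return best[0]


def extend_action_space(action_space: list, chunk: Tuple[Action, ...]) -> list:
    """Return a new action space with the chunk added as one macro-action."""
    return action_space if chunk in action_space else action_space + [chunk]


trajectories = [["a", "b", "c"], ["a", "b", "d"], ["c", "a", "b"], ["d", "d", "d"]]
rewards = [1.0, 0.9, 0.8, 0.1]
chunk = most_common_chunk(trajectories, rewards, reward_threshold=0.5)
new_space = extend_action_space(["a", "b", "c", "d"], chunk)
```

In an iterative scheme of the kind the abstract outlines, a step like this would alternate with policy training, so that later trajectories can use the newly added macro-action.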

2025-01-21

International Conference on Learning Representations (poster)

doi.org

openreview.net

AdaFisher: Adaptive Second Order Optimization via Fisher Information

Damien Martins Gomes

Yanlei Zhang

Eugene Belilovsky

Guy Wolf

Mahdi S. Hosseini

First-order optimization methods are currently the mainstream in training deep neural networks (DNNs). Optimizers like Adam incorporate limited curvature information by employing the diagonal matrix preconditioning of the stochastic gradient during the training. Despite the widespread use of first-order methods, second-order optimization algorithms exhibit superior convergence properties compared to their first-order counterparts, e.g., Adam and SGD. However, their practicality in training DNNs is still limited due to increased per-iteration computations compared to the first-order methods. We present AdaFisher, an adaptive second-order optimizer that leverages a diagonal block-Kronecker approximation of the Fisher information matrix for adaptive gradient preconditioning. AdaFisher aims to bridge the gap between enhanced convergence/generalization capabilities and computational efficiency in a second-order optimization framework for training DNNs. Despite the slow pace of second-order optimizers, we showcase that AdaFisher can be reliably adopted for image classification and language modeling, and stands out for its stability and robustness in hyper-parameter tuning. We demonstrate that AdaFisher outperforms the SOTA optimizers in terms of both accuracy and convergence speed. Code is available from https://github.com/AtlasAnalyticsLab/AdaFisher.

2025-01-21

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Advantage Alignment Algorithms

The growing presence of artificially intelligent agents in everyday decision-making, from LLM assistants to autonomous vehicles, hints at a future in which conflicts may arise from each agent optimizing individual interests. In general-sum games these conflicts are apparent, where naive Reinforcement Learning agents get stuck in Pareto-suboptimal Nash equilibria. Consequently, opponent shaping has been introduced as a method with success at finding socially beneficial equilibria in social dilemmas. In this work, we introduce Advantage Alignment, a family of algorithms derived from first principles that perform opponent shaping efficiently and intuitively. This is achieved by aligning the advantages of conflicting agents in a given game by increasing the probability of mutually-benefiting actions. We prove that existing opponent shaping methods, including LOLA and LOQA, implicitly perform Advantage Alignment. Compared to these works, Advantage Alignment mathematically simplifies the formulation of opponent shaping and seamlessly works for continuous action domains. We also demonstrate the effectiveness of our algorithm in a wide range of social dilemmas, achieving state-of-the-art results in each case, including a social dilemma version of the Negotiation Game.

2025-01-21

International Conference on Learning Representations (oral)

doi.org

openreview.net

AFlow: Automating Agentic Workflow Generation

Jiayi Zhang

Jinyu Xiang

Zhaoyang Yu

Fengwei Teng

Xiong-Hui Chen

Jiaqi Chen

Mingchen Zhuge

Xin Cheng

Sirui Hong

Jinlin Wang

Bingnan Zheng

Bang Liu

Yuyu Luo

Chenglin Wu

Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains, typically by employing agentic workflows that follow detailed instructions and operational sequences. However, constructing these workflows requires significant human effort, limiting scalability and generalizability. Recent research has sought to automate the generation and optimization of these workflows, but existing methods still rely on initial manual setup and fall short of achieving fully automated and effective workflow generation. To address this challenge, we reformulate workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. We introduce AFLOW, an automated framework that efficiently explores this space using Monte Carlo Tree Search, iteratively refining workflows through code modification, tree-structured experience, and execution feedback. Empirical evaluations across six benchmark datasets demonstrate AFLOW's efficacy, yielding a 5.7% average improvement over state-of-the-art baselines. Furthermore, AFLOW enables smaller models to outperform GPT-4o on specific tasks at 4.55% of its inference cost in dollars. The code is available at https://github.com/geekan/MetaGPT.

2025-01-21

ICLR.cc/2025/Conference (oral)

openreview.net

Ant Colony Sampling with GFlowNets for Combinatorial Optimization

Jiwoo Son

Jinkyoo Park

We present the Generative Flow Ant Colony Sampler (GFACS), a novel meta-heuristic method that hierarchically combines amortized inference and parallel stochastic search. Our method first leverages Generative Flow Networks (GFlowNets) to amortize a multi-modal prior distribution over combinatorial solution space that encompasses both high-reward and diversified solutions. This prior is iteratively updated via parallel stochastic search in the spirit of Ant Colony Optimization (ACO), leading to the posterior distribution that generates near-optimal solutions. Extensive experiments across seven combinatorial optimization problems demonstrate GFACS's promising performance.

2025-01-21

aistats.org/AISTATS/2025/Conference (poster)

doi.org

proceedings.mlr.press
