Publications

Automatic segmentation of spinal cord lesions in MS: A robust tool for axial T2-weighted MRI scans
Enamundram Naga Karthik
Julian McGinnis
Ricarda Wurm
Sebastian Ruehling
Robert Graf
Pierre-Louis Benveniste
Markus Lauerer
Jason Talbott
Rohit Bakshi
Shahamat Tauhid
Timothy Shepherd
Achim Berthele
Claus Zimmer
Bernhard Hemmer
Daniel Rueckert
Benedikt Wiestler
Jan S. Kirschke
Mark Mühlau
Deep learning models have achieved remarkable success in segmenting brain white matter lesions in multiple sclerosis (MS), becoming integral to both research and clinical workflows. While brain lesions have gained significant attention in MS research, the involvement of spinal cord lesions in MS is relatively understudied. This is largely due to the variability in spinal cord magnetic resonance imaging (MRI) acquisition protocols, high individual anatomical differences, the complex morphology and size of spinal cord lesions, and, lastly, the scarcity of labeled datasets required to develop robust segmentation tools. As a result, automatic segmentation of spinal cord MS lesions remains a significant challenge. Although some segmentation tools exist for spinal cord lesions, most have been developed using sagittal T2-weighted (T2w) sequences and focus primarily on the cervical spine. With the growing importance of spinal cord imaging in MS, axial T2w scans are becoming increasingly relevant due to their superior sensitivity in detecting lesions compared to sagittal acquisition protocols. However, most existing segmentation methods struggle to generalize effectively to axial sequences due to differences in image characteristics caused by the highly anisotropic spinal cord scans. To address these challenges, we developed a robust, open-source lesion segmentation tool tailored specifically for axial T2w scans covering the whole spinal cord. We investigated key factors influencing lesion segmentation, including the impact of stitching together individually acquired spinal regions, straightening the spinal cord, and comparing the effectiveness of 2D and 3D convolutional neural networks (CNNs). Drawing on these insights, we trained a multi-center model on an extensive dataset of 582 MS patients, comprising a total of 2,167 scans. We empirically evaluated the model's segmentation performance across spinal segments and lesion sizes. Our model significantly outperforms the current state-of-the-art methods, providing consistent segmentation across the cervical, thoracic, and lumbar regions. To support the broader research community, we integrate our model into the widely used Spinal Cord Toolbox (v7.0 and above), making it accessible via the command sct_deepseg -task seg_sc_ms_lesion_axial_t2w -i .
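As a pointer for readers, here is a minimal sketch of invoking the released model from Python. The command and task name are taken verbatim from the abstract; the input filename, output handling, and the use of subprocess are hypothetical illustration, not the toolbox's documented workflow:

```python
# Minimal sketch: calling the Spinal Cord Toolbox model from Python.
# Assumes SCT v7.0+ is installed and on PATH; the input path below is a
# hypothetical placeholder, not taken from the paper.
import subprocess

cmd = [
    "sct_deepseg",
    "-task", "seg_sc_ms_lesion_axial_t2w",  # task name from the abstract
    "-i", "sub-01_axial_T2w.nii.gz",        # hypothetical axial T2w input
]
subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```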
Pitfalls of Evidence-Based AI Policy
Stephen Casper
Dylan Hadfield-Menell
Nations across the world are working to govern AI. However, from a technical perspective, the best way to do this is not yet clear. Meanwhile, recent debates over AI regulation have led to calls for "evidence-based AI policy," which emphasize holding regulatory action to a high evidentiary standard. Evidence is of irreplaceable value to policymaking. However, holding regulatory action to too high an evidentiary standard can lead to systematic neglect of certain risks. In historical policy debates (e.g., over tobacco ca. 1965 and fossil fuels ca. 1990), "evidence-based policy" rhetoric has also been a well-precedented strategy to downplay the urgency of action, delay regulation, and protect industry interests. Here, we argue that if the goal is evidence-based AI policy, the first regulatory objective must be to actively facilitate the process of identifying, studying, and deliberating about AI risks. We discuss a set of 16 regulatory goals to facilitate this and show that the EU, UK, USA, Brazil, Canada, and China all have substantial opportunities to adopt further evidence-seeking policies.
Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection
Yun Zhu
Jia-Chen Gu
Caitlin Sikora
Ho Ko
Yinxiao Liu
Chu-Cheng Lin
Lei Shu
Liangchen Luo
Lei Meng
Jindong Chen
Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility by incorporating external contexts. However, the input length grows linearly with the number of retrieved documents, causing a dramatic increase in latency. In this paper, we propose a novel paradigm named Sparse RAG, which seeks to cut computation costs through sparsity. Specifically, Sparse RAG encodes retrieved documents in parallel, which eliminates latency introduced by long-range attention over retrieved documents. Then, LLMs selectively decode the output by attending auto-regressively only to highly relevant caches, which are chosen by prompting the LLM with special control tokens. Notably, Sparse RAG combines the assessment of each individual document and the generation of the response into a single process. This sparse mechanism reduces the number of documents loaded during decoding, accelerating inference of the RAG system. Additionally, filtering out undesirable contexts enhances the model's focus on relevant context, inherently improving generation quality. Evaluation results on four datasets show that Sparse RAG can strike an optimal balance between generation quality and computational efficiency, demonstrating its generalizability across tasks.
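To make the control flow concrete, here is a toy Python sketch of the Sparse RAG idea as described above: per-document parallel encoding, relevance-based cache selection, and decoding over only the kept caches. Every function is a simplified stand-in for an LLM component, not the paper's implementation:

```python
# Toy sketch of the Sparse RAG control flow (not the authors' code).
# encode_doc stands in for building a per-document KV cache; relevance_score
# stands in for prompting the LLM with special control tokens to rate a
# document; decode_with_caches stands in for generation that attends only
# to the selected caches.
from concurrent.futures import ThreadPoolExecutor

def encode_doc(doc: str) -> dict:
    # Placeholder "KV cache": in a real system this is the transformer's
    # key/value tensors for the document, computed independently per doc.
    return {"doc": doc, "cache": doc.lower().split()}

def relevance_score(query: str, cache: dict) -> float:
    # Placeholder for the LLM emitting a control token rating relevance.
    q = set(query.lower().split())
    return len(q & set(cache["cache"])) / (len(q) or 1)

def decode_with_caches(query: str, caches: list) -> str:
    # Placeholder for auto-regressive decoding over the selected caches only.
    kept = ", ".join(c["doc"] for c in caches)
    return f"answer({query}) grounded in: {kept}"

def sparse_rag(query: str, docs: list, k: int = 2) -> str:
    # 1) Encode retrieved documents in parallel (no cross-document attention).
    with ThreadPoolExecutor() as pool:
        caches = list(pool.map(encode_doc, docs))
    # 2) Keep only the k most relevant caches for decoding.
    caches.sort(key=lambda c: relevance_score(query, c), reverse=True)
    return decode_with_caches(query, caches[:k])

print(sparse_rag("capital of France", ["Paris is the capital of France.",
                                       "Llamas live in the Andes.",
                                       "France borders Spain."]))
```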
Accelerating neural network training: An analysis of the AlgoPerf competition
Priya Kasimbeg
Frank Schneider
Runa Eschenhagen
Juhan Bae
Chandramouli Shama Sastry
Mark Saroufim
Boyuan Feng
Less Wright
Edward Z. Yang
Zachary Nado
Sourabh Medapati
Philipp Hennig
George E. Dahl
The goal of the AlgoPerf: Training Algorithms competition is to evaluate practical speed-ups in neural network training achieved solely by improving the underlying training algorithms. In the external tuning ruleset, submissions must provide workload-agnostic hyperparameter search spaces, while in the self-tuning ruleset they must be completely hyperparameter-free. In both rulesets, submissions are compared on time-to-result across multiple deep learning workloads, training on fixed hardware. This paper presents the inaugural AlgoPerf competition's results, which drew 18 diverse submissions from 10 teams. Our investigation reveals several key findings: (1) The winning submission in the external tuning ruleset, using Distributed Shampoo, demonstrates the effectiveness of non-diagonal preconditioning over popular methods like Adam, even when compared on wall-clock runtime. (2) The winning submission in the self-tuning ruleset, based on the Schedule Free AdamW algorithm, demonstrates a new level of effectiveness for completely hyperparameter-free training algorithms. (3) The top-scoring submissions were surprisingly robust to workload changes. We also discuss the engineering challenges encountered in ensuring a fair comparison between different training algorithms. These results highlight both the significant progress so far, and the considerable room for further improvements.
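For intuition, here is a hedged sketch of the time-to-result idea: measure wall-clock time until a workload's validation target is reached, within a fixed budget. The training loop, target, and scoring below are illustrative stand-ins, not the benchmark's actual rules (which also involve performance profiles and fixed hardware):

```python
# Hedged sketch of "time-to-result" scoring in the spirit of AlgoPerf.
import time

def time_to_target(train_step, eval_metric, target: float,
                   budget_s: float) -> float:
    """Wall-clock seconds until eval_metric() reaches target, or inf."""
    start = time.perf_counter()
    while (elapsed := time.perf_counter() - start) < budget_s:
        train_step()                       # one optimizer update
        if eval_metric() >= target:        # validation goal reached
            return elapsed
    return float("inf")                    # budget exhausted: no score

# Toy "workload": the metric improves a little on every step.
state = {"metric": 0.0}
result = time_to_target(
    train_step=lambda: state.__setitem__("metric", state["metric"] + 0.01),
    eval_metric=lambda: state["metric"],
    target=0.9,
    budget_s=5.0,
)
print(f"time to target: {result:.4f}s")
```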
AFlow: Automating Agentic Workflow Generation
Jiayi Zhang
Jinyu Xiang
Zhaoyang Yu
Fengwei Teng
Xiong-Hui Chen
Jiaqi Chen
Mingchen Zhuge
Xin Cheng
Sirui Hong
Jinlin Wang
Bingnan Zheng
Yuyu Luo
Chenglin Wu
Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains, typically by employing agentic workflows that follow detailed instructions and operational sequences. However, constructing these workflows requires significant human effort, limiting scalability and generalizability. Recent research has sought to automate the generation and optimization of these workflows, but existing methods still rely on initial manual setup and fall short of achieving fully automated and effective workflow generation. To address this challenge, we reformulate workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. We introduce AFLOW, an automated framework that efficiently explores this space using Monte Carlo Tree Search, iteratively refining workflows through code modification, tree-structured experience, and execution feedback. Empirical evaluations across six benchmark datasets demonstrate AFLOW's efficacy, yielding a 5.7% average improvement over state-of-the-art baselines. Furthermore, AFLOW enables smaller models to outperform GPT-4o on specific tasks at 4.55% of its inference cost in dollars. The code is available at https://github.com/geekan/MetaGPT.
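To illustrate the search formulation, here is a compact, self-contained MCTS loop over string-encoded workflows. The expand and execute functions are toy stand-ins for LLM-driven code modification and execution feedback; this is a generic sketch of the technique named in the abstract, not AFLOW's actual code:

```python
# Illustrative MCTS over workflows represented as strings.
import math, random

def expand(workflow: str) -> str:
    # Stand-in for an LLM edit: append a random operator node.
    return workflow + random.choice(["|retry", "|review", "|ensemble"])

def execute(workflow: str) -> float:
    # Stand-in for execution feedback: a toy score favoring some operators.
    return workflow.count("review") + 0.5 * workflow.count("ensemble")

def uct(node, parent_visits, c=1.4):
    if node["n"] == 0:
        return float("inf")
    return node["v"] / node["n"] + c * math.sqrt(math.log(parent_visits) / node["n"])

root = {"wf": "generate", "n": 0, "v": 0.0, "children": []}
for _ in range(50):
    node = root
    # Selection: descend by UCT while children exist.
    while node["children"]:
        node = max(node["children"], key=lambda ch: uct(ch, node["n"] or 1))
    # Expansion + simulation: modify the workflow and execute it.
    child = {"wf": expand(node["wf"]), "n": 0, "v": 0.0,
             "children": [], "parent": node}
    node["children"].append(child)
    reward = execute(child["wf"])
    # Backpropagation: push execution feedback up to the root.
    while child is not None:
        child["n"] += 1
        child["v"] += reward
        child = child.get("parent")

best = max(root["children"], key=lambda ch: ch["v"] / max(ch["n"], 1))
print("best first edit:", best["wf"], "avg reward:", best["v"] / best["n"])
```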
AssembleFlow: Rigid Flow Matching with Inertial Frames for Molecular Assembly
Hongyu Guo
Shengchao Liu
Molecular assembly, where a cluster of rigid molecules aggregates into strongly correlated forms, is fundamental to determining the properties of materials. However, traditional numerical methods for simulating this process are computationally expensive, and existing generative models for material generation overlook the rigidity inherent in molecular structures, leading to unwanted distortions and invalid internal structures in molecules. To address this, we introduce AssembleFlow. AssembleFlow leverages inertial frames to establish reference coordinate systems at the molecular level for tracking the orientation and motion of molecules within the cluster. It further decomposes molecular […]
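The "inertial frames" in the abstract can be made concrete with a standard construction: a per-molecule reference frame given by the center of mass and the principal axes of the inertia tensor. The sketch below uses this textbook definition with toy data; it is not AssembleFlow's implementation:

```python
# Per-molecule inertial frame from the principal axes of the inertia tensor.
import numpy as np

def inertial_frame(coords: np.ndarray, masses: np.ndarray):
    """coords: (N, 3) atom positions; masses: (N,). Returns (origin, axes)."""
    com = np.average(coords, axis=0, weights=masses)   # center of mass
    r = coords - com
    # Inertia tensor: I = sum_i m_i * (|r_i|^2 * Id - r_i r_i^T)
    inertia = np.einsum("i,ij->", masses, r**2) * np.eye(3) \
        - np.einsum("i,ij,ik->jk", masses, r, r)
    eigvals, eigvecs = np.linalg.eigh(inertia)  # principal moments and axes
    return com, eigvecs                          # columns are the frame axes

# Toy example: three atoms with arbitrary masses and positions.
coords = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
masses = np.array([16.0, 1.0, 1.0])
origin, axes = inertial_frame(coords, masses)
print("origin:", origin)
print("axes:\n", axes)
```

Tracking a molecule's rotation then amounts to following how these axes evolve, while its motion is the trajectory of the origin.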
Beyond FVD: Enhanced Evaluation Metrics for Video Generation Distribution Quality
Ge Ya Luo
Gian Mario Favero
Zhi Hao Luo
The Fréchet Video Distance (FVD) is a widely adopted metric for evaluating video generation distribution quality. However, its effectiveness relies on critical assumptions. Our analysis reveals three significant limitations: (1) the non-Gaussianity of the Inflated 3D Convnet (I3D) feature space; (2) the insensitivity of I3D features to temporal distortions; (3) the impractical sample sizes required for reliable estimation. These findings undermine FVD's reliability and show that FVD falls short as a standalone metric for video generation evaluation. After extensive analysis of a wide range of metrics and backbone architectures, we propose JEDi, the JEPA Embedding Distance, based on features derived from a Joint Embedding Predictive Architecture, measured using Maximum Mean Discrepancy with a polynomial kernel. Our experiments on multiple open-source datasets show clear evidence that it is a superior alternative to the widely used FVD metric, requiring only 16% of the samples to reach its steady value, while increasing alignment with human evaluation by 34%, on average. Project page: https://oooolga.github.io/JEDi.github.io/.
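The statistic JEDi computes is well defined from the abstract: Maximum Mean Discrepancy with a polynomial kernel over embedded features. Below is a minimal NumPy sketch of the unbiased MMD² estimator; the JEPA feature extraction is omitted, and the random arrays stand in for embedded real and generated videos (kernel hyperparameters here are common defaults, not the paper's):

```python
# Unbiased MMD^2 with a polynomial kernel (feature extraction omitted).
import numpy as np

def poly_kernel(a, b, degree=3, gamma=None, coef0=1.0):
    gamma = gamma if gamma is not None else 1.0 / a.shape[1]
    return (gamma * a @ b.T + coef0) ** degree

def mmd2(x, y, **kw):
    """Unbiased MMD^2 between samples x: (n, d) and y: (m, d)."""
    kxx, kyy = poly_kernel(x, x, **kw), poly_kernel(y, y, **kw)
    kxy = poly_kernel(x, y, **kw)
    n, m = len(x), len(y)
    # Drop diagonal terms for the unbiased estimate.
    return ((kxx.sum() - np.trace(kxx)) / (n * (n - 1))
            + (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
            - 2 * kxy.mean())

rng = np.random.default_rng(0)
real = rng.normal(size=(64, 16))        # stand-in: embedded real videos
fake = rng.normal(0.5, 1.0, (64, 16))   # stand-in: embedded generated videos
print("MMD^2:", mmd2(real, fake))
```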
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
Juan A. Rodriguez
Xiangru Jian
Siba Smarak Panigrahi
Abhay Puri
Akshay Kalkunte Suresh
François Savard
Amirhossein Abaskohi
Pierre-Andre Noel
Mats Leon Richter
Saverio Vadacchino
Sanket Biswas
Sara Shanian
Ying Zhang
Noah Bolger
Kurt MacDonald
Simon Fauvel
Sathwik Tejaswi Madhusudhan
Srinivas Sunkara
Joao Monteiro
Krishnamurthy Dj Dvijotham
Torsten Scholak
Sepideh Kharaghani
Sean Hughes
M. Özsu
Issam Hadj Laradji
Perouz Taslakian
David Vazquez
Sai Rajeswar