Publications

Distributed Combined Space Partitioning and Network Flow Optimization: an Optimal Transport Approach (Extended Version)

Théo Laurentin

Patrick Coirault

Emmanuel Moulay

Antoine Lesage-Landry

Jérôme Le Ny

2025-08-29

arXiv (preprint)

arxiv.org

Aperiodic and Periodic EEG Component Lifespan Trajectories: Monotonic Decrease versus Growth-then-Decline

Min Li

Yong Wang

Yaqi Chen

Adrien Dubois

Gangyong Jia

Qing Wu

Maria L. Bringas-Vega

Guillaume Dumas

Pedro A. Valdes-Sosa

2025-08-27

bioRxiv (preprint)

doi.org

Rootlets-based registration to the PAM50 spinal cord template

Sandrine Bédard

Jan Valosek

Valeria Oliva

Kenneth A. Weber

Julien Cohen-Adad

Spinal cord functional MRI studies require precise localization of spinal levels for reliable voxel-wise group analyses. Traditional template-based registration of the spinal cord uses intervertebral discs for alignment. However, substantial anatomical variability across individuals exists between vertebral and spinal levels. This study proposes a novel registration approach that leverages spinal nerve rootlets to improve alignment accuracy and reproducibility across individuals. We developed a registration method that segments the dorsal cervical rootlets and aligns them non-linearly with the PAM50 spinal cord template. Validation was performed on a multi-subject, multi-site dataset (n = 267, 44 sites) and a multi-subject dataset with various neck positions (n = 10, 3 sessions). We further validated the method on task-based functional MRI (n = 23) to compare group-level activation maps obtained with rootlet-based registration against those obtained with traditional disc-based methods. Rootlet-based registration showed superior alignment across individuals compared with the traditional disc-based method on n = 226 individuals, and on n = 176 individuals for morphological analyses. Notably, rootlet positions were more stable across neck positions. Group-level analysis of task-based functional MRI using rootlet-based registration increased Z scores and activation cluster size compared with disc-based registration (number of active voxels from 3292 to 7978). Rootlet-based registration enhances both inter- and intra-subject anatomical alignment and yields better spatial normalization for group-level fMRI analyses. Our findings highlight the potential of rootlet-based registration to improve the precision and reliability of spinal cord neuroimaging group analysis.

2025-08-26

Imaging Neuroscience (published)

doi.org

arxiv.org

Amortized Sampling with Transferable Normalizing Flows

Charlie B. Tan

Majdi Hassan

Leon Klein

Saifuddin Syed

Dominique Beaini

Michael M. Bronstein

Alexander Tong

Kirill Neklyudov

Efficient equilibrium sampling of molecular conformations remains a core challenge in computational chemistry and statistical inference. Classical approaches such as molecular dynamics or Markov chain Monte Carlo inherently lack amortization; the computational cost of sampling must be paid in full for each system of interest. The widespread success of generative models has inspired interest in overcoming this limitation by learning sampling algorithms. Despite performing on par with conventional methods when trained on a single system, learned samplers have so far demonstrated limited ability to transfer across systems. We prove that deep learning enables the design of scalable and transferable samplers by introducing Prose, a 280-million-parameter all-atom transferable normalizing flow trained on a corpus of peptide molecular dynamics trajectories up to 8 residues in length. Prose draws zero-shot uncorrelated proposal samples for arbitrary peptide systems, achieving the previously intractable transferability across sequence length, whilst retaining the efficient likelihood evaluation of normalizing flows. Through extensive empirical evaluation, we demonstrate the efficacy of Prose as a proposal for a variety of sampling algorithms, finding that a simple importance-sampling-based finetuning procedure achieves superior performance to established methods such as sequential Monte Carlo on unseen tetrapeptides. We open-source the Prose codebase, model weights, and training dataset to further stimulate research into amortized sampling methods and finetuning objectives.

2025-08-25

arXiv (preprint)

doi.org

arxiv.org

Uncovering executive function profiles within interindividual variability: A data driven clustering exploration of design fluency in school-aged children

Myriam Sahraoui

Karim Jerbi

Vanessa Hadid

Bruno Gauthier

2025-08-24

bioRxiv (preprint)

doi.org

WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation

Juan A. Rodriguez

Sai Rajeswar

ServiceNow

WebMMU Benchmark

We present WebMMU, a multilingual benchmark that evaluates three core web tasks: (1) website visual question answering, (2) code editing involving HTML/CSS/JavaScript, and (3) mockup-to-code generation. Unlike prior benchmarks that treat these tasks separately, WebMMU unifies them using expert-annotated, real-world web data to assess models' abilities in complex multi-step reasoning, precise element grounding, and functional UI comprehension and coding. Our evaluation shows that while multimodal large language models (MLLMs) perform well on basic information extraction, they struggle with reasoning and grounding, with editing code while preserving functionality, and with design-to-code generation that maintains hierarchy and supports multilingual content. These findings reveal key limitations in current MLLMs and underscore the need for improved multimodal and cross-lingual reasoning to build future web agents capable of automating diverse web development tasks.

2025-08-22

arXiv (preprint)

doi.org

arxiv.org

Communication Efficient LLM Pre-training with SparseLoCo

Amir M. Sarfi

Benjamin Therien

Joel Lidin

Eugene Belilovsky

2025-08-21

arXiv (preprint)

doi.org

arxiv.org

Low-dimensional embeddings of high-dimensional data

Cyril de Bodt

Alex Diaz-Papkovich

Michael Bleher

Kerstin Bunte

Corinna Coupette

Sebastian Damrich

Enrique Fita Sanmartin

Fred A. Hamprecht

Emőke-Ágnes Horvát

Dhruv Kohli

Smita Krishnaswamy

John A. Lee

Boudewijn P. F. Lelieveldt

Leland McInnes

Ian T. Nabney

Maximilian Noichl

Pavlin G. Poličar

Bastian Rieck

Guy Wolf

Gal Mishne … (and 1 more)

Dmitry Kobak

Large collections of high-dimensional data have become nearly ubiquitous across many academic fields and application domains, ranging from biology to the humanities. Since working directly with high-dimensional data poses challenges, the demand for algorithms that create low-dimensional representations, or embeddings, for data visualization, exploration, and analysis is now greater than ever. In recent years, numerous embedding algorithms have been developed, and their usage has become widespread in research and industry. This surge of interest has resulted in a large and fragmented research field that faces technical challenges alongside fundamental debates, and it has left practitioners without clear guidance on how to effectively employ existing methods. Aiming to increase coherence and facilitate future work, in this review we provide a detailed and critical overview of recent developments, derive a list of best practices for creating and using low-dimensional embeddings, evaluate popular approaches on a variety of datasets, and discuss the remaining challenges and open problems in the field.

2025-08-21

arXiv (preprint)

doi.org

arxiv.org

On the Challenges and Opportunities in Generative AI

Laura Manduchi

Clara Meister

Kushagra Pandey

Robert Bamler

Ryan Cotterell

Sina Däubener

Sophie Fellenz

Asja Fischer

Thomas Gärtner

Matthias Kirchler

Marius Kloft

Yingzhen Li

Christoph Lippert

Gerard de Melo

Eric Nalisnick

Björn Ommer

Rajesh Ranganath

Maja Rudolph

Karen Ullrich

Guy Van den Broeck … (and 6 more)

Julia E Vogt

Yixin Wang

Florian Wenzel

Frank N. Wood

Stephan Mandt

Vincent Fortuin

2025-08-21

TMLR (accepted)

doi.org

openreview.net

Robustness of Neural Ratio and Posterior Estimators to Distributional Shifts for Population-Level Dark Matter Analysis in Strong Gravitational Lensing

Andreas Filipp

Yashar Hezaveh

Laurence Perreault-Levasseur

2025-08-20

The Astrophysical Journal (published)

doi.org

arxiv.org