Publications

Distributed Combined Space Partitioning and Network Flow Optimization: an Optimal Transport Approach (Extended Version)
Th'eo Laurentin
Patrick Coirault
Emmanuel Moulay
J'erome Le Ny
Aperiodic and Periodic EEG Component Lifespan Trajectories: Monotonic Decrease versus Growth-then-Decline
Min Li
Yong Wang
Yaqi Chen
Adrien Dubois
Gangyong Jia
Qing Wu
Maria L. Bringas-Vega
Pedro A. Valdes-Sosab
Rootlets-based registration to the PAM50 spinal cord template
Valeria Oliva
Kenneth A. Weber
Abstract Spinal cord functional MRI studies require precise localization of spinal levels for reliable voxel-wise group analyses. Traditiona… (see more)l template-based registration of the spinal cord uses intervertebral discs for alignment. However, substantial anatomical variability across individuals exists between vertebral and spinal levels. This study proposes a novel registration approach that leverages spinal nerve rootlets to improve alignment accuracy and reproducibility across individuals. We developed a registration method leveraging dorsal cervical rootlets segmentation and aligning them non-linearly with the PAM50 spinal cord template. Validation was performed on a multi-subject, multi-site dataset (n = 267, 44 sites) and a multi-subject dataset with various neck positions (n = 10, 3 sessions). We further validated the method on task-based functional MRI (n = 23) to compare group-level activation maps using rootlet-based registration to traditional disc-based methods. Rootlet-based registration showed superior alignment across individuals compared with the traditional disc-based method on n = 226 individuals, and on n = 176 individuals for morphological analyses. Notably, rootlet positions were more stable across neck positions. Group-level analysis of task-based functional MRI using rootlet-based registration increased Z scores and activation cluster size compared with disc-based registration (number of active voxels from 3292 to 7978). Rootlet-based registration enhances both inter- and intra-subject anatomical alignment and yields better spatial normalization for group-level fMRI analyses. Our findings highlight the potential of rootlet-based registration to improve the precision and reliability of spinal cord neuroimaging group analysis.
Amortized Sampling with Transferable Normalizing Flows
Charlie B. Tan
Leon Klein
Saifuddin Syed
Michael M. Bronstein
Efficient equilibrium sampling of molecular conformations remains a core challenge in computational chemistry and statistical inference. Cla… (see more)ssical approaches such as molecular dynamics or Markov chain Monte Carlo inherently lack amortization; the computational cost of sampling must be paid in-full for each system of interest. The widespread success of generative models has inspired interest into overcoming this limitation through learning sampling algorithms. Despite performing on par with conventional methods when trained on a single system, learned samplers have so far demonstrated limited ability to transfer across systems. We prove that deep learning enables the design of scalable and transferable samplers by introducing Prose, a 280 million parameter all-atom transferable normalizing flow trained on a corpus of peptide molecular dynamics trajectories up to 8 residues in length. Prose draws zero-shot uncorrelated proposal samples for arbitrary peptide systems, achieving the previously intractable transferability across sequence length, whilst retaining the efficient likelihood evaluation of normalizing flows. Through extensive empirical evaluation we demonstrate the efficacy of Prose as a proposal for a variety of sampling algorithms, finding a simple importance sampling-based finetuning procedure to achieve superior performance to established methods such as sequential Monte Carlo on unseen tetrapeptides. We open-source the Prose codebase, model weights, and training dataset, to further stimulate research into amortized sampling methods and finetuning objectives.
Uncovering executive function profiles within interindividual variability: A data driven clustering exploration of design fluency in school-aged children
Myriam Sahraoui
Vanessa Hadid
Bruno Gauthier
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
David Vazquez
Juan A. Rodriguez
Sai Rajeswar
ServiceNow
WebMMU Benchmark
We present WebMMU, a multilingual benchmark that evaluates three core web tasks: (1) website visual question answering, (2) code editing inv… (see more)olving HTML/CSS/JavaScript, and (3) mockup-to-code generation. Unlike prior benchmarks that treat these tasks separately, WebMMU unifies them using expert-annotated, real-world web data to assess models'abilities in complex multi-step reasoning, precise element grounding, and functional UI comprehension and coding. Our evaluation shows that while multimodal large language models (MLLMs) perform well on basic information extraction, they struggle with reasoning and grounding, editing code to preserve functionality, and generating design-to-code that maintains hierarchy and supports multilingual content. These findings reveal key limitations in current MLLMs and underscore the need for improved multimodal and cross-lingual reasoning to build future web agents capable of automating diverse web development tasks.
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
David Vazquez
Juan A. Rodriguez
Sai Rajeswar
ServiceNow
WebMMU Benchmark
We present WebMMU, a multilingual benchmark that evaluates three core web tasks: (1) website visual question answering, (2) code editing inv… (see more)olving HTML/CSS/JavaScript, and (3) mockup-to-code generation. Unlike prior benchmarks that treat these tasks separately, WebMMU unifies them using expert-annotated, real-world web data to assess models'abilities in complex multi-step reasoning, precise element grounding, and functional UI comprehension and coding. Our evaluation shows that while multimodal large language models (MLLMs) perform well on basic information extraction, they struggle with reasoning and grounding, editing code to preserve functionality, and generating design-to-code that maintains hierarchy and supports multilingual content. These findings reveal key limitations in current MLLMs and underscore the need for improved multimodal and cross-lingual reasoning to build future web agents capable of automating diverse web development tasks.
Communication Efficient LLM Pre-training with SparseLoCo
Amir M. Sarfi
Joel Lidin
Low-dimensional embeddings of high-dimensional data
Cyril de Bodt
Alex Diaz-Papkovich
Michael Bleher
Kerstin Bunte
Corinna Coupette
Sebastian Damrich
Fred A. Hamprecht
EmHoke-'Agnes Horv'at
Dhruv Kohli
John A. Lee 0001
Boudewijn P. F. Lelieveldt
Leland McInnes
Ian T. Nabney
Maximilian Noichl
Pavlin G. Polivcar
Bastian Rieck
Gal Mishne … (see 1 more)
Dmitry Kobak
Large collections of high-dimensional data have become nearly ubiquitous across many academic fields and application domains, ranging from b… (see more)iology to the humanities. Since working directly with high-dimensional data poses challenges, the demand for algorithms that create low-dimensional representations, or embeddings, for data visualization, exploration, and analysis is now greater than ever. In recent years, numerous embedding algorithms have been developed, and their usage has become widespread in research and industry. This surge of interest has resulted in a large and fragmented research field that faces technical challenges alongside fundamental debates, and it has left practitioners without clear guidance on how to effectively employ existing methods. Aiming to increase coherence and facilitate future work, in this review we provide a detailed and critical overview of recent developments, derive a list of best practices for creating and using low-dimensional embeddings, evaluate popular approaches on a variety of datasets, and discuss the remaining challenges and open problems in the field.
Low-dimensional embeddings of high-dimensional data
Cyril de Bodt
Alex Diaz-Papkovich
Michael Bleher
Kerstin Bunte
Corinna Coupette
Sebastian Damrich
Fred Hamprecht
EmHoke-'Agnes Horv'at
Dhruv Kohli
John A. Lee 0001
Boudewijn P. F. Lelieveldt
Leland McInnes
Ian T. Nabney
Maximilian Noichl
Pavlin G. Polivcar
Bastian Rieck
Gal Mishne … (see 1 more)
Dmitry Kobak
Large collections of high-dimensional data have become nearly ubiquitous across many academic fields and application domains, ranging from b… (see more)iology to the humanities. Since working directly with high-dimensional data poses challenges, the demand for algorithms that create low-dimensional representations, or embeddings, for data visualization, exploration, and analysis is now greater than ever. In recent years, numerous embedding algorithms have been developed, and their usage has become widespread in research and industry. This surge of interest has resulted in a large and fragmented research field that faces technical challenges alongside fundamental debates, and it has left practitioners without clear guidance on how to effectively employ existing methods. Aiming to increase coherence and facilitate future work, in this review we provide a detailed and critical overview of recent developments, derive a list of best practices for creating and using low-dimensional embeddings, evaluate popular approaches on a variety of datasets, and discuss the remaining challenges and open problems in the field.
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Clara Meister
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Däubener
Sophie Fellenz
Thomas Gärtner
Matthias Kirchler
Marius Kloft
Yingzhen Li
Christoph Lippert
Gerard de Melo
Eric Nalisnick
Björn Ommer
Rajesh Ranganath
Maja Rudolph
Karen Ullrich
Guy Van den Broeck … (see 6 more)
Julia E Vogt
Yixin Wang
Florian Wenzel
Frank N. Wood
Stephan Mandt
Vincent Fortuin
Robustness of Neural Ratio and Posterior Estimators to Distributional Shifts for Population-Level Dark Matter Analysis in Strong Gravitational Lensing