Publications

Assessing the exposure of buildings to long-term sea level rise across the Global South
M. Willard-Stepan
N. Gomez
E. D. Galbraith
E. M. Bennett
Distributed Combined Space Partitioning and Network Flow Optimization: an Optimal Transport Approach (Extended Version)
Théo Laurentin
Patrick Coirault
Emmanuel Moulay
Jérôme Le Ny
Rootlets-based registration to the PAM50 spinal cord template
Sandrine Bédard
Valeria Oliva
Kenneth A. Weber
Spinal cord functional MRI studies require precise localization of spinal levels for reliable voxel-wise group analyses. Traditional template-based registration of the spinal cord uses intervertebral discs for alignment. However, substantial anatomical variability across individuals exists between vertebral and spinal levels. This study proposes a novel registration approach that leverages spinal nerve rootlets to improve alignment accuracy and reproducibility across individuals. We developed a registration method leveraging dorsal cervical rootlets segmentation and aligning them non-linearly with the PAM50 spinal cord template. Validation was performed on a multi-subject, multi-site dataset (n = 267, 44 sites) and a multi-subject dataset with various neck positions (n = 10, 3 sessions). We further validated the method on task-based functional MRI (n = 23) to compare group-level activation maps using rootlet-based registration to traditional disc-based methods. Rootlet-based registration showed superior alignment across individuals compared with the traditional disc-based method on n = 226 individuals, and on n = 176 individuals for morphological analyses. Notably, rootlet positions were more stable across neck positions. Group-level analysis of task-based functional MRI using rootlet-based registration increased Z scores and activation cluster size compared with disc-based registration (number of active voxels from 3292 to 7978). Rootlet-based registration enhances both inter- and intra-subject anatomical alignment and yields better spatial normalization for group-level fMRI analyses. Our findings highlight the potential of rootlet-based registration to improve the precision and reliability of spinal cord neuroimaging group analysis.
Amortized Sampling with Transferable Normalizing Flows
Charlie B. Tan
Leon Klein
Saifuddin Syed
Michael M. Bronstein
Alexander Tong
Efficient equilibrium sampling of molecular conformations remains a core challenge in computational chemistry and statistical inference. Classical approaches such as molecular dynamics or Markov chain Monte Carlo inherently lack amortization; the computational cost of sampling must be paid in full for each system of interest. The widespread success of generative models has inspired interest into overcoming this limitation through learning sampling algorithms. Despite performing on par with conventional methods when trained on a single system, learned samplers have so far demonstrated limited ability to transfer across systems. We prove that deep learning enables the design of scalable and transferable samplers by introducing Prose, a 280 million parameter all-atom transferable normalizing flow trained on a corpus of peptide molecular dynamics trajectories up to 8 residues in length. Prose draws zero-shot uncorrelated proposal samples for arbitrary peptide systems, achieving the previously intractable transferability across sequence length, whilst retaining the efficient likelihood evaluation of normalizing flows. Through extensive empirical evaluation we demonstrate the efficacy of Prose as a proposal for a variety of sampling algorithms, finding a simple importance sampling-based finetuning procedure to achieve superior performance to established methods such as sequential Monte Carlo on unseen tetrapeptides. We open-source the Prose codebase, model weights, and training dataset, to further stimulate research into amortized sampling methods and finetuning objectives.
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
Mahsa Massoud
David Vazquez
Juan A. Rodriguez
Sai Rajeswar
ServiceNow
WebMMU Benchmark
We present WebMMU, a multilingual benchmark that evaluates three core web tasks: (1) website visual question answering, (2) code editing involving HTML/CSS/JavaScript, and (3) mockup-to-code generation. Unlike prior benchmarks that treat these tasks separately, WebMMU unifies them using expert-annotated, real-world web data to assess models' abilities in complex multi-step reasoning, precise element grounding, and functional UI comprehension and coding. Our evaluation shows that while multimodal large language models (MLLMs) perform well on basic information extraction, they struggle with reasoning and grounding, editing code to preserve functionality, and generating design-to-code that maintains hierarchy and supports multilingual content. These findings reveal key limitations in current MLLMs and underscore the need for improved multimodal and cross-lingual reasoning to build future web agents capable of automating diverse web development tasks.
Communication Efficient LLM Pre-training with SparseLoCo
Amir M. Sarfi
Joel Lidin
Low-dimensional embeddings of high-dimensional data
Cyril de Bodt
Alex Diaz-Papkovich
Michael Bleher
Kerstin Bunte
Corinna Coupette
Sebastian Damrich
Fred A. Hamprecht
Emőke-Ágnes Horvát
Dhruv Kohli
John A. Lee 0001
Boudewijn P. F. Lelieveldt
Leland McInnes
Ian T. Nabney
Maximilian Noichl
Pavlin G. Poličar
Bastian Rieck
Gal Mishne …
Dmitry Kobak
Large collections of high-dimensional data have become nearly ubiquitous across many academic fields and application domains, ranging from biology to the humanities. Since working directly with high-dimensional data poses challenges, the demand for algorithms that create low-dimensional representations, or embeddings, for data visualization, exploration, and analysis is now greater than ever. In recent years, numerous embedding algorithms have been developed, and their usage has become widespread in research and industry. This surge of interest has resulted in a large and fragmented research field that faces technical challenges alongside fundamental debates, and it has left practitioners without clear guidance on how to effectively employ existing methods. Aiming to increase coherence and facilitate future work, in this review we provide a detailed and critical overview of recent developments, derive a list of best practices for creating and using low-dimensional embeddings, evaluate popular approaches on a variety of datasets, and discuss the remaining challenges and open problems in the field.
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Clara Meister
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Däubener
Sophie Fellenz
Asja Fischer
Thomas Gärtner
Matthias Kirchler
Marius Kloft
Yingzhen Li
Christoph Lippert
Gerard de Melo
Eric Nalisnick
Björn Ommer
Rajesh Ranganath
Maja Rudolph
Karen Ullrich
Guy Van den Broeck …
Julia E Vogt
Yixin Wang
Florian Wenzel
Frank N. Wood
Stephan Mandt
Vincent Fortuin
Robustness of Neural Ratio and Posterior Estimators to Distributional Shifts for Population-Level Dark Matter Analysis in Strong Gravitational Lensing
Development of a defacing algorithm to protect the privacy of head and neck cancer patients in publicly-accessible radiotherapy datasets
Kayla O'Sullivan‐Steben
Luc Galarneau
Pixels Under Pressure: Exploring Fine-Tuning Paradigms for Foundation Models in High-Resolution Medical Imaging
Zahra Tehrani Nasab
Advancements in diffusion-based foundation models have improved text-to-image generation, yet most efforts have been limited to low-resolution settings. As high-resolution image synthesis becomes increasingly essential for various applications, particularly in medical imaging domains, fine-tuning emerges as a crucial mechanism for adapting these powerful pre-trained models to task-specific requirements and data distributions. In this work, we present a systematic study, examining the impact of various fine-tuning techniques on image generation quality when scaling to high resolution 512×512 pixels. We benchmark a diverse set of fine-tuning methods, including full fine-tuning strategies and parameter-efficient fine-tuning (PEFT). We dissect how different fine-tuning methods influence key quality metrics, including Fréchet Inception Distance (FID), Vendi score, and prompt-image alignment. We also evaluate the utility of generated images in a downstream classification task under data-scarce conditions, demonstrating that specific fine-tuning strategies improve both generation fidelity and downstream performance when synthetic images are used for classifier training and evaluation on real images. Our code is accessible through the project website - https://tehraninasab.github.io/PixelUPressure/.