Publications

Low-dimensional embeddings of high-dimensional data
Cyril de Bodt
Alex Diaz-Papkovich
Michael Bleher
Kerstin Bunte
Corinna Coupette
Sebastian Damrich
Fred A. Hamprecht
Emőke-Ágnes Horvát
Dhruv Kohli
John A. Lee
Boudewijn P. F. Lelieveldt
Leland McInnes
Ian T. Nabney
Maximilian Noichl
Pavlin G. Poličar
Bastian Rieck
Gal Mishne …
Dmitry Kobak
Large collections of high-dimensional data have become nearly ubiquitous across many academic fields and application domains, ranging from biology to the humanities. Since working directly with high-dimensional data poses challenges, the demand for algorithms that create low-dimensional representations, or embeddings, for data visualization, exploration, and analysis is now greater than ever. In recent years, numerous embedding algorithms have been developed, and their usage has become widespread in research and industry. This surge of interest has resulted in a large and fragmented research field that faces technical challenges alongside fundamental debates, and it has left practitioners without clear guidance on how to effectively employ existing methods. Aiming to increase coherence and facilitate future work, in this review we provide a detailed and critical overview of recent developments, derive a list of best practices for creating and using low-dimensional embeddings, evaluate popular approaches on a variety of datasets, and discuss the remaining challenges and open problems in the field.
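As a minimal, hypothetical sketch (not code from the paper), the snippet below computes a 2D embedding of a standard dataset with scikit-learn's t-SNE, one of the popular embedding approaches this review covers; the dataset and parameters are illustrative assumptions only.

```python
# Minimal sketch, assuming scikit-learn is installed; not the review's evaluation code.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)               # 64-dimensional digit images
embedding = TSNE(n_components=2, perplexity=30.0,
                 init="pca", random_state=0).fit_transform(X)
print(embedding.shape)                            # (n_samples, 2) points ready to plot
```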
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Clara Meister
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Däubener
Sophie Fellenz
Asja Fischer
Thomas Gärtner
Matthias Kirchler
Marius Kloft
Yingzhen Li
Christoph Lippert
Gerard de Melo
Eric Nalisnick
Björn Ommer
Rajesh Ranganath
Maja Rudolph
Karen Ullrich
Guy Van den Broeck …
Julia E Vogt
Yixin Wang
Florian Wenzel
Frank N. Wood
Stephan Mandt
Vincent Fortuin
Robustness of Neural Ratio and Posterior Estimators to Distributional Shifts for Population-Level Dark Matter Analysis in Strong Gravitational Lensing
Development of a defacing algorithm to protect the privacy of head and neck cancer patients in publicly-accessible radiotherapy datasets
Kayla O'Sullivan‐Steben
Luc Galarneau
Massive Extremely High-Velocity Outflow in the Quasar J164653.72+243942.2
Paola Rodríguez Hidalgo
Hyunseop Choi
Patrick B. Hall
K. Leighly
Liliana Flores
Mikel M. Charles
Cora DeFrancesco
J. Hlavacek-Larrondo
Pixels Under Pressure: Exploring Fine-Tuning Paradigms for Foundation Models in High-Resolution Medical Imaging
Advancements in diffusion-based foundation models have improved text-to-image generation, yet most efforts have been limited to low-resolution settings. As high-resolution image synthesis becomes increasingly essential for various applications, particularly in medical imaging domains, fine-tuning emerges as a crucial mechanism for adapting these powerful pre-trained models to task-specific requirements and data distributions. In this work, we present a systematic study examining the impact of various fine-tuning techniques on image generation quality when scaling to a high resolution of 512x512 pixels. We benchmark a diverse set of fine-tuning methods, including full fine-tuning strategies and parameter-efficient fine-tuning (PEFT). We dissect how different fine-tuning methods influence key quality metrics, including Fréchet Inception Distance (FID), Vendi score, and prompt-image alignment. We also evaluate the utility of generated images in a downstream classification task under data-scarce conditions, demonstrating that specific fine-tuning strategies improve both generation fidelity and downstream performance when synthetic images are used for classifier training and evaluation on real images. Our code is accessible through the project website - https://tehraninasab.github.io/PixelUPressure/.
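As a hedged illustration of one of the quality metrics named above (not the authors' evaluation pipeline), the sketch below computes Fréchet Inception Distance with the torchmetrics package; the random uint8 tensors are placeholders standing in for batches of real and synthetic 512x512 images.

```python
# Hedged sketch, assuming torchmetrics and torch-fidelity are installed; not the paper's code.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)       # InceptionV3 pooled features
real_images = torch.randint(0, 256, (16, 3, 512, 512), dtype=torch.uint8)
synthetic_images = torch.randint(0, 256, (16, 3, 512, 512), dtype=torch.uint8)

fid.update(real_images, real=True)                 # accumulate real-image statistics
fid.update(synthetic_images, real=False)           # accumulate synthetic-image statistics
print(fid.compute())                               # lower FID means closer distributions
```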
Field-level Comparison and Robustness Analysis of Cosmological N-body Simulations
Adrian E. Bayer
Francisco Villaescusa-Navarro
Sammy Sharief
Romain Teyssier
Lehman H. Garrison
Greg L. Bryan
Marco Gatti
Eli Visbal
Proceedings of the OHBM Open Science Room 2024
Selma Lugtmeijer
Ju-Chi Yu
Xiangzhen Kong
Janine D. Bijsterbosch
Elizabeth DuPre
Oscar Esteban
Ibrahim Faye
Seok-Jun Hong
Chuan-Peng Hu
Shella Keilholz
Chun-Chia Kung
Hyeong Hun Lee
Daniel Margulies
Cyril Pernet
Franco Pestilli
Jean-Baptiste Poline
Pradeep R. Raamana
Francesco Santini
Won Mok Shim …
Paul M. Thompson
Chao-Gan Yan
Niall W. Duncan
Nikhil Bhagwat
Peter Fox
Ana Van Gulick
David N. Kennedy
Gorana Pobric
Neda Sadeghi
Nick Souter
Sandeep Panta
Isabelle van der Velpen
Tonya White
Sina Mansour L.
Qing Wang
Povilas Karvelis
Anibal S. Heinsfeld
Yu-Fang Yang
Hong Ji Kim
Nur Shahidatul Nabila Binti Ibrahim
Stefano Moia
Wei Zhang
Jessica Haigh
Rose-Marie Kouwenhoven
Terra Hyun Lee
Hurshitha Vasudevan
Yuping Yang
Subapriya Suppiah
Yi-Ju Lee
Nils Muhlert
MuSACo: Multimodal Subject-Specific Selection and Adaptation for Expression Recognition with Co-Training
Muhammad Osama Zeeshan
Natacha Gillet
Alessandro Lameiras Koerich
Francois Bremond
Eric Granger
Increasing the Utility of Synthetic Images through Chamfer Guidance
Nicola Dall'Asen
Melissa Hall
Jakob Verbeek
Michal Drozdzal
Conditional image generative models hold considerable promise to produce infinite amounts of synthetic training data. Yet, recent progress in generation quality has come at the expense of generation diversity, limiting the utility of these models as a source of synthetic training data. Although guidance-based approaches have been introduced to improve the utility of generated data by focusing on quality or diversity, the (implicit or explicit) utility functions oftentimes disregard the potential distribution shift between synthetic and real data. In this work, we introduce Chamfer Guidance: a training-free guidance approach that leverages a handful of real exemplar images to characterize the quality and diversity of synthetic data. We show that by leveraging the proposed Chamfer Guidance, we can boost the diversity of the generations w.r.t. a dataset of real images while maintaining or improving the generation quality on ImageNet-1k and standard geo-diversity benchmarks. Our approach achieves state-of-the-art few-shot performance with as few as 2 exemplar real images, obtaining 96.4% precision and 86.4% distributional coverage, which increase to 97.5% and 92.7%, respectively, when using 32 real images. We showcase the benefits of Chamfer Guidance generation by training downstream image classifiers on synthetic data, achieving accuracy boosts of up to 15% in-distribution over the baselines, and up to 16% out-of-distribution. Furthermore, our approach does not require using the unconditional model, and thus obtains a 31% reduction in FLOPs w.r.t. classifier-free-guidance-based approaches at sampling time.
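To make the central quantity more concrete, here is an assumed, illustrative sketch (not the released implementation) of a symmetric Chamfer-style distance between synthetic-image features and features of a handful of real exemplars, the kind of signal a Chamfer-based guidance term could be built on; feature dimensions and counts are placeholders.

```python
# Illustrative sketch only; shapes and feature extractor are assumptions.
import torch

def chamfer_distance(synthetic: torch.Tensor, exemplars: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between feature sets of shape (n, d) and (m, d)."""
    dists = torch.cdist(synthetic, exemplars)            # (n, m) pairwise Euclidean distances
    return dists.min(dim=1).values.mean() + dists.min(dim=0).values.mean()

synthetic_feats = torch.randn(64, 768)                   # e.g. embeddings of generated images
exemplar_feats = torch.randn(4, 768)                     # a handful of real exemplar embeddings
print(chamfer_distance(synthetic_feats, exemplar_feats)) # smaller means closer to the exemplars
```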
Street Review: A Participatory AI-Based Framework for Assessing Streetscape Inclusivity
Rashid A. Mushkani
The Interpolation Constraint in the RV Analysis of M-Dwarfs Using Empirical Templates
Nicolas B. Cowan
Étienne Artigau
René Doyon
André M. Silva
K. A. Moulla
Precise radial velocity (pRV) measurements of M-dwarfs in the near-infrared (NIR) rely on empirical templates due to the lack of accurate stellar spectral models in this regime. Templates are assumed to approximate the true spectrum when constructed from many observations or in the high signal-to-noise limit. We develop a numerical simulation that generates SPIRou-like pRV observations from PHOENIX spectra, constructs empirical templates, and estimates radial velocities. This simulation solely considers photon noise and evaluates when empirical templates remain reliable for pRV analysis. Our results reveal a previously unrecognized noise source in templates, establishing a fundamental floor for template-based pRV measurements. We find that templates inherently include distortions in stellar line shapes due to imperfect interpolation at the detector's sampling resolution. The magnitude of this interpolation error depends on sampling resolution and RV content. Consequently, while stars with a higher RV content, such as cooler M-dwarfs, are expected to yield lower RV uncertainties, their dense spectral features can amplify interpolation errors, potentially biasing RV estimates. For a typical M4V star, SPIRou's spectral and sampling resolution imposes an RV uncertainty floor of 0.5-0.8 m/s, independent of the star's magnitude or the telescope's aperture. These findings reveal a limitation of template-based pRV methods, underscoring the need for improved spectral modeling and better-than-Nyquist detector sampling to reach the next level of RV precision.
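As a rough toy illustration of the interpolation effect described above (assumed numbers only, not the paper's SPIRou simulation), the sketch below samples a Gaussian absorption line on a coarse detector-like grid, rebuilds it by cubic-spline interpolation, and reports the resulting line-shape distortion.

```python
# Toy sketch with assumed grids and line depth; not the paper's numerical simulation.
import numpy as np
from scipy.interpolate import CubicSpline

fine_grid = np.linspace(-10.0, 10.0, 4001)                     # "true" wavelength axis
true_line = 1.0 - 0.5 * np.exp(-0.5 * (fine_grid / 1.5) ** 2)  # absorption line profile

detector_grid = np.linspace(-10.0, 10.0, 41)                   # coarse detector sampling
observed = 1.0 - 0.5 * np.exp(-0.5 * (detector_grid / 1.5) ** 2)

template = CubicSpline(detector_grid, observed)(fine_grid)     # interpolated empirical template
print(f"max line-shape distortion: {np.abs(template - true_line).max():.2e}")
```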