Publications

Low-dimensional embeddings of high-dimensional data
Cyril de Bodt
Alex Diaz-Papkovich
Michael Bleher
Kerstin Bunte
Corinna Coupette
Sebastian Damrich
Fred A. Hamprecht
Emőke-Ágnes Horvát
Dhruv Kohli
John A. Lee
Boudewijn P. F. Lelieveldt
Leland McInnes
Ian T. Nabney
Maximilian Noichl
Pavlin G. Poličar
Bastian Rieck
Gal Mishne …
Dmitry Kobak
Large collections of high-dimensional data have become nearly ubiquitous across many academic fields and application domains, ranging from biology to the humanities. Since working directly with high-dimensional data poses challenges, the demand for algorithms that create low-dimensional representations, or embeddings, for data visualization, exploration, and analysis is now greater than ever. In recent years, numerous embedding algorithms have been developed, and their usage has become widespread in research and industry. This surge of interest has resulted in a large and fragmented research field that faces technical challenges alongside fundamental debates, and it has left practitioners without clear guidance on how to effectively employ existing methods. Aiming to increase coherence and facilitate future work, in this review we provide a detailed and critical overview of recent developments, derive a list of best practices for creating and using low-dimensional embeddings, evaluate popular approaches on a variety of datasets, and discuss the remaining challenges and open problems in the field.
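As a minimal, hypothetical sketch (not code from the paper), the snippet below computes a 2D embedding of a standard dataset with scikit-learn's t-SNE, one of the popular embedding approaches this review covers; the dataset and parameters are illustrative assumptions only.

```python
# Minimal sketch, assuming scikit-learn is installed; not the review's evaluation code.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)               # 64-dimensional digit images
embedding = TSNE(n_components=2, perplexity=30.0,
                 init="pca", random_state=0).fit_transform(X)
print(embedding.shape)                            # (n_samples, 2) points ready to plot
```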
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Clara Meister
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Däubener
Sophie Fellenz
Asja Fischer
Thomas Gärtner
Matthias Kirchler
Marius Kloft
Yingzhen Li
Christoph Lippert
Gerard de Melo
Eric Nalisnick
Björn Ommer
Rajesh Ranganath
Maja Rudolph
Karen Ullrich
Guy Van den Broeck …
Julia E Vogt
Yixin Wang
Florian Wenzel
Frank N. Wood
Stephan Mandt
Vincent Fortuin
Robustness of Neural Ratio and Posterior Estimators to Distributional Shifts for Population-Level Dark Matter Analysis in Strong Gravitational Lensing
Development of a defacing algorithm to protect the privacy of head and neck cancer patients in publicly-accessible radiotherapy datasets
Kayla O'Sullivan‐Steben
Luc Galarneau
Massive Extremely High-Velocity Outflow in the Quasar J164653.72+243942.2
Paola Rodríguez Hidalgo
Hyunseop Choi
Patrick B. Hall
K. Leighly
Liliana Flores
Mikel M. Charles
Cora DeFrancesco
J. Hlavacek-Larrondo
Pixels Under Pressure: Exploring Fine-Tuning Paradigms for Foundation Models in High-Resolution Medical Imaging
Advancements in diffusion-based foundation models have improved text-to-image generation, yet most efforts have been limited to low-resolution settings. As high-resolution image synthesis becomes increasingly essential for various applications, particularly in medical imaging domains, fine-tuning emerges as a crucial mechanism for adapting these powerful pre-trained models to task-specific requirements and data distributions. In this work, we present a systematic study examining the impact of various fine-tuning techniques on image generation quality when scaling to a high resolution of 512x512 pixels. We benchmark a diverse set of fine-tuning methods, including full fine-tuning strategies and parameter-efficient fine-tuning (PEFT). We dissect how different fine-tuning methods influence key quality metrics, including Fréchet Inception Distance (FID), Vendi score, and prompt-image alignment. We also evaluate the utility of generated images in a downstream classification task under data-scarce conditions, demonstrating that specific fine-tuning strategies improve both generation fidelity and downstream performance when synthetic images are used for classifier training and evaluation on real images. Our code is accessible through the project website - https://tehraninasab.github.io/PixelUPressure/.
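As a hedged illustration of one of the quality metrics named above (not the authors' evaluation pipeline), the sketch below computes Fréchet Inception Distance with the torchmetrics package; the random uint8 tensors are placeholders standing in for batches of real and synthetic 512x512 images.

```python
# Hedged sketch, assuming torchmetrics and torch-fidelity are installed; not the paper's code.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)       # InceptionV3 pooled features
real_images = torch.randint(0, 256, (16, 3, 512, 512), dtype=torch.uint8)
synthetic_images = torch.randint(0, 256, (16, 3, 512, 512), dtype=torch.uint8)

fid.update(real_images, real=True)                 # accumulate real-image statistics
fid.update(synthetic_images, real=False)           # accumulate synthetic-image statistics
print(fid.compute())                               # lower FID means closer distributions
```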
Field-level Comparison and Robustness Analysis of Cosmological N-body Simulations
Adrian E. Bayer
Francisco Villaescusa-Navarro
Sammy Sharief
Romain Teyssier
Lehman H. Garrison
Greg L. Bryan
Marco Gatti
Eli Visbal
Proceedings of the OHBM Open Science Room 2024
Selma Lugtmeijer
Ju-Chi Yu
Xiangzhen Kong
Janine D. Bijsterbosch
Elizabeth DuPre
Oscar Esteban
Ibrahim Faye
Seok-Jun Hong
Chuan-Peng Hu
Shella Keilholz
Chun-Chia Kung
Hyeong Hun Lee
Daniel Margulies
Cyril Pernet
Franco Pestilli
Jean-Baptiste Poline
Pradeep R. Raamana
Francesco Santini
Won Mok Shim …
Paul M. Thompson
Chao-Gan Yan
Niall W. Duncan
Nikhil Bhagwat
Peter Fox
Ana Van Gulick
David N. Kennedy
Gorana Pobric
Neda Sadeghi
Nick Souter
Sandeep Panta
Isabelle van der Velpen
Tonya White
Sina Mansour L.
Qing Wang
Povilas Karvelis
Anibal S. Heinsfeld
Yu-Fang Yang
Hong Ji Kim
Nur Shahidatul Nabila Binti Ibrahim
Stefano Moia
Wei Zhang
Jessica Haigh
Rose-Marie Kouwenhoven
Terra Hyun Lee
Hurshitha Vasudevan
Yuping Yang
Subapriya Suppiah
Yi-Ju Lee
Nils Muhlert
MuSACo: Multimodal Subject-Specific Selection and Adaptation for Expression Recognition with Co-Training
Muhammad Osama Zeeshan
Natacha Gillet
Alessandro Lameiras Koerich
Francois Bremond
Eric Granger
Increasing the Utility of Synthetic Images through Chamfer Guidance
Nicola Dall'Asen
Melissa Hall
Jakob Verbeek
Michal Drozdzal
Conditional image generative models hold considerable promise to produce infinite amounts of synthetic training data. Yet, recent progress in generation quality has come at the expense of generation diversity, limiting the utility of these models as a source of synthetic training data. Although guidance-based approaches have been introduced to improve the utility of generated data by focusing on quality or diversity, the (implicit or explicit) utility functions oftentimes disregard the potential distribution shift between synthetic and real data. In this work, we introduce Chamfer Guidance: a training-free guidance approach that leverages a handful of real exemplar images to characterize the quality and diversity of synthetic data. We show that by leveraging the proposed Chamfer Guidance, we can boost the diversity of the generations w.r.t. a dataset of real images while maintaining or improving the generation quality on ImageNet-1k and standard geo-diversity benchmarks. Our approach achieves state-of-the-art few-shot performance with as few as 2 exemplar real images, obtaining 96.4% precision and 86.4% distributional coverage, which increase to 97.5% and 92.7%, respectively, when using 32 real images. We showcase the benefits of Chamfer Guidance generation by training downstream image classifiers on synthetic data, achieving accuracy boosts of up to 15% in-distribution over the baselines, and up to 16% out-of-distribution. Furthermore, our approach does not require using the unconditional model, and thus obtains a 31% reduction in FLOPs w.r.t. classifier-free-guidance-based approaches at sampling time.
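To make the central quantity more concrete, here is an assumed, illustrative sketch (not the released implementation) of a symmetric Chamfer-style distance between synthetic-image features and features of a handful of real exemplars, the kind of signal a Chamfer-based guidance term could be built on; feature dimensions and counts are placeholders.

```python
# Illustrative sketch only; shapes and feature extractor are assumptions.
import torch

def chamfer_distance(synthetic: torch.Tensor, exemplars: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between feature sets of shape (n, d) and (m, d)."""
    dists = torch.cdist(synthetic, exemplars)            # (n, m) pairwise Euclidean distances
    return dists.min(dim=1).values.mean() + dists.min(dim=0).values.mean()

synthetic_feats = torch.randn(64, 768)                   # e.g. embeddings of generated images
exemplar_feats = torch.randn(4, 768)                     # a handful of real exemplar embeddings
print(chamfer_distance(synthetic_feats, exemplar_feats)) # smaller means closer to the exemplars
```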
Street Review: A Participatory AI-Based Framework for Assessing Streetscape Inclusivity
Rashid A. Mushkani
The Interpolation Constraint in the RV Analysis of M-Dwarfs Using Empirical Templates
Nicolas B. Cowan
Étienne Artigau
René Doyon
André M. Silva
K. A. Moulla
Precise radial velocity (pRV) measurements of M-dwarfs in the near-infrared (NIR) rely on empirical templates due to the lack of accurate stellar spectral models in this regime. Templates are assumed to approximate the true spectrum when constructed from many observations or in the high signal-to-noise limit. We develop a numerical simulation that generates SPIRou-like pRV observations from PHOENIX spectra, constructs empirical templates, and estimates radial velocities. This simulation solely considers photon noise and evaluates when empirical templates remain reliable for pRV analysis. Our results reveal a previously unrecognized noise source in templates, establishing a fundamental floor for template-based pRV measurements. We find that templates inherently include distortions in stellar line shapes due to imperfect interpolation at the detector's sampling resolution. The magnitude of this interpolation error depends on sampling resolution and RV content. Consequently, while stars with a higher RV content, such as cooler M-dwarfs, are expected to yield lower RV uncertainties, their dense spectral features can amplify interpolation errors, potentially biasing RV estimates. For a typical M4V star, SPIRou's spectral and sampling resolution imposes an RV uncertainty floor of 0.5-0.8 m/s, independent of the star's magnitude or the telescope's aperture. These findings reveal a limitation of template-based pRV methods, underscoring the need for improved spectral modeling and better-than-Nyquist detector sampling to reach the next level of RV precision.
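As a rough toy illustration of the interpolation effect described above (assumed numbers only, not the paper's SPIRou simulation), the sketch below samples a Gaussian absorption line on a coarse detector-like grid, rebuilds it by cubic-spline interpolation, and reports the resulting line-shape distortion.

```python
# Toy sketch with assumed grids and line depth; not the paper's numerical simulation.
import numpy as np
from scipy.interpolate import CubicSpline

fine_grid = np.linspace(-10.0, 10.0, 4001)                     # "true" wavelength axis
true_line = 1.0 - 0.5 * np.exp(-0.5 * (fine_grid / 1.5) ** 2)  # absorption line profile

detector_grid = np.linspace(-10.0, 10.0, 41)                   # coarse detector sampling
observed = 1.0 - 0.5 * np.exp(-0.5 * (detector_grid / 1.5) ** 2)

template = CubicSpline(detector_grid, observed)(fine_grid)     # interpolated empirical template
print(f"max line-shape distortion: {np.abs(template - true_line).max():.2e}")
```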