Publications

Using Image-based AI for insect monitoring and conservation - InsectAI COST Action

Tom August

Mario Balzan

Paul Bodesheim

Gunnar Brehm

Lisette Cantú-Salazar

Sílvia Castro

Joseph Chipperfield

Guillaume Ghisbain

Alba Gomez-Segura

Jérémie Goulnik

Quentin Groom

Laurens Hogeweg

Chantal Huijbers

Andreas Kamilaris

Karolis Kazlauskis

Wouter Koch

Dimitri Korsch

João Loureiro

Youri Martin

Angeliki Martinou

Kent McFarland

Xavier Mestdagh

Denis Michez

Charlie Outhwaite

Luca Pegoraro

Nadja Pernat

Lars Pettersson

Pavel Pipek

Cristina Preda

David Rolnick

Tobias Roth

David Roy

Helen Roy

Veljo Runnel

Martina Sasic

Dmitry Schigel

Julie Sheard

Cecilie Svenningsen

Heliana Teixeira

Nicolas Titeux

Thomas Tscheulin

Elli Tzirkalli

Marijn van der Velde

Roel van Klink

Nicolas Vereecken

Sarah Vray

Toke Thomas Høye

2025-02-10

Research Ideas and Outcomes (published)

doi.org

RadiSeq: a single- and bulk-cell whole-genome DNA sequencing simulator for radiation-damaged cell models

Felix Mathew

Luc Galarneau

John Kildea

Objective: To build and validate a simulation framework to perform single-cell and bulk-cell whole genome sequencing simulation of radiation-exposed Monte Carlo cell models to assist radiation genomics studies. Approach: Sequencing the genomes of radiation-damaged cells can provide useful insight into radiation action for radiobiology research. However, carrying out post-irradiation sequencing experiments can often be challenging, expensive, and time-consuming. Although computational simulations have the potential to provide solutions to these experimental challenges, and aid in designing optimal experiments, the absence of tools currently limits such application. Monte Carlo toolkits exist to simulate radiation exposures of cell models but there are no tools to simulate single- and bulk-cell sequencing of cell models containing radiation-damaged DNA. Therefore, we aimed to develop a Monte Carlo simulation framework to address this gap by designing a tool capable of simulating sequencing processes for radiation-damaged cells. Main Results: We developed RadiSeq – a multi-threaded whole-genome DNA sequencing simulator written in C++. RadiSeq can be used to simulate Illumina sequencing of radiation-damaged cell models produced by Monte Carlo simulations. RadiSeq has been validated through comparative analysis, where simulated data were matched against experimentally obtained data, demonstrating reasonable agreement between the two. Additionally, it comes with numerous features designed to closely resemble actual whole-genome sequencing. RadiSeq is also highly customizable with a single input parameter file. Significance: RadiSeq enables the research community to perform complex simulations of radiation-exposed DNA sequencing, supporting the optimization, planning, and validation of costly and time-intensive radiation biology experiments. This framework provides a powerful tool for advancing radiation genomics research.

2025-02-09

bioRxiv (preprint)

doi.org

Mol-MoE: Training Preference-Guided Routers for Molecule Generation

Diego Calanzone

Pierluca D'Oro

Pierre-Luc Bacon

Recent advances in language models have enabled framing molecule generation as sequence modeling. However, existing approaches often rely on single-objective reinforcement learning, limiting their applicability to real-world drug design, where multiple competing properties must be optimized. Traditional multi-objective reinforcement learning (MORL) methods require costly retraining for each new objective combination, making rapid exploration of trade-offs impractical. To overcome these limitations, we introduce Mol-MoE, a mixture-of-experts (MoE) architecture that enables efficient test-time steering of molecule generation without retraining. Central to our approach is a preference-based router training objective that incentivizes the router to combine experts in a way that aligns with user-specified trade-offs. This provides improved flexibility in exploring the chemical property space at test time, facilitating rapid trade-off exploration. Benchmarking against state-of-the-art methods, we show that Mol-MoE achieves superior sample quality and steerability.

2025-02-08

ArXiv (preprint)

arxiv.org

Improving Patient Safety Culture in Conflict-Affected Zones: A Cross-Sectional Survey of North Kivu Surgical Personnel in the Democratic Republic of the Congo.

Jacques Fadhili Bake

Claude Kasereka Masumbuko

Zacharie Tsongo Kibendelwa

Georges Bushu Lubuto

Jean‐Claude Mafuta Kyembwa

Esaie Kasereka Nzala

Papy Waleyirwe Kakule

Clovis Bwami Akumbi

Jean Zanga Kitutu

Tresor Basubi Wakilongo

Theophile Kubuya Hangi

Wilson Katembo Kwiraviwe

Benjamin Musemakweli

Beate Tshikudju Bahati

Steve Kisembo Bakabona

Dan Poenaru

BACKGROUND: Patient safety culture significantly impacts outcomes in surgery, where preventable errors can occur. This study assessed patient safety culture and its determinants in operating rooms across North Kivu, a conflict-affected province in the eastern Democratic Republic of the Congo (DRC). METHODS: A descriptive multicenter cross-sectional study was conducted from July to September 2024 in five urban and six rural hospitals. The French version of the Hospital Survey on Patient Safety Culture (HSOPSC) questionnaire was administered to 328 operating room healthcare professionals. RESULTS: The response rate was 78% (256 completed surveys). Urban hospitals accounted for 55.5% of respondents, who were 73.4% male and 62.5% under the age of 40. The overall composite score for patient safety culture was 63.2%. Teamwork (81.1%) and management support for patient safety (77.7%) received the highest positive responses, whereas error reporting (39.9%) and patient safety event reporting (50%) scored lower. Half (49.6%) of the respondents rated patient safety as excellent or very good. There were no significant differences in overall mean composite scores between urban and rural hospitals (p = 0.677) and between medical and paramedical staff (p = 0.694). CONCLUSIONS: The patient safety culture rating in North Kivu falls below international standards, highlighting an urgent need for improvement, particularly in error response and event reporting. Developing a tailored patient safety bundle for the region is essential to enhance overall health outcomes.

2025-02-07

World Journal of Surgery (published)

doi.org

Agency Is Frame-Dependent

David Abel

Andre Barreto

Michael Bowling

Will Dabney

Shi Dong

Steven Hansen

Anna Harutyunyan

Khimya Khetarpal

Clare Lyle

Razvan Pascanu

Georgios Piliouras

Doina Precup

Jonathan Richens

Mark Rowland

Tom Schaul

Satinder Singh

Agency is a system's capacity to steer outcomes toward a goal, and is a central topic of study across biology, philosophy, cognitive science, and artificial intelligence. Determining if a system exhibits agency is a notoriously difficult question: Dennett (1989), for instance, highlights the puzzle of determining which principles can decide whether a rock, a thermostat, or a robot each possess agency. We here address this puzzle from the viewpoint of reinforcement learning by arguing that agency is fundamentally frame-dependent: Any measurement of a system's agency must be made relative to a reference frame. We support this claim by presenting a philosophical argument that each of the essential properties of agency proposed by Barandiaran et al. (2009) and Moreno (2018) are themselves frame-dependent. We conclude that any basic science of agency requires frame-dependence, and discuss the implications of this claim for reinforcement learning.

2025-02-06

ArXiv (preprint)

doi.org

arxiv.org

Conditional Diffusion Models are Medical Image Classifiers that Provide Explainability and Uncertainty for Free

Gian Mario Favero

Parham Saremi

Emily Kaczmarek

Brennan Nichyporuk

Tal Arbel

Discriminative classifiers have become a foundational tool in deep learning for medical imaging, excelling at learning separable features of complex data distributions. However, these models often need careful design, augmentation, and training techniques to ensure safe and reliable deployment. Recently, diffusion models have become synonymous with generative modeling in 2D. These models showcase robustness across a range of tasks including natural image classification, where classification is performed by comparing reconstruction errors across images generated for each possible conditioning input. This work presents the first exploration of the potential of class conditional diffusion models for 2D medical image classification. First, we develop a novel majority voting scheme shown to improve the performance of medical diffusion classifiers. Next, extensive experiments on the CheXpert and ISIC Melanoma skin cancer datasets demonstrate that foundation and trained-from-scratch diffusion models achieve competitive performance against SOTA discriminative classifiers without the need for explicit supervision. In addition, we show that diffusion classifiers are intrinsically explainable, and can be used to quantify the uncertainty of their predictions, increasing their trustworthiness and reliability in safety-critical, clinical contexts. Further information is available on our project page: https://faverogian.github.io/med-diffusion-classifier.github.io/

2025-02-06

ArXiv (preprint)

arxiv.org
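
As a rough illustration of the decision rule described in the abstract above (classify by comparing reconstruction errors across candidate class conditions, then take a majority vote over repeated noise draws), the following is a minimal Python sketch. It is not the authors' implementation: `toy_error`, `TEMPLATES`, the noise scale, and the use of the vote share as a crude confidence signal are all hypothetical stand-ins for a trained class-conditional diffusion model and the paper's actual voting scheme.

```python
import numpy as np
from collections import Counter

# Hypothetical class templates standing in for what a trained
# class-conditional diffusion model has learned about each class.
TEMPLATES = {0: np.zeros(16), 1: np.full(16, 3.0)}

def toy_error(x, noise, label, sigma=1.0):
    """Toy reconstruction error for one noise draw and one class condition."""
    noisy = x + sigma * noise
    # Assumed toy "denoiser": pull the noisy input halfway toward the template.
    x_hat = 0.5 * noisy + 0.5 * TEMPLATES[label]
    return float(np.mean((x_hat - x) ** 2))

def classify_by_vote(x, labels, error_fn, n_trials=15, seed=0):
    """Vote over trials; each trial picks the class with the lowest error."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_trials):
        # The same noise draw is shared by all classes within a trial.
        noise = rng.standard_normal(x.shape)
        errors = [error_fn(x, noise, c) for c in labels]
        votes.append(labels[int(np.argmin(errors))])
    winner, count = Counter(votes).most_common(1)[0]
    # The vote share is only a crude, assumed proxy for confidence.
    return winner, count / n_trials

label, share = classify_by_vote(np.full(16, 3.0), [0, 1], toy_error)
print(label, share)
```

In a real setting `error_fn` would wrap a diffusion model's noise-prediction loss; the voting wrapper itself does not depend on that choice.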

FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks

Luca Della Libera

Francesco Paissan

Cem Subakan

Mirco Ravanelli

Large language models have revolutionized natural language processing through self-supervised pretraining on massive datasets. Inspired by this success, researchers have explored adapting these methods to speech by discretizing continuous audio into tokens using neural audio codecs. However, existing approaches face limitations, including high bitrates, the loss of either semantic or acoustic information, and the reliance on multi-codebook designs when trying to capture both, which increases architectural complexity for downstream tasks. To address these challenges, we introduce FocalCodec, an efficient low-bitrate codec based on focal modulation that utilizes a single binary codebook to compress speech between 0.16 and 0.65 kbps. FocalCodec delivers competitive performance in speech resynthesis and voice conversion at lower bitrates than the current state-of-the-art, while effectively handling multilingual speech and noisy environments. Evaluation on downstream tasks shows that FocalCodec successfully preserves sufficient semantic and acoustic information, while also being well-suited for generative modeling. Demo samples, code and checkpoints are available at https://lucadellalib.github.io/focalcodec-web/.

2025-02-06

ArXiv (preprint)

doi.org

arxiv.org

Principal Curvatures Estimation with Applications to Single Cell Data

Yanlei Zhang

Lydia Mezrag

Xingzhi Sun

Charles Xu

Kincaid MacDonald

Dhananjay Bhaskar

Smita Krishnaswamy

Guy Wolf

Bastian Rieck

The rapidly growing field of single-cell transcriptomic sequencing (scRNAseq) presents challenges for data analysis due to its massive datasets. A common approach in manifold learning is to hypothesize that datasets lie on a lower-dimensional manifold. This allows one to study the geometry of point clouds by extracting meaningful descriptors like curvature. In this work, we present Adaptive Local PCA (AdaL-PCA), a data-driven method for accurately estimating various notions of intrinsic curvature on data manifolds, in particular principal curvatures for surfaces. The model relies on local PCA to estimate the tangent spaces. The evaluation of AdaL-PCA on sampled surfaces shows state-of-the-art results. Combined with a PHATE embedding, the model applied to single-cell RNA sequencing data allows us to identify key variations in the cellular differentiation.

2025-02-06

ArXiv (preprint)

arxiv.org
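
The abstract above states that the method relies on local PCA to estimate tangent spaces. A minimal Python sketch of plain (non-adaptive) local PCA on a sampled unit sphere is given below for orientation. It is not AdaL-PCA itself: the fixed neighbourhood size `k`, the function name `local_pca_frame`, and the sphere test data are assumptions made for illustration, and the adaptive part of the method and the curvature estimation are not shown.

```python
import numpy as np

def local_pca_frame(points, index, k=30):
    """Estimate a tangent plane and normal at points[index] via local PCA.

    Assumes `points` samples a 2D surface embedded in 3D.
    """
    p = points[index]
    dists = np.linalg.norm(points - p, axis=1)
    # The k nearest neighbours (the query point itself is included).
    nbrs = points[np.argsort(dists)[:k]]
    centered = nbrs - nbrs.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, svals, vt = np.linalg.svd(centered, full_matrices=False)
    tangent_basis = vt[:2]   # two directions of largest local spread
    normal = vt[2]           # direction of least spread
    return tangent_basis, normal, svals

# Hypothetical test data: 2000 points sampled uniformly on the unit sphere.
rng = np.random.default_rng(0)
sphere = rng.standard_normal((2000, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)

tangent, normal, svals = local_pca_frame(sphere, 0)
# On a unit sphere the true normal at a point is the point itself (up to sign).
print(abs(float(normal @ sphere[0])))
```

On the sphere, the estimated normal should be nearly parallel to the radial direction, which gives a quick sanity check on the tangent-space estimate.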

Tackling the Problem of Distributional Shifts: Correcting Misspecified, High-Dimensional Data-Driven Priors for Inverse Problems

Gabriel Missael Barco

Alexandre Adam

Connor Stone

Yashar Hezaveh

Laurence Perreault-Levasseur

Bayesian inference for inverse problems hinges critically on the choice of priors. In the absence of specific prior information, population-level distributions can serve as effective priors for parameters of interest. With the advent of machine learning, the use of data-driven population-level distributions (encoded, e.g., in a trained deep neural network) as priors is emerging as an appealing alternative to simple parametric priors in a variety of inverse problems. However, in many astrophysical applications, it is often difficult or even impossible to acquire independent and identically distributed samples from the underlying data-generating process of interest to train these models. In these cases, corrupted data or a surrogate, e.g. a simulator, is often used to produce training samples, meaning that there is a risk of obtaining misspecified priors. This, in turn, can bias the inferred posteriors in ways that are difficult to quantify, which limits the potential applicability of these models in real-world scenarios. In this work, we propose addressing this issue by iteratively updating the population-level distributions by retraining the model with posterior samples from different sets of observations and showcase the potential of this method on the problem of background image reconstruction in strong gravitational lensing when score-based models are used as data-driven priors. We show that starting from a misspecified prior distribution, the updated distribution becomes progressively closer to the underlying population-level distribution, and the resulting posterior samples exhibit reduced bias after several updates.

2025-02-06

The Astrophysical Journal (published)

doi.org

arxiv.org
