Publications

A Comedy of Estimators: On KL Regularization in RL Training of LLMs
The reasoning performance of large language models (LLMs) can be substantially improved by training them with reinforcement learning (RL). The RL objective for LLM training involves a regularization term, the reverse Kullback-Leibler (KL) divergence between the trained policy and the reference policy. Since computing the KL divergence exactly is intractable, various estimators are used in practice to estimate it from on-policy samples. Despite their wide adoption, including in several open-source libraries, there is no systematic study analyzing the numerous ways of incorporating KL estimators into the objective and their effect on the downstream performance of RL-trained models. Recent works show that prevailing practices for incorporating KL regularization do not provide correct gradients for the stated objectives, creating a discrepancy between the objective and its implementation. In this paper, we further analyze these practices and study the gradients of several estimator configurations, revealing how design choices shape gradient bias. We substantiate these findings with empirical observations by RL fine-tuning Qwen2.5-7B, Llama-3.1-8B-Instruct, and Qwen3-4B-Instruct-2507 with different configurations and evaluating their performance on both in- and out-of-distribution tasks. Through our analysis, we observe that, in on-policy settings: (1) estimator configurations with biased gradients can result in training instabilities; and (2) estimator configurations with unbiased gradients lead to better performance on in-domain as well as out-of-domain tasks. We also investigate the performance of different KL configurations in off-policy settings and observe that KL regularization can help stabilize the off-policy RL training that arises in asynchronous setups.
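For context, the on-policy KL estimators discussed in this line of work are typically the single-sample estimators commonly denoted k1, k2, and k3 in open-source RL libraries. The sketch below illustrates their bias/variance behavior on a toy Gaussian pair where the true KL is known analytically; the Gaussian setup is purely illustrative and does not reproduce the paper's experiments.

```python
import numpy as np

def kl_estimators(logq, logp):
    """Single-sample estimators of KL(q || p), evaluated at samples x ~ q.
    logq/logp are the log-densities of q and p at those samples."""
    logr = logp - logq            # log importance ratio r = p(x)/q(x)
    k1 = -logr                    # unbiased, high variance, can go negative
    k2 = 0.5 * logr ** 2          # biased, but low variance
    k3 = np.expm1(logr) - logr    # (r - 1) - log r: unbiased and nonnegative
    return k1.mean(), k2.mean(), k3.mean()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1_000_000)          # samples from q = N(0, 1)
mu = 0.5                                          # p = N(0.5, 1)
logq = -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)
logp = -0.5 * (x - mu) ** 2 - 0.5 * np.log(2 * np.pi)
true_kl = 0.5 * mu ** 2                           # analytic KL(q || p) = 0.125
```

With a million samples, k1 and k3 concentrate around the true value 0.125, while k2 settles slightly above it, making the bias of that configuration visible directly.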
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Seijin Kobayashi
Yanick Schimpf
Maximilian Schlegel
Angelika Steger
Maciej Wolczyk
Johannes Von Oswald
Kaitlin Maile
Blake Aaron Richards
Rif A. Saurous
James Manyika
Blaise Agüera y Arcas
Alexander Meulemans
João Sacramento
Large-scale autoregressive models pretrained on next-token prediction and finetuned with reinforcement learning (RL) have achieved unprecedented success on many problem domains. During RL, these models explore by generating new outputs, one token at a time. However, sampling actions token-by-token can result in highly inefficient learning, particularly when rewards are sparse. Here, we show that it is possible to overcome this problem by acting and exploring within the internal representations of an autoregressive model. Specifically, to discover temporally abstract actions, we introduce a higher-order, non-causal sequence model whose outputs control the residual-stream activations of a base autoregressive model. On grid-world and MuJoCo-based tasks with hierarchical structure, we find that the higher-order model learns to compress long chunks of activation sequences onto internal controllers. Critically, each controller executes a sequence of behaviorally meaningful actions that unfold over long timescales and are accompanied by a learned termination condition, such that composing multiple controllers over time leads to efficient exploration on novel tasks. We show that direct internal controller reinforcement, a process we term "internal RL", enables learning from sparse rewards in cases where standard RL finetuning fails. Our results demonstrate the benefits of latent action generation and reinforcement in autoregressive models, suggesting internal RL as a promising avenue for realizing hierarchical RL within foundation models.
Energy-Efficient Multi-LLM Reasoning for Binary-Free Zero-Day Detection in IoT Firmware
Saeid Jamshidi
Omar Abdul-Wahab
Martine Bellaiche
Securing Internet of Things (IoT) firmware remains difficult due to proprietary binaries, stripped symbols, heterogeneous architectures, and limited access to executable code. Existing analysis methods, such as static analysis, symbolic execution, and fuzzing, depend on binary visibility and functional emulation, making them unreliable when firmware is encrypted or inaccessible. To address this limitation, we propose a binary-free, architecture-agnostic solution that estimates the likelihood of conceptual zero-day vulnerabilities using only high-level descriptors. The approach integrates a tri-LLM reasoning architecture combining a LLaMA-based configuration interpreter, a DeepSeek-based structural abstraction analyzer, and a GPT-4o semantic fusion model. The solution also incorporates LLM computational signatures, including latency patterns, uncertainty markers, and reasoning depth indicators, as well as an energy-aware symbolic load model, to enhance interpretability and operational feasibility. In addition, we formally derive the mathematical foundations of the reasoning pipeline, establishing monotonicity, divergence, and energy-risk coupling properties that theoretically justify the model's behavior. Simulation-based evaluation reveals that high-exposure conditions increase the predicted zero-day likelihood by 20 to 35 percent across models, with GPT-4o demonstrating the strongest cross-layer correlations and the highest sensitivity. Energy and divergence metrics significantly predict elevated risk (p < 0.01), reinforcing the effectiveness of the proposed reasoning framework.
Hidden sampling biases inflate performance in gene regulatory network inference
Florin Ratajczak
Eva Hoermanseder
Jason Hartford
Pascal Falter-Braun
Matthias Heinig
Antonio Scialdone
Accurate reconstruction of gene regulatory networks (GRNs) from single-cell transcriptomic data remains a major methodological challenge. Recent machine learning approaches, particularly graph neural networks and graph autoencoders, have reported improved performance, yet these gains do not consistently translate to realistic biological settings. Here, we show that a key reason for this is the way negative regulatory interactions are sampled for supervised training and evaluation. We find that widely used sampling strategies introduce node-degree biases that allow models to exploit trivial graph-structural cues rather than biological signals. Across multiple benchmarks, simple degree-based heuristics match or exceed state-of-the-art graph neural network models under these biased evaluation protocols. We further introduce a degree-aware sampling approach that eliminates these artifacts and provides more reliable assessments of GRN inference methods. Our results call for standardized, bias-aware benchmarking practices to ensure meaningful progress in supervised GRN inference from single-cell RNA-seq data.
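One way to picture the degree-aware idea: draw each negative edge from the same source node as a positive edge, so that regulator degree distributions match between the two sets and a model cannot separate them on degree alone. The toy sketch below is a hypothetical illustration of that principle; the function, node names, and rejection scheme are assumptions, not the paper's exact protocol.

```python
import random

def degree_aware_negatives(pos_edges, nodes, seed=0):
    """For each positive edge (src, dst), sample a negative edge with the
    SAME source node and a random non-interacting target, so that each
    regulator appears as a source equally often in positives and negatives."""
    rng = random.Random(seed)
    pos = set(pos_edges)
    negatives = []
    for src, _ in pos_edges:
        while True:  # rejection-sample a target not already regulated by src
            dst = rng.choice(nodes)
            if dst != src and (src, dst) not in pos:
                negatives.append((src, dst))
                break
    return negatives

# Toy regulator -> target edges (hypothetical names)
pos = [("TF1", "g1"), ("TF1", "g2"), ("TF2", "g3")]
nodes = ["TF1", "TF2", "g1", "g2", "g3", "g4"]
negs = degree_aware_negatives(pos, nodes)
```

Under naive uniform sampling, high-degree regulators dominate the positives but not the negatives, which is exactly the cue a degree heuristic exploits; matching sources per edge removes it.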
Fine-Tuned In-Context Learners for Efficient Adaptation
Clare Lyle
Yazhe Li
Amal Rannen-Triki
When adapting large language models (LLMs) to a specific downstream task, two primary approaches are commonly employed: (1) prompt engineering, often with in-context few-shot learning, leveraging the model's inherent generalization abilities, and (2) fine-tuning on task-specific data, directly optimizing the model's parameters. While prompt-based methods excel in few-shot scenarios, their effectiveness often plateaus as more data becomes available. Conversely, fine-tuning scales well with data but may underperform when training examples are scarce. We investigate a unified approach that bridges these two paradigms by incorporating in-context learning directly into the fine-tuning process. Specifically, we fine-tune the model on task-specific data augmented with in-context examples, mimicking the structure of k-shot prompts. This approach, while requiring per-task fine-tuning, combines the sample efficiency of in-context learning with the performance gains of fine-tuning, leading to a method that consistently matches, and often significantly exceeds, both baselines. To perform hyperparameter selection in the low-data regime, we propose to use prequential evaluation, which eliminates the need for expensive cross-validation and leverages all available data for training while simultaneously providing a robust validation signal. We conduct an extensive empirical study to determine which adaptation paradigm - fine-tuning, in-context learning, or our proposed unified approach - offers the best predictive performance on concrete downstream tasks.
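Prequential ("predict then update") evaluation scores each chunk of data with a model fit only on the chunks before it, then sums the losses, so every example eventually serves both as validation signal and as training data. The sketch below is a generic illustration of the idea; the chunking scheme and the toy mean-predictor are assumptions for demonstration, not the paper's recipe.

```python
def prequential_score(train_fn, eval_fn, data, n_chunks=5):
    """Sum of losses where chunk k is scored by a model trained on
    chunks 0..k-1 only.  train_fn(prefix) -> model;
    eval_fn(model, chunk) -> loss.  Lower is better."""
    size = max(1, len(data) // n_chunks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    total = 0.0
    for k in range(1, len(chunks)):
        model = train_fn([x for c in chunks[:k] for x in c])
        total += eval_fn(model, chunks[k])
    return total

# Toy usage: the "model" is just the running mean, scored by squared error.
data = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 0.95, 1.05, 1.0]
train = lambda prefix: sum(prefix) / len(prefix)
loss = lambda m, chunk: sum((x - m) ** 2 for x in chunk)
score = prequential_score(train, loss, data)
```

To select hyperparameters, one would compute this score once per candidate setting and keep the setting with the lowest total, then retrain on all the data, which is how prequential evaluation avoids holding out a fixed validation split.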
Latent brain subtypes of chronotype reveal unique behavioral and health profiles across population cohorts
Julie Carrier
Kai-Florian Storch
Robin I. M. Dunbar
Chronotype is shaped by the complex interplay of endogenous and exogenous factors. This time-enduring trait ties into societal behaviors and is linked to psychiatric and metabolic conditions. Despite its multifaceted nature, prior research has treated chronotype as a monolithic trait across the population, risking overlooking substantial heterogeneity in neural and behavioral fingerprints. To uncover hidden subgroups, we develop a supervised pattern-learning framework integrating three complementary brain-imaging modalities with deep behavioral and health profiling from 27,030 UK Biobank participants. We identify five distinct, biologically valid chronotype subtypes. Each demonstrates unique patterns across brain, behavioral and health profiles. External validation in 10,550 US children from the ABCD Study cohort reveals reversed age distributions and replicates sex-associated brain-behavioral patterns, suggesting that potential divergences between chronotype traits observed throughout adulthood may begin to emerge early in life. These findings highlight underappreciated sources of population variation that echo the rhythm of people’s inner clock.
Perspective on patient and non-academic partner engagement for the responsible integration of large language models in health chatbots
Nikhil Jaiswal
Yuanchao Ma
Bertrand Lebouché
Marie-Pascale Pomey
Sofiane Achiche
David Lessard
Kim Engler
Zully Montiel
Hector Acevedo
Rodrigo Rosa Gameiro
Leo Anthony Celi
Esli Osmanlliu
Uses of large language models (LLMs) in health chatbots are expanding into high-stakes clinical contexts, heightening the need for tools that are evidence-based, accountable, accurate, and patient-centred. This conceptual, practice-informed Perspective reflects on engaging patients and non-academic partners for the responsible integration of LLMs, grounded in the co-construction of MARVIN (for people living with HIV) and in an emerging collaboration with MIT Critical Data. Organised by the Software Development Life Cycle, we describe: conception/needs assessment with patient partners to identify use cases, acceptable trade-offs, and privacy expectations; development that prioritises grounding via vetted sources, structured human feedback, and data-validation committees including patient partners; testing and evaluation using patient-reported outcome measures (PROMs) and patient-reported experience measures (PREMs) chosen in collaboration with patients to capture usability, acceptability, trust, and perceived safety, alongside task performance and harmful-output monitoring; and implementation via diverse governance boards, knowledge-mobilisation materials to set expectations, and risk-management pathways for potentially unsafe outputs. Based on our experience with MARVIN, we recommend early and continuous engagement of patients and non-academic partners, fair compensation, shared decision-making power, transparent decision logging, and inclusive, adaptable governance that can evolve with changing models and standards. These lessons highlight how patient partnership can directly shape chatbot design and oversight, helping teams align LLM-enabled tools with patient-centred goals while building accountable, safe, and equitable systems.

Health chatbots powered by large language models (LLMs) can make medical information more accessible, but most are developed without meaningful input from the people who will use them.
This risks unsafe answers, hidden bias, and tools that mainly work for privileged groups. Our team built a chatbot called MARVIN to support people living with HIV, and we are now adapting it for cancer care and children’s health. Patients, caregivers, and community partners shaped what MARVIN should do, chose which sources it should trust, and tested early versions. Their feedback led to concrete improvements including clearer language, more relevant features, and safeguards against misinformation. We are also partnering with MIT Critical Data, which brings patients, members of the public, clinicians, engineers, and policymakers together at events to find and fix bias in medical AI. We have learned that technical fixes alone are not enough: trust, fairness, and accountability require active involvement of diverse users at every stage. Based on these lessons, we recommend: (1) including patients and non-academic partners from the start so their insights can shape core design decisions; (2) compensating them fairly so participation is sustainable; (3) giving them real decision-making power so their input is not tokenistic; and (4) being transparent about the limits of AI so expectations are realistic. In our experience, responsible health AI depends on the lived expertise of the people it serves.
The Historical Literature of Nicolae Filimon and the Reconciliation of Realism with the Tradition of the Popular Novel
A.R. Olteanu
Coord2Region: A Python Package for Mapping 3D Brain Coordinates to Atlas Labels, Literature, and AI Summaries
Yorguin-Jose Mantilla-Ramos
Sina Esmaeili
Annalisa Pascarella
Vanessa Hadid
Karim Jerbi (CoCo Lab)
We present Coord2Region, an open-source Python package that streamlines coordinate-based neuroimaging workflows by automatically mapping 3D brain coordinates (e.g., MNI or Talairach) to anatomical regions across multiple atlases. The package links mapped coordinates to meta-analytic resources via the Neuroimaging Meta-Analysis Research Environment (NiMARE), providing direct integration with Neurosynth and NeuroQuery. This directly connects coordinates and regions to the broader neuroimaging literature. In addition to atlas-based labeling and literature retrieval, Coord2Region offers an optional large language model (LLM) functionality that generates text summaries of linked studies and illustrative images of queried regions. These AI-assisted features are intended to support interpretation and exploration, while remaining clearly complementary to peer-reviewed literature and established neuroimaging tools. Coord2Region provides a unified pipeline with a robust command-line interface, flexible dataset management, and provider-agnostic LLM utilities, and it supports both single-coordinate and high-throughput batch queries with nearest-region fallback for volume and surface atlases. Furthermore, Coord2Region includes a web interface for interactive configuration (via JSON Schema forms) and cloud execution (via Hugging Face), enabling users to build YAML configurations and run analyses in-browser without local installation. Together, these capabilities lower friction, reduce manual errors, and improve reproducibility in coordinate-centric neuroimaging workflows, promoting more robust and transparent research practices.
E-RGB-D: Real-Time Event-Based Perception with Structured Light
Seyed Ehsan Marjani Bajestani
Event-based cameras (ECs) have emerged as bio-inspired sensors that report pixel brightness changes asynchronously, offering unmatched speed and efficiency in vision sensing. Despite their high dynamic range, temporal resolution, low power consumption, and computational simplicity, traditional monochrome ECs face limitations in detecting static or slowly moving objects and lack color information essential for certain applications. To address these challenges, we present a novel approach that integrates a Digital Light Processing (DLP) projector with an EC, forming Active Structured Light (ASL) for RGB-D sensing. By combining the benefits of ECs and projection-based techniques, our method enables the detection of the color and depth of each pixel separately. Dynamic projection adjustments optimize bandwidth, ensuring selective color data acquisition and yielding colorful point clouds without sacrificing spatial resolution. This integration, facilitated by a commercial TI LightCrafter 4500 projector and a monocular monochrome EC, not only enables frameless RGB-D sensing applications but also achieves remarkable performance milestones. With our approach, we achieved a color-detection speed equivalent to 1400 fps and pixel-depth detection at 4 kHz, significantly advancing computer vision across diverse fields, from robotics to 3D reconstruction. Our code is publicly available: https://github.com/MISTLab/event_based_rgbd_ros
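Projector-camera structured-light systems like the one described recover depth by triangulation: a projected pattern feature observed with disparity d in the camera, given baseline b and focal length f, lies at depth Z = b * f / d. The sketch below shows only this generic textbook relation; the baseline, focal length, and disparity values are illustrative assumptions, not the paper's calibration.

```python
def depth_from_disparity(baseline_m, focal_px, disparity_px):
    """Pinhole triangulation for a rectified projector-camera pair:
    depth Z (meters) = baseline (meters) * focal length (pixels)
                       / disparity (pixels)."""
    return baseline_m * focal_px / disparity_px

# e.g., a 10 cm projector-camera baseline, a 1000 px focal length,
# and a 50 px disparity place the point 2 m away
z = depth_from_disparity(0.10, 1000.0, 50.0)
```

The relation also shows why depth resolution degrades with range: at large Z the disparity shrinks, so a one-pixel disparity error corresponds to a larger depth error.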
Responsible AI measures dataset for ethics evaluation of AI systems
Meaningful governance of any system requires the system to be assessed and monitored effectively. In the domain of Artificial Intelligence (AI), global efforts have established a set of ethical principles, including fairness, transparency, and privacy, upon which AI governance expectations are being built. The computing research community has proposed numerous means of measuring an AI system's normative qualities along these principles. Current reporting of these measures is principle-specific, limited in scope, or otherwise dispersed across publication platforms, hindering the domain's ability to critique its practices. To address this, we introduce the Responsible AI Measures Dataset, consolidating 12,067 data points across 791 evaluation measures covering 11 ethical principles. It is extracted from a corpus of computing literature (n = 257) published between 2011 and 2023. The dataset includes detailed descriptions of each measure, AI system characteristics, and publication metadata. An accompanying interactive visualization tool supports usability and interpretation of the dataset. The Responsible AI Measures Dataset enables practitioners to explore existing assessment approaches and critically analyze how the computing domain measures normative concepts.