Publications

DASB - Discrete Audio and Speech Benchmark

Pooneh Mousavi

Jarod Duret

Darius Petermann

Anastasia Kuznetsova

Discrete audio tokens have recently gained considerable attention for their potential to bridge audio and language processing, enabling multimodal language models that can both generate and understand audio. However, preserving key information such as phonetic content, speaker identity, and paralinguistic cues remains a major challenge. Identifying the optimal tokenizer and configuration is further complicated by inconsistent evaluation settings across existing studies. To address this, we introduce the Discrete Audio and Speech Benchmark (DASB), a comprehensive framework for benchmarking discrete audio tokens across speech, general audio, and music domains on a range of discriminative and generative tasks. Our results show that discrete representations are less robust than continuous ones and require careful tuning of factors such as model architecture, data size, learning rate, and capacity. Semantic tokens generally outperform acoustic tokens, but a gap remains between discrete tokens and continuous features, highlighting the need for further research. DASB codes, evaluation setup, and leaderboards are publicly available at https://poonehmousavi.github.io/DASB-website/.

2026-04-12

Transactions on Machine Learning Research (accepted)

doi.org

openreview.net

Early detection of common reed (Phragmites australis) using unoccupied aerial vehicles and deep learning

Antoine Caron-Guay

Mickaël Germain

Étienne Laliberté

2026-04-12

Canadian Journal of Remote Sensing (published)

doi.org

EXPRESS: Climate Communications in IPOs: Unpacking the Influence of Climate Disclosure Volume, Sender, and Message Characteristics

Ankit Anand

Alok R. Saboo

Ritesh Adhyapak

Climate disclosures have emerged as a prominent communication tool for firms facing growing pressure to address climate challenges, yet their impact on firm performance remains unclear. This study proposes a nonlinear (U-shaped) relationship between climate disclosure volume and IPO firm performance, grounded in a damage-limitation logic. At low to moderate levels, disclosures amplify risk salience and proprietary costs, damaging valuations. At higher levels, offsetting benefits related to information, stewardship, and climate-friendly reputation outweigh these costs. Using multi-sourced data from 1,586 IPO firms, a BERT-based large language model to identify climate-related text in prospectuses, and econometric methods that address endogeneity, the authors find support for the proposed U-shaped relationship. The research further demonstrates that sender characteristics (underwriter reputation, customer concentration, and market orientation) and message characteristics (discretionary disclosure and message clarity) moderate the nonlinear relationship. Post-hoc analyses decomposing disclosure content reveal that climate risk disclosures damage valuations. In contrast, climate risk-management disclosures (governance, strategy, and metrics/targets) generate positive effects, suggesting that disclosure effectiveness depends on both volume and content composition. These effects persist in the long-term performance of firms. The findings provide actionable insights for firms developing disclosure strategies and policymakers encouraging climate-related communication.

2026-04-12

Journal of Marketing (published)

doi.org

A Mechanistic Analysis of Looped Reasoning Language Models

Hugh Blayney

Álvaro Arroyo

Johan Obando-Ceron

Pablo Samuel Castro

Aaron Courville

Michael M. Bronstein

Xiaowen Dong

Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM's layers in the latent dimension, resulting in looped reasoning language models. Despite promising results, few works have investigated how their internal dynamics differ from those of standard feedforward models. In this paper, we conduct a mechanistic analysis of the latent states in looped language models, focusing in particular on how the stages of inference observed in feedforward models compare to those observed in looped ones. To this end, we analyze cyclic recurrence and show that for many of the studied models each layer in the cycle converges to a distinct fixed point; consequently, the recurrent block follows a consistent cyclic trajectory in the latent space. We provide evidence that as these fixed points are reached, attention-head behavior stabilizes, leading to constant behavior across recurrences. Empirically, we discover that recurrent blocks learn stages of inference that closely mirror those of feedforward models, repeating these stages in depth with each iteration. We study how recurrent block size, input injection, and normalization influence the emergence and stability of these cyclic fixed points. We believe these findings help translate mechanistic insights into practical guidance for architectural design.

2026-04-12

arXiv (preprint)

doi.org

arxiv.org

Towards Autonomous Mechanistic Reasoning in Virtual Cells

Yunhui Jang

Lu Zhu

Jake Fawkes

Alisandra Kaye Denton

Dominique Beaini

Emmanuel Noutahi

Large language models (LLMs) have recently gained significant attention as a promising approach to accelerate scientific discovery. However, their application in open-ended scientific domains such as biology remains limited, primarily due to the lack of factually grounded and actionable explanations. To address this, we introduce a structured explanation formalism for virtual cells that represents biological reasoning as mechanistic action graphs, enabling systematic verification and falsification. Building upon this, we propose VCR-Agent, a multi-agent framework that integrates biologically grounded knowledge retrieval with a verifier-based filtering approach to generate and validate mechanistic reasoning autonomously. Using this framework, we release the VC-TRACES dataset, which consists of verified mechanistic explanations derived from the Tahoe-100M atlas. Empirically, we demonstrate that training with these explanations improves factual precision and provides a more effective supervision signal for downstream gene expression prediction. These results underscore the importance of reliable mechanistic reasoning for virtual cells, achieved through the synergy of multi-agent and rigorous verification.

2026-04-12

arXiv (preprint)

doi.org

arxiv.org

Towards Brain MRI Foundation Models for the Clinic: Findings from the FOMO25 Challenge

Asbjørn Munk

Stefano Cerri

Vardan Nersesjan

Christian Hedeager Krag

Jakob Ambsdorf

Pedro García

Julia Machnio

Peirong Liu

Suhyun Ahn

Nasrin Akbari

Yasmina Al Khalil

Kimberly Amador

Sina Amirrajab

Tal Arbel

Meritxell Bach Cuadra

Ujjwal Baid

Bhakti Baheti

Jaume Banús

Kamil Barbierik

Christoph Brune

步岩松

Baptiste Callard

Yuhan Chen

Cornelius Crijnen

Corentin Dancette

Peter Drotár

Prasad Dutande

Nils D. Forkert

Saurabh K. Garg†

Jakub Gazda

Matej Gazda

Benoît Gérin

Partha Ghosh

Weikang Gong

Pedro M. Gordaliza

Sam Hashemi

Tobias Heimann

Fucang Jia

Jiexin Jiang

Emily Kaczmarek

Chris Kang

Seung Kwan Kang

Mohammad Khazaei

Julien Khlaut

Petros Koutsouvelis

Jae Sung Lee

Yuchong Li

Mengye Lyu

Mingchen Ma

Anant Madabhushi

Klaus H. Maier-Hein

Pierre Manceron

Andrés Martínez Mora

Moona Mazher

Felix Meister

Nataliia Molchanova

Steven A. Niederer

Leonard Nürnberg

Jinah Park

Abdul Qayyum

Jonas Richiardi

Antoine Saporta

Branislav Setlak

Ning Shen

Justin Szeto

Constantin Ulrich

Puru Vaish

Vibujithan Vigneshwaran

Leroy Volmer

Zihao Wang

Siqi Wei

Anthony Winder

Jelmer M. Wolterink

Maxence Wynen

Chang YANG

Si Young Yie

Mostafa Mehdipour Ghazi

Akshay Pai

Espen Jimenez‐Solem

Sebastian Nørgaard Llambias

Mikael Boesen

Michael Eriksen Benros

Juan Eugenio Iglesias

Mads Nielsen

Clinical deployment of automated brain MRI analysis faces a fundamental challenge: clinical data is heterogeneous and noisy, and high-quality labels are prohibitively costly to obtain. Self-supervised learning (SSL) can address this by leveraging the vast amounts of unlabeled data produced in clinical workflows to train robust foundation models that adapt out-of-domain with minimal supervision. However, the development of foundation models for brain MRI has been limited by small pretraining datasets and in-domain benchmarking focused on high-quality, research-grade data. To address this gap, we organized the FOMO25 challenge as a satellite event at MICCAI 2025. FOMO25 provided participants with a large pretraining dataset, FOMO60K, and evaluated models on data sourced directly from clinical workflows in few-shot and out-of-domain settings. Tasks covered infarct classification, meningioma segmentation, and brain age regression, and considered both models trained on FOMO60K (method track) and any data (open track). Nineteen foundation models from sixteen teams were evaluated using a standardized containerized pipeline. Results show that (a) self-supervised pretraining improves generalization on clinical data under domain shift, with the strongest models trained out-of-domain surpassing supervised baselines trained in-domain; (b) no single pretraining objective benefits all tasks: MAE favors segmentation, while hybrid reconstruction-contrastive objectives favor classification; and (c) strong performance was achieved by small pretrained models, and scaling model size and training duration did not yield reliable benefits.

2026-04-12

arXiv (preprint)

doi.org

arxiv.org

White and Gray Matter Multiple Sclerosis Spinal Cord Lesion Characteristics and Individualized Tissue Damage Assessment Using 7 T T1 Mapping

Nilser Laines-Medina

Samira Mchinda

Benoit Testud

Arnaud Le Troter

Lauriane Pini

Bertrand Audoin

Jean Pelletier

Sarah Demortière

Julien Cohen-Adad

Virginie Callot

The aim of this exploratory study was to demonstrate how 7 T MP2RAGE T1 mapping can be used to evaluate spinal cord (SC) tissue damage and lesion characteristics in multiple sclerosis (MS) at both subregional and individual levels. Fifteen patients with relapsing-remitting MS (pwRRMS; mean disease duration = 32 ± 24.9 mo) and 15 age-matched healthy controls (HC) underwent 7 T cervical 3D MP2RAGE imaging with submillimetric spatial resolution. Automatic SC and lesion segmentations were obtained and manually corrected when necessary. Images were registered to the AMU7T template space to extract T1 values from specific regions of interest (ROIs), including white matter (WM) tracts: corticospinal (CST), lateral sensory (LST), posterior sensory (PST), ventral motor (VMT), and gray matter (GM) subregions: ventral, intermediate, and dorsal. Individual Z-score maps were computed and used to derive a global index of tissue impairment (patient-specific Z-score barplot) for lesion and normal appearing tissues (NAT). Finally, MS lesions were further characterized by their relative lesion load (RLL%), frequency maps, and topography across ROIs. Lesions were predominantly located in the posterior half of the cord, with GM showing the highest RLL. However, no lesions were observed exclusively in GM. An increasing gradient in T1 values was observed, with T1_HC 0.01). Mixed GM-WM lesions exhibited higher T1 values and larger volumes than WM-only lesions. Elevated T1 values

2026-04-12

Investigative Radiology (published)

doi.org

EIAN: Explicit Interaction-aware Attention Network for Interpretable Event Modeling

Jiping Zhang

Hua Zhu

Hong Huang

Yi Zhou

Kehan Yin

Bang Liu

Event sequences are integral to domains such as e-commerce, social networks, and healthcare. Traditional point process models, like Poisson and Hawkes processes, are foundational but limited by rigid parametric assumptions, constraining their flexibility in complex real-world scenarios. Neural point processes offer a more adaptable alternative, but typically perform implicit sequence modeling, which does not fully exploit critical event interaction patterns and limits transparency. To address these challenges, we introduce the Explicit Interaction-aware Attention Network (EIAN), a novel model that enhances event modeling by explicitly capturing both intra-type and cross-type event interactions. Specifically, EIAN employs two key components: an intra-type temporal encoder that preserves the unique temporal dynamics within each event type, and a cross-type interaction decoder that highlights interactions across event types. Furthermore, two temporal encoding mechanisms are integrated into the interaction decoder to handle irregular inter-event intervals in diverse temporal scenarios. Extensive experiments show that EIAN consistently outperforms existing models in predictive performance and provides deeper insights into event interaction patterns, advancing both flexibility and interpretability. Our code is available at https://github.com/CGCL-codes/EIAN.git.

2026-04-11

ACM Web Conference (published)

doi.org

Forecasting Developer Environments with GenAI: A Research Perspective

Raula Gaikovina Kula

Christoph Treude

Xing Hu

Sebastian Baltes

Earl T. Barr

Kelly Blincoe

Fabio Calefato

J Chen

Marc Cheong

Youmei Fan

Daniel M. Germán

Marco Gerosa

Jin L.C. Guo

Shinpei Hayashi

Robert Hirschfeld

Reid Holmes

Yintong Huo

Takashi Kobayashi

Michele Lanza

Zhongxin Liu

Olivier Nourry

Nicole Novielli

Denys Poshyvanyk

Shinobu Saito

Kazumasa Shimari

Igor Steinmacher

Mairieli Wessel

Markus Wagner

Annie Vella

Laurie Williams

Xin Xia

Generative Artificial Intelligence (GenAI) models are achieving remarkable performance in various tasks, including code generation, testing, code review, and program repair. The ability to increase the level of abstraction away from writing code has the potential to change the Human-AI interaction within the integrated development environment (IDE). To explore the impact of GenAI on IDEs, 33 experts from the Software Engineering, Artificial Intelligence, and Human-Computer Interaction domains gathered to discuss challenges and opportunities at Shonan Meeting 222, a four-day intensive research meeting. Four themes emerged as areas of interest for researchers and practitioners.

2026-04-11

International Workshop on Integrated Development Environments @ ACM/IEEE International Conference on Software Engineering (published)

doi.org

arxiv.org

TAPNext++: What's Next for Tracking Any Point (TAP)?

Sebastian Jung

Artem Zholus

Martin Sundermeyer

Carl Doersch

Ross Goroshin

David Joseph Tan

Sarath Chandar

Rudolph Triebel

Federico Tombari

Tracking-Any-Point (TAP) models aim to track any point through a video, which is a crucial task in AR/XR and robotics applications. The recently introduced TAPNext approach proposes an end-to-end, recurrent transformer architecture to track points frame-by-frame in a purely online fashion, demonstrating competitive performance at minimal latency. However, we show that TAPNext struggles with longer video sequences and also frequently fails to re-detect query points that reappear after being occluded or leaving the frame. In this work, we present TAPNext++, a model that tracks points in sequences that are orders of magnitude longer while preserving the low memory and compute footprint of the architecture. We train the recurrent video transformer using several data-driven solutions, including training on long 1024-frame sequences enabled by sequence parallelism techniques. We highlight that re-detection performance is a blind spot in the current literature and introduce a new metric, Re-Detection Average Jaccard (

2026-04-11

arXiv (preprint)

doi.org

arxiv.org

Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents

Miles Q. Li

Benjamin C. M. Fung

Boyang Li

Heba Ismail

Farkhund Iqbal

The rapid deployment of LLM-based autonomous agents has introduced safety risks that extend far beyond traditional LLM concerns, prompting a proliferation of safety benchmarks since late 2023. However, these benchmarks have developed independently, with inconsistent threat models, incompatible metrics, and overlapping yet incomplete risk coverage. We present the first systematic analysis dedicated to agent safety benchmarks as evaluation instruments. We catalog 40 behavioral agent-safety benchmarks (2023-2026), plus 5 adjacent evaluator, defense, and dataset artifacts, propose a six-axis taxonomy of benchmark evaluation methodology, and apply it across the corpus to characterize how methodological choices shape safety conclusions. A coverage matrix reveals broad risk coverage but limited methodological convergence, while the taxonomy analysis shows a behavioral-benchmark core concentrated in sandboxed, constrained, and often safety-only evaluation. Across the landscape, we find that benchmark choice can yield contradictory safety conclusions, coverage counts often overstate evaluation depth, environment fidelity systematically shapes reported safety, the field disproportionately tests externally imposed rather than agent-internal risks, metric fragmentation limits comparison, and robustness remains effectively unbenchmarked. We ground these claims with a cross-benchmark consistency check, with 95% confidence intervals and Kendall's W concordance analysis, finding no evidence of ranking concordance across evaluation dimensions (W = 0.10, p = 0.94). We release structured metadata, full taxonomy codings, risk annotations, and all experimental artifacts, and propose minimum reporting standards for future benchmarks.

2026-04-10

arXiv (preprint)

doi.org

arxiv.org

Active search generation for nanophotonic design in the small data regime

Vincent Létourneau

Yuri Grinberg

Dan Kushnir

Yanlei Zhang

Dan-Xia Xu

Guy Wolf

2026-04-09

Machine Learning in Photonics (published)

doi.org
