Publications

Boosting LLM Reasoning via Spontaneous Self-Correction

Xutong Zhao

Tengyu Xu

Xuewei Wang

Zhengxing Chen

Di Jin

Liang Tan

Zishun Yu

Zhuokai Zhao

Yun He

Sinong Wang

Han Fang

Sarath Chandar

Chen Zhu

MetaAI

Mila - Québec AI Institute

Polytechnique Montréal

While large language models (LLMs) have demonstrated remarkable success on a broad range of tasks, math reasoning remains a challenging one. One of the approaches for improving math reasoning is self-correction, which designs self-improving loops to let the model correct its own mistakes. However, existing self-correction approaches treat corrections as standalone post-generation refinements, relying on extra prompt and system designs to elicit self-corrections, instead of performing real-time, spontaneous self-corrections in a single pass. To address this, we propose SPOC, a spontaneous self-correction approach that enables LLMs to generate interleaved solutions and verifications in a single inference pass, with generation dynamically terminated based on verification outcomes, thereby effectively scaling inference time compute. SPOC considers a multi-agent perspective by assigning dual roles -- solution proposer and verifier -- to the same model. We adopt a simple yet effective approach to generate synthetic data for fine-tuning, enabling the model to develop capabilities for self-verification and multi-agent collaboration. We further improve its solution proposal and verification accuracy through online reinforcement learning. Experiments on mathematical reasoning benchmarks show that SPOC significantly improves performance. Notably, SPOC boosts the accuracy of Llama-3.1-8B and 70B Instruct models, achieving gains of 8.8% and 11.6% on MATH500, 10.0% and 20.0% on AMC23, and 3.3% and 6.7% on AIME24, respectively.

2025-06-07

ArXiv (preprint)

arxiv.org

A Self-Supervised Foundation Model for Robust and Generalizable Representation Learning in STED Microscopy

Anthony Bilodeau

Frédéric Beaupré

Julia Chabbert

Jean-Michel Bellavance

Koraly Lessard

Andréanne Deschênes

Renaud Bernatchez

Paul De Koninck

Christian Gagné

Flavie Lavoie-Cardinal

2025-06-06

bioRxiv (preprint)

doi.org

Bringing SAM to new heights: Leveraging elevation data for tree crown segmentation from drone imagery

Mélisande Teng

Arthur Ouaknine

Etienne Laliberté

Yoshua Bengio

David Rolnick

Hugo Larochelle

Information on trees at the individual level is crucial for monitoring forest ecosystems and planning forest management. Current monitoring methods involve ground measurements, requiring extensive cost, time and labor. Advances in drone remote sensing and computer vision offer great potential for mapping individual trees from aerial imagery at broad-scale. Large pre-trained vision models, such as the Segment Anything Model (SAM), represent a particularly compelling choice given limited labeled data. In this work, we compare methods leveraging SAM for the task of automatic tree crown instance segmentation in high resolution drone imagery in three use cases: 1) boreal plantations, 2) temperate forests and 3) tropical forests. We also study the integration of elevation data into models, in the form of Digital Surface Model (DSM) information, which can readily be obtained at no additional cost from RGB drone imagery. We present BalSAM, a model leveraging SAM and DSM information, which shows potential over other methods, particularly in the context of plantations. We find that methods using SAM out-of-the-box do not outperform a custom Mask R-CNN, even with well-designed prompts. However, efficiently tuning SAM end-to-end and integrating DSM information are both promising avenues for tree crown instance segmentation models.

2025-06-05

ArXiv (preprint)

arxiv.org

Clarifying a working definition for ‘precision communication’: a scoping review of medical literature on communication

Bao-Lam Pham

Brigitte N. Durieux

Amanda Bianco

Corinne Cécyre-Chartrand

Elena Guadagno

Amalia M. Issa

Dan Poenaru

2025-06-05

Personalized Medicine (published)

doi.org

DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models

Revant Teotia

Candace Ross

Karen Ullrich

Sumit Chopra

Adriana Romero Soriano

Melissa Hall

Matthew J. Muckley

Recent advances in text-to-image (T2I) models have achieved impressive quality and consistency. However, this has come at the cost of representation diversity. While automatic evaluation methods exist for benchmarking model diversity, they either require reference image datasets or lack specificity about the kind of diversity measured, limiting their adaptability and interpretability. To address this gap, we introduce the Does-it/Can-it framework, DIM-CIM, a reference-free measurement of default-mode diversity ("Does" the model generate images with expected attributes?) and generalization capacity ("Can" the model generate diverse attributes for a particular concept?). We construct the COCO-DIMCIM benchmark, which is seeded with COCO concepts and captions and augmented by a large language model. With COCO-DIMCIM, we find that widely-used models improve in generalization at the cost of default-mode diversity when scaling from 1.5B to 8.1B parameters. DIMCIM also identifies fine-grained failure cases, such as attributes that are generated with generic prompts but are rarely generated when explicitly requested. Finally, we use DIMCIM to evaluate the training data of a T2I model and observe a correlation of 0.85 between diversity in training images and default-mode diversity. Our work provides a flexible and interpretable framework for assessing T2I model diversity and generalization, enabling a more comprehensive understanding of model performance.

2025-06-05

ArXiv (preprint)

arxiv.org

Generalizable, real-time neural decoding with hybrid state-space models

Avery Hee-Woon Ryoo

Nanda H Krishna

Ximeng Mao

Mehdi Azabou

Eva L Dyer

Matt Perich

Guillaume Lajoie

Real-time decoding of neural activity is central to neuroscience and neurotechnology applications, from closed-loop experiments to brain-computer interfaces, where models are subject to strict latency constraints. Traditional methods, including simple recurrent neural networks, are fast and lightweight but often struggle to generalize to unseen data. In contrast, recent Transformer-based approaches leverage large-scale pretraining for strong generalization performance, but typically have much larger computational requirements and are not always suitable for low-resource or real-time settings. To address these shortcomings, we present POSSM, a novel hybrid architecture that combines individual spike tokenization via a cross-attention module with a recurrent state-space model (SSM) backbone to enable (1) fast and causal online prediction on neural activity and (2) efficient generalization to new sessions, individuals, and tasks through multi-dataset pretraining. We evaluate POSSM's decoding performance and inference speed on intracortical decoding of monkey motor tasks, and show that it extends to clinical applications, namely handwriting and speech decoding in human subjects. Notably, we demonstrate that pretraining on monkey motor-cortical recordings improves decoding performance on the human handwriting task, highlighting the exciting potential for cross-species transfer. In all of these tasks, we find that POSSM achieves decoding accuracy comparable to state-of-the-art Transformers, at a fraction of the inference cost (up to 9x faster on GPU). These results suggest that hybrid SSMs are a promising approach to bridging the gap between accuracy, inference speed, and generalization when training neural decoders for real-time, closed-loop applications.

2025-06-05

ArXiv (preprint)

arxiv.org

MesaNet: Sequence Modeling by Locally Optimal Test-Time Training

Johannes Von Oswald

Nino Scherrer

Seijin Kobayashi

Luca Versari

Songlin Yang

Maximilian Schlegel

Kaitlin Maile

Yanick Schimpf

Oliver Sieberling

Alexander Meulemans

Rif A. Saurous

Guillaume Lajoie

Charlotte Frenkel

Razvan Pascanu

Blaise Aguera y Arcas

João Sacramento

Sequence modeling is currently dominated by causal transformer architectures that use softmax self-attention. Although widely adopted, transformers require scaling memory and compute linearly during inference. A recent stream of work linearized the softmax operation, resulting in powerful recurrent neural network (RNN) models with constant memory and compute costs such as DeltaNet, Mamba or xLSTM. These models can be unified by noting that their recurrent layer dynamics can all be derived from an in-context regression objective, approximately optimized through an online learning rule. Here, we join this line of work and introduce a numerically stable, chunkwise parallelizable version of the recently proposed Mesa layer (von Oswald et al., 2024), and study it in language modeling at the billion-parameter scale. This layer again stems from an in-context loss, but which is now minimized to optimality at every time point using a fast conjugate gradient solver. Through an extensive suite of experiments, we show that optimal test-time training enables reaching lower language modeling perplexity and higher downstream benchmark performance than previous RNNs, especially on tasks requiring long context understanding. This performance gain comes at the cost of additional flops spent during inference time. Our results are therefore intriguingly related to recent trends of increasing test-time compute to improve performance -- here by spending compute to solve sequential optimization problems within the neural network itself.

2025-06-05

ArXiv (preprint)

arxiv.org

Policy context and digital development: a comparative study of trajectories in 4 Canadian academic health centers over 30 years

Aude Motulsky

Susan Usher

Pascale Lehoux

Catherine Régis

Trish Reay

Paul Hebert

Lise Gauvin

Alain Biron

G Ross Baker

Marie-Pierre Moreault

Johanne Préval

Jean-Louis Denis

2025-06-05

Journal of the American Medical Informatics Association (published)

doi.org

SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African Languages?

Senyu Li

Jiayi Wang

Felermino Dario Mario Ali

Colin Cherry

Daniel Deutsch

Eleftheria Briakou

Rui Sousa-Silva

Henrique Lopes Cardoso

Pontus Stenetorp

David Ifeoluwa Adelani

Evaluating machine translation (MT) quality for under-resourced African languages remains a significant challenge, as existing metrics often suffer from limited language coverage and poor performance in low-resource settings. While recent efforts, such as AfriCOMET, have addressed some of the issues, they are still constrained by small evaluation sets, a lack of publicly available training data tailored to African languages, and inconsistent performance in extremely low-resource scenarios. In this work, we introduce SSA-MTE, a large-scale human-annotated MT evaluation (MTE) dataset covering 13 African language pairs from the News domain, with over 63,000 sentence-level annotations from a diverse set of MT systems. Based on this data, we develop SSA-COMET and SSA-COMET-QE, improved reference-based and reference-free evaluation metrics. We also benchmark prompting-based approaches using state-of-the-art LLMs like GPT-4o and Claude. Our experimental results show that SSA-COMET models significantly outperform AfriCOMET and are competitive with the strongest LLM (Gemini 2.5 Pro) evaluated in our study, particularly on low-resource languages such as Twi, Luo, and Yoruba. All resources are released under open licenses to support future research.

2025-06-05

ArXiv (preprint)

arxiv.org

Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning

Andrei Mircea

Supriyo Chakraborty

Nima Chitsazan

Irina Rish

Ekaterina Lobacheva

This work aims to understand how scaling improves language models, specifically in terms of training dynamics. We find that language models undergo loss deceleration early in training; an abrupt slowdown in the rate of loss improvement, resulting in piecewise linear behaviour of the loss curve in log-log space. Scaling up the model mitigates this transition by (1) decreasing the loss at which deceleration occurs, and (2) improving the log-log rate of loss improvement after deceleration. We attribute loss deceleration to a type of degenerate training dynamics we term zero-sum learning (ZSL). In ZSL, per-example gradients become systematically opposed, leading to destructive interference in per-example changes in loss. As a result, improving loss on one subset of examples degrades it on another, bottlenecking overall progress. Loss deceleration and ZSL provide new insights into the training dynamics underlying language model scaling laws, and could potentially be targeted directly to improve language models independent of scale. We make our code and artefacts available at: https://github.com/mirandrom/zsl

2025-06-05

ArXiv (preprint)

arxiv.org

Trophic Interactions Are Key to Understanding the Effects of Global Change on the Distribution and Functional Role of the Brown Bear

Pablo M. Lucas

Wilfried Thuiller

Lauren Talluto

Ester Polaina

Jörg Albrecht

Nuria Selva

Marta De Barba

Vincenzo Penteriani

Maya Guéguen

Niko Balkenhol

Trishna Dutta

Ancuta Fedorca

Shane C. Frank

Andreas Zedrosser

Ivan Afonso‐Jordana

Hüseyin Ambarlı

Fernando Ballesteros

Andriy‐Taras Bashta

Cemal Can Bilgin

Neda Bogdanović

Edgars Bojārs

Katarzyna Bojarska

Natalia Bragalanti

Henrik Brøseth

Mark W. Chynoweth

Duško Ćirović

Paolo Ciucci

Andrea Corradini

Daniele De Angelis

Miguel de Gabriel Hernando

Csaba Domokos

Aleksander Dutsov

Alper Ertürk

Stefano Filacorda

Lorenzo Frangini

Claudio Groff

Samuli Heikkinen

Bledi Hoxha

Djuro Huber

Otso Huitu

Georgeta Ionescu

Ovidiu Ionescu

Klemen Jerina

Ramon Jurj

Alexandros A. Karamanlidis

Jonas Kindberg

Ilpo Kojola

José Vicente López‐Bao

Peep Männil

Dime Melovski

Yorgos Mertzanis

Paolo Molinari

Anja Molinari‐Jobin

Andrea Mustoni

Javier Naves

Sergey Ogurtsov

Deniz Özüt

Santiago Palazón

Luca Pedrotti

Aleksandar Perović

Vladimir N. Piminov

Ioan‐Mihai Pop

Marius Popa

Maria Psaralexi

Pierre‐Yves Quenette

Georg Rauer

Slaven Reljic

Eloy Revilla

Urmas Saarma

Alexander P. Saveljev

Ali Onur Sayar

Çagan H. Şekercioğlu

Agnieszka Sergiel

George Sîrbu

Tomaž Skrbinšek

Michaela Skuban

Anil Soyumert

Aleksandar Stojanov

Egle Tammeleht

Konstantin Tirronen

Aleksandër Trajçe

Igor Trbojević

Tijana Trbojević

Filip Zięba

Diana Zlatanova

Tomasz Zwijacz‐Kozica

Laura J. Pollock

Biotic interactions are expected to influence species' responses to global changes, but they are rarely considered across broad spatial extents. Abiotic factors are thought to operate at larger spatial scales, while biotic factors, such as species interactions, are considered more important at local scales within communities, in part because of the knowledge gap on species interactions at large spatial scales (i.e., the Eltonian shortfall). We assessed, at a continental scale, (i) the importance of biotic interactions, through food webs, on species distributions, and (ii) how biotic interactions under scenarios of climate and land‐use change may affect the distribution of the brown bear (Ursus arctos). We built a highly detailed, spatially dynamic, and empirically sampled food web based on the energy contribution of 276 brown bear food species from different taxa (plants, vertebrates, and invertebrates) and their ensemble habitat models at high resolution across Europe. Then, combining energy contribution and predicted habitat of food species, we modelled energy contribution across space and included these layers within Bayesian‐based models of the brown bear distribution in Europe. The inclusion of biotic interactions considerably improved our understanding of brown bear distribution at large (continental) scales compared with Bayesian models including only abiotic factors (climate and land use). Predicted future range shifts, which included changes in the distribution of food species, varied greatly when considering various scenarios of change in biotic factors, providing a warning that future indirect climate and land‐use change are likely to have strong but highly uncertain impacts on species biogeography. Our study confirmed that advancing our understanding of ecological networks of species interactions will improve future projections of biodiversity change, especially for modelling species distributions and their functional role under climate and land‐use change scenarios, which is key for effective conservation of biodiversity and ecosystem services.

2025-06-04

Global Change Biology (published)

doi.org

Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems

Emma Harvey

Emily Sheng

Su Lin Blodgett

Alexandra Chouldechova

Jean Garcia-Gathright

Alexandra Olteanu

Hanna Wallach

The NLP research community has made publicly available numerous instruments for measuring representational harms caused by large language model (LLM)-based systems. These instruments have taken the form of datasets, metrics, tools, and more. In this paper, we examine the extent to which such instruments meet the needs of practitioners tasked with evaluating LLM-based systems. Via semi-structured interviews with 12 such practitioners, we find that practitioners are often unable to use publicly available instruments for measuring representational harms. We identify two types of challenges. In some cases, instruments are not useful because they do not meaningfully measure what practitioners seek to measure or are otherwise misaligned with practitioner needs. In other cases, instruments - even useful instruments - are not used by practitioners due to practical and institutional barriers impeding their uptake. Drawing on measurement theory and pragmatic measurement, we provide recommendations for addressing these challenges to better meet practitioner needs.

2025-06-04

ArXiv (preprint)

arxiv.org
