Publications

Generalizable, real-time neural decoding with hybrid state-space models

Avery Hee-Woon Ryoo

Nanda H Krishna

Ximeng Mao

Mehdi Azabou

Eva L Dyer

Real-time decoding of neural activity is central to neuroscience and neurotechnology applications, from closed-loop experiments to brain-computer interfaces, where models are subject to strict latency constraints. Traditional methods, including simple recurrent neural networks, are fast and lightweight but often struggle to generalize to unseen data. In contrast, recent Transformer-based approaches leverage large-scale pretraining for strong generalization performance, but typically have much larger computational requirements and are not always suitable for low-resource or real-time settings. To address these shortcomings, we present POSSM, a novel hybrid architecture that combines individual spike tokenization via a cross-attention module with a recurrent state-space model (SSM) backbone to enable (1) fast and causal online prediction on neural activity and (2) efficient generalization to new sessions, individuals, and tasks through multi-dataset pretraining. We evaluate POSSM's decoding performance and inference speed on intracortical decoding of monkey motor tasks, and show that it extends to clinical applications, namely handwriting and speech decoding in human subjects. Notably, we demonstrate that pretraining on monkey motor-cortical recordings improves decoding performance on the human handwriting task, highlighting the exciting potential for cross-species transfer. In all of these tasks, we find that POSSM achieves decoding accuracy comparable to state-of-the-art Transformers, at a fraction of the inference cost (up to 9x faster on GPU). These results suggest that hybrid SSMs are a promising approach to bridging the gap between accuracy, inference speed, and generalization when training neural decoders for real-time, closed-loop applications.

2025-06-05

ArXiv (preprint)

arxiv.org

MesaNet: Sequence Modeling by Locally Optimal Test-Time Training

Johannes Von Oswald

Nino Scherrer

Seijin Kobayashi

Luca Versari

Songlin Yang

Maximilian Schlegel

Kaitlin Maile

Yanick Schimpf

Oliver Sieberling

Alexander Meulemans

Rif A. Saurous

Guillaume Lajoie

Charlotte Frenkel

Razvan Pascanu

Blaise Aguera y Arcas

João Sacramento

Sequence modeling is currently dominated by causal transformer architectures that use softmax self-attention. Although widely adopted, transformers require scaling memory and compute linearly during inference. A recent stream of work linearized the softmax operation, resulting in powerful recurrent neural network (RNN) models with constant memory and compute costs such as DeltaNet, Mamba or xLSTM. These models can be unified by noting that their recurrent layer dynamics can all be derived from an in-context regression objective, approximately optimized through an online learning rule. Here, we join this line of work and introduce a numerically stable, chunkwise parallelizable version of the recently proposed Mesa layer (von Oswald et al., 2024), and study it in language modeling at the billion-parameter scale. This layer again stems from an in-context loss, but which is now minimized to optimality at every time point using a fast conjugate gradient solver. Through an extensive suite of experiments, we show that optimal test-time training enables reaching lower language modeling perplexity and higher downstream benchmark performance than previous RNNs, especially on tasks requiring long context understanding. This performance gain comes at the cost of additional flops spent during inference time. Our results are therefore intriguingly related to recent trends of increasing test-time compute to improve performance -- here by spending compute to solve sequential optimization problems within the neural network itself.

2025-06-05

ArXiv (preprint)

arxiv.org

Policy context and digital development: a comparative study of trajectories in 4 Canadian academic health centers over 30 years

Aude Motulsky

Susan Usher

Pascale Lehoux

Catherine Régis

Trish Reay

Paul Hebert

Lise Gauvin

Alain Biron

G Ross Baker

Marie-Pierre Moreault

Johanne Préval

Jean-Louis Denis

2025-06-05

Journal of the American Medical Informatics Association (published)

doi.org

SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African Languages?

Senyu Li

Jiayi Wang

Felermino Dario Mario Ali

Colin Cherry

Daniel Deutsch

Eleftheria Briakou

Rui Sousa-Silva

Henrique Lopes Cardoso

Pontus Stenetorp

David Ifeoluwa Adelani

Evaluating machine translation (MT) quality for under-resourced African languages remains a significant challenge, as existing metrics often suffer from limited language coverage and poor performance in low-resource settings. While recent efforts, such as AfriCOMET, have addressed some of the issues, they are still constrained by small evaluation sets, a lack of publicly available training data tailored to African languages, and inconsistent performance in extremely low-resource scenarios. In this work, we introduce SSA-MTE, a large-scale human-annotated MT evaluation (MTE) dataset covering 13 African language pairs from the News domain, with over 63,000 sentence-level annotations from a diverse set of MT systems. Based on this data, we develop SSA-COMET and SSA-COMET-QE, improved reference-based and reference-free evaluation metrics. We also benchmark prompting-based approaches using state-of-the-art LLMs like GPT-4o and Claude. Our experimental results show that SSA-COMET models significantly outperform AfriCOMET and are competitive with the strongest LLM (Gemini 2.5 Pro) evaluated in our study, particularly on low-resource languages such as Twi, Luo, and Yoruba. All resources are released under open licenses to support future research.

2025-06-05

ArXiv (preprint)

arxiv.org

Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning

Andrei Mircea

Supriyo Chakraborty

Nima Chitsazan

Irina Rish

Ekaterina Lobacheva

This work aims to understand how scaling improves language models, specifically in terms of training dynamics. We find that language models undergo loss deceleration early in training; an abrupt slowdown in the rate of loss improvement, resulting in piecewise linear behaviour of the loss curve in log-log space. Scaling up the model mitigates this transition by (1) decreasing the loss at which deceleration occurs, and (2) improving the log-log rate of loss improvement after deceleration. We attribute loss deceleration to a type of degenerate training dynamics we term zero-sum learning (ZSL). In ZSL, per-example gradients become systematically opposed, leading to destructive interference in per-example changes in loss. As a result, improving loss on one subset of examples degrades it on another, bottlenecking overall progress. Loss deceleration and ZSL provide new insights into the training dynamics underlying language model scaling laws, and could potentially be targeted directly to improve language models independent of scale. We make our code and artefacts available at: https://github.com/mirandrom/zsl

2025-06-05

ArXiv (preprint)

arxiv.org

RETRO SYNFLOW: Discrete Flow Matching for Accurate and Diverse Single-Step Retrosynthesis

Robin Yadav

Qi Yan

Guy Wolf

Joey Bose

Renjie Liao

A fundamental problem in organic chemistry is identifying and predicting the series of reactions that synthesize a desired target product molecule. Due to the combinatorial nature of the chemical search space, single-step reactant prediction -- i.e. single-step retrosynthesis -- remains challenging even for existing state-of-the-art template-free generative approaches to produce an accurate yet diverse set of feasible reactions. In this paper, we model single-step retrosynthesis planning and introduce RETRO SYNFLOW (RSF), a discrete flow-matching framework that builds a Markov bridge between the prescribed target product molecule and the reactant molecule. In contrast to past approaches, RSF employs a reaction center identification step to produce intermediate structures known as synthons as a more informative source distribution for the discrete flow. To further enhance diversity and feasibility of generated samples, we employ Feynman-Kac steering with Sequential Monte Carlo based resampling to steer promising generations at inference using a new reward oracle that relies on a forward-synthesis model. Empirically, we demonstrate RSF achieves

2025-06-04

ArXiv (preprint)

arxiv.org

Trophic Interactions Are Key to Understanding the Effects of Global Change on the Distribution and Functional Role of the Brown Bear

Pablo M. Lucas

Wilfried Thuiller

Lauren Talluto

Ester Polaina

Jörg Albrecht

Nuria Selva

Marta De Barba

Vincenzo Penteriani

Maya Guéguen

Niko Balkenhol

Trishna Dutta

Ancuta Fedorca

Shane C. Frank

Andreas Zedrosser

Ivan Afonso‐Jordana

Hüseyin Ambarlı

Fernando Ballesteros

Andriy‐Taras Bashta

Cemal Can Bilgin

Neda Bogdanović

Edgars Bojārs

Katarzyna Bojarska

Natalia Bragalanti

Henrik Brøseth

Mark W. Chynoweth

Duško Ćirović

Paolo Ciucci

Andrea Corradini

Daniele De Angelis

Miguel de Gabriel Hernando

Csaba Domokos

Aleksander Dutsov

Alper Ertürk

Stefano Filacorda

Lorenzo Frangini

Claudio Groff

Samuli Heikkinen

Bledi Hoxha

Djuro Huber

Otso Huitu

Georgeta Ionescu

Ovidiu Ionescu

Klemen Jerina

Ramon Jurj

Alexandros A. Karamanlidis

Jonas Kindberg

Ilpo Kojola

José Vicente López‐Bao

Peep Männil

Dime Melovski

Yorgos Mertzanis

Paolo Molinari

Anja Molinari‐Jobin

Andrea Mustoni

Javier Naves

Sergey Ogurtsov

Deniz Özüt

Santiago Palazón

Luca Pedrotti

Aleksandar Perović

Vladimir N. Piminov

Ioan‐Mihai Pop

Marius Popa

Maria Psaralexi

Pierre‐Yves Quenette

Georg Rauer

Slaven Reljic

Eloy Revilla

Urmas Saarma

Alexander P. Saveljev

Ali Onur Sayar

Çagan H. Şekercioğlu

Agnieszka Sergiel

George Sîrbu

Tomaž Skrbinšek

Michaela Skuban

Anil Soyumert

Aleksandar Stojanov

Egle Tammeleht

Konstantin Tirronen

Aleksandër Trajçe

Igor Trbojević

Tijana Trbojević

Filip Zięba

Diana Zlatanova

Tomasz Zwijacz‐Kozica

Laura J. Pollock

Biotic interactions are expected to influence species' responses to global changes, but they are rarely considered across broad spatial extents. Abiotic factors are thought to operate at larger spatial scales, while biotic factors, such as species interactions, are considered more important at local scales within communities, in part because of the knowledge gap on species interactions at large spatial scales (i.e., the Eltonian shortfall). We assessed, at a continental scale, (i) the importance of biotic interactions, through food webs, on species distributions, and (ii) how biotic interactions under scenarios of climate and land‐use change may affect the distribution of the brown bear (Ursus arctos). We built a highly detailed, spatially dynamic, and empirically sampled food web based on the energy contribution of 276 brown bear food species from different taxa (plants, vertebrates, and invertebrates) and their ensemble habitat models at high resolution across Europe. Then, combining energy contribution and predicted habitat of food species, we modelled energy contribution across space and included these layers within Bayesian‐based models of the brown bear distribution in Europe. The inclusion of biotic interactions considerably improved our understanding of brown bear distribution at large (continental) scales compared with Bayesian models including only abiotic factors (climate and land use). Predicted future range shifts, which included changes in the distribution of food species, varied greatly when considering various scenarios of change in biotic factors, providing a warning that future indirect climate and land‐use change are likely to have strong but highly uncertain impacts on species biogeography. Our study confirmed that advancing our understanding of ecological networks of species interactions will improve future projections of biodiversity change, especially for modelling species distributions and their functional role under climate and land‐use change scenarios, which is key for effective conservation of biodiversity and ecosystem services.

2025-06-04

Global Change Biology (published)

doi.org

Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems

Emma Harvey

Emily Sheng

Su Lin Blodgett

Alexandra Chouldechova

Jean Garcia-Gathright

Alexandra Olteanu

Hanna Wallach

The NLP research community has made publicly available numerous instruments for measuring representational harms caused by large language model (LLM)-based systems. These instruments have taken the form of datasets, metrics, tools, and more. In this paper, we examine the extent to which such instruments meet the needs of practitioners tasked with evaluating LLM-based systems. Via semi-structured interviews with 12 such practitioners, we find that practitioners are often unable to use publicly available instruments for measuring representational harms. We identify two types of challenges. In some cases, instruments are not useful because they do not meaningfully measure what practitioners seek to measure or are otherwise misaligned with practitioner needs. In other cases, instruments - even useful instruments - are not used by practitioners due to practical and institutional barriers impeding their uptake. Drawing on measurement theory and pragmatic measurement, we provide recommendations for addressing these challenges to better meet practitioner needs.

2025-06-04

ArXiv (preprint)

arxiv.org
