Publications

Tiny Aya: Bridging Scale and Multilingual Depth
Alejandro Salamanca
Diana Abagyan
Daniel D'souza
Ammar Khairi
David Mora
Saurabh Dash
Viraat Aryabumi
Sara Rajaee
Ananya Sahu
Thomas Euyang
Brittawnya Prince
Madeline Smith
Hangyu Lin
Acyr Locatelli
Sara Hooker
Tom Kocmi
Aidan Gomez
Ivan Zhang
Phil Blunsom
Nick Frosst
Beyza Ermis
Ahmet Üstün
Marzieh Fadaee
Tiny Aya redefines what a small multilingual language model can achieve. Trained on 70 languages and refined through region-aware posttraining, it delivers state-of-the-art translation quality, strong multilingual understanding, and high-quality target-language generation, all with just 3.35B parameters. The release includes a pretrained foundation model, a globally balanced instruction-tuned variant, and three region-specialized models targeting languages from Africa, South Asia, Europe, Asia-Pacific, and West Asia. This report details the training strategy, data composition, and comprehensive evaluation framework behind Tiny Aya, and presents an alternative scaling path for multilingual AI: one centered on efficiency, balanced performance across languages, and practical deployment.
TRACE: Temporal Rule-Anchored Chain-of-Evidence on Knowledge Graphs for Interpretable Stock Movement Prediction
Luis Castejón Lozano
Miguel Conner
Juan Abia
Luis Gallego-Ledesma
Joshua Fellowes
Gerard Conangla Planes
Adam Elwood
We present TRACE (Temporal Rule-Anchored Chain-of-Evidence), a framework on knowledge graphs for interpretable stock movement prediction that unifies symbolic relational priors, dynamic graph exploration, and LLM-guided decision making in a single end-to-end pipeline. The approach performs rule-guided multi-hop exploration restricted to admissible relation sequences, grounds candidate reasoning chains in contemporaneous news, and aggregates fully grounded evidence into auditable UP/DOWN verdicts with human-readable paths connecting text and structure. On an S&P 500 benchmark, the method achieves 55.1% accuracy, 55.7% precision, 71.5% recall, and 60.8% F1, surpassing strong baselines and improving recall and F1 over the best graph baseline under identical evaluation. The gains stem from (i) rule-guided exploration that focuses search on economically meaningful motifs rather than arbitrary walks, and (ii) text-grounded consolidation that selectively aggregates high-confidence, fully grounded hypotheses instead of uniformly pooling weak signals. Together, these choices yield higher sensitivity without sacrificing selectivity, delivering predictive lift with faithful, auditable explanations.
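The "rule-guided multi-hop exploration restricted to admissible relation sequences" described in the abstract can be illustrated with a toy sketch. The graph, relation names, and motifs below are invented for illustration, not the paper's data or code; the idea shown is only the pruning of walks to admissible relation sequences.

```python
# Toy sketch of rule-guided multi-hop exploration on a knowledge graph.
# Triples, relation names, and admissible motifs are hypothetical.
TRIPLES = [
    ("AAPL", "supplier_of", "FOXCONN"),
    ("FOXCONN", "located_in", "TAIWAN"),
    ("AAPL", "competitor_of", "MSFT"),
    ("MSFT", "supplier_of", "NVDA"),
]

# Admissible relation sequences ("motifs"): only these chains may be completed.
ADMISSIBLE = {
    ("supplier_of", "located_in"),
    ("competitor_of", "supplier_of"),
}

def expand(graph, start, max_hops=2):
    """Return all paths from `start` whose relation sequence matches an
    admissible motif, pruning any walk that is not a motif prefix."""
    index = {}
    for h, r, t in graph:
        index.setdefault(h, []).append((r, t))
    frontier = [(start, ())]  # (current node, relation sequence so far)
    paths = []
    for _ in range(max_hops):
        nxt = []
        for node, rels in frontier:
            for r, t in index.get(node, []):
                seq = rels + (r,)
                # prune: seq must be a prefix of some admissible motif
                if any(m[: len(seq)] == seq for m in ADMISSIBLE):
                    nxt.append((t, seq))
                    if seq in ADMISSIBLE:
                        paths.append((start, seq, t))
        frontier = nxt
    return paths

chains = expand(TRIPLES, "AAPL")
```

In the full pipeline, each surviving chain would then be grounded in contemporaneous news before contributing to a verdict; this sketch covers only the exploration step.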
JEDI: Jointly Embedded Inference of Neural Dynamics
Animal brains flexibly and efficiently achieve many behavioral tasks with a single neural network. A core goal in modern neuroscience is to map the mechanisms of the brain's flexibility onto the dynamics underlying neural populations. However, identifying task-specific dynamical rules from limited, noisy, and high-dimensional experimental neural recordings remains a major challenge, as experimental data often provide only partial access to brain states and dynamical mechanisms. While recurrent neural networks (RNNs) directly constrained by neural data have been effective in inferring underlying dynamical mechanisms, they are typically limited to single-task domains and struggle to generalize across behavioral conditions. Here, we introduce JEDI, a hierarchical model that captures neural dynamics across tasks and contexts by learning a shared embedding space over RNN weights. This model recapitulates individual samples of neural dynamics while scaling to arbitrarily large and complex datasets, uncovering shared structure across conditions in a single, unified model. Using simulated RNN datasets, we demonstrate that JEDI accurately learns robust, generalizable, condition-specific embeddings. By reverse-engineering the weights learned by JEDI, we show that it recovers ground-truth fixed-point structures and unveils key features of the underlying neural dynamics in the eigenspectra. Finally, we apply JEDI to motor cortex recordings during monkey reaching to extract mechanistic insight into the neural dynamics of motor control. Our work shows that joint learning of contextual embeddings and recurrent weights provides scalable and generalizable inference of brain dynamics from recordings alone.
LLM2Vec-Gen: Generative Embeddings from Large Language Models
LLM-based text embedders typically encode the semantic content of their input. However, embedding tasks require mapping diverse inputs to similar outputs. This input-output gap is typically bridged by training embedding models on paired data with contrastive learning. In this work, we propose a novel self-supervised approach, LLM2Vec-Gen, which adopts a different paradigm: rather than encoding the input, we learn to represent the model's potential response. Specifically, we add trainable special tokens to the LLM's vocabulary, append them to the input, and optimize them to represent the LLM's response in a fixed-length sequence. Training is guided by the LLM's own completion for the query, along with an unsupervised embedding teacher that provides distillation targets. This formulation helps to bridge the input-output gap and transfers LLM capabilities such as safety alignment and reasoning to embedding tasks. Crucially, the LLM backbone remains frozen and training requires only unlabeled queries. LLM2Vec-Gen achieves state-of-the-art self-supervised performance on the Massive Text Embedding Benchmark (MTEB), improving by 9.3% over the best unsupervised embedding teacher. We also observe up to a 43.2% reduction in harmful content retrieval and a 29.3% improvement in reasoning capabilities for embedding tasks. Finally, the learned embeddings are interpretable and can be decoded into text to reveal their semantic content.
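The core mechanism (a handful of trainable special tokens appended to the input, whose states summarize a variable-length sequence into a fixed-length one) can be sketched with simple attention pooling. Everything below (dimensions, variable names, the frozen random "token states") is illustrative only; the paper trains these queries against the LLM's own completions and a distillation teacher, none of which is reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 4          # hidden size; number of special "embedding" tokens

# Stand-in for frozen LLM hidden states of a variable-length input (T x d).
token_states = rng.normal(size=(11, d))

# Trainable special-token queries: the only learned parameters in this sketch.
special_queries = rng.normal(size=(k, d))

def pool(queries, states):
    """Each special token attends over the input states, yielding a
    fixed-length (k x d) representation regardless of input length."""
    scores = queries @ states.T / np.sqrt(states.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over positions
    return weights @ states

emb = pool(special_queries, token_states)   # always shape (k, d)
```

Whatever the input length, the output has the same fixed shape, which is what makes the special-token states usable as an embedding.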
SPT-CL J0417–4748: A Deep Chandra Study of a Relaxed Galaxy Cluster without Central Star Formation
Taweewat Somboonpanyakul
A. Mantz
S. W. Allen
Anthony M. Flores
R. Glenn Morris
Haley R. Stueber
L. E. Bleem
B. Floyd
Keunho Kim
Abstract We present an in-depth Chandra X-ray analysis of the galaxy cluster SPT-CL J0417−4748 (hereafter SPT J0417) at z = 0.58, with a focus on its thermodynamic properties and the apparent absence of central star formation. Utilizing a total Chandra exposure of 103 ks, we find that the large-scale X-ray morphology is consistent with a dynamically relaxed cool-core system. The intracluster medium shows a central density of 0.08 ± 0.01 cm⁻³, a central pseudoentropy of 26 (+6/−5) keV cm², and a central cooling time of 515 (+96/−75) Myr, values typical of massive cool-core clusters. Despite these conditions, no evidence of recent or ongoing star formation is detected in the brightest cluster galaxy (BCG). Spectral energy di…
Covenant-72B: Pre-Training a 72B LLM with Trustless Peers Over-the-Internet
Joel Lidin
Amir Sarfi
Erfan Miahi
Quentin Anthony
Shivam Chauhan
Evangelos Pappas
Samuel Dare
Recently, there has been increased interest in globally distributed training, which promises to both reduce training costs and democratize participation in building large-scale foundation models. However, existing models trained in a globally distributed manner are relatively small in scale and have only been trained with whitelisted participants. Therefore, they do not yet realize the full promise of democratized participation. In this report, we describe Covenant-72B, an LLM produced by the largest collaborative globally distributed pre-training run (in terms of both compute and model scale), which simultaneously allowed open, permissionless participation supported by a live blockchain protocol. We utilized a state-of-the-art communication-efficient optimizer, SparseLoCo, supporting dynamic participation with peers joining and leaving freely. Our model, pre-trained on approximately 1.1T tokens, performs competitively with fully centralized models pre-trained on similar or higher compute budgets, demonstrating that fully democratized, non-whitelisted participation is not only feasible, but can be achieved at unprecedented scale for a globally distributed pre-training run.
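A standard ingredient of communication-efficient distributed optimizers in this family is top-k sparsification with error feedback: each peer transmits only the largest-magnitude entries of its update and carries the dropped remainder into the next step. The sketch below shows only that generic ingredient; SparseLoCo's actual algorithm has further components (e.g. its local-update and quantization scheme) not reproduced here, and the gradient values are made up.

```python
# Generic top-k sparsification with an error-feedback residual, a common
# building block of communication-efficient distributed optimizers.
def topk_compress(vec, k):
    """Keep the k largest-magnitude entries; return (sparse update, residual)."""
    idx = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    keep = set(idx)
    sparse = [v if i in keep else 0.0 for i, v in enumerate(vec)]
    residual = [v - s for v, s in zip(vec, sparse)]
    return sparse, residual

# Each peer sends only k values per step; the dropped mass (the residual)
# is added back into the next step's update instead of being lost.
grad = [0.9, -0.05, 0.02, -1.2, 0.3]
sparse, resid = topk_compress(grad, k=2)
```

The residual term is what keeps aggressive sparsification from biasing the optimizer: nothing is discarded, only deferred.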
GPT-based self-supervised anomaly detection in command lines
Miles Q. Li
Julien Keutchayan
François Charest
Benjamin C. M. Fung
Process Reward Models That Think
Muhammad Khalifa
Lajanugen Logeswaran
Jaekyeom Kim
Hao Peng
Moontae Lee
Honglak Lee
Lu Wang
Step-by-step verifiers—also known as process reward models (PRMs)—are a key ingredient for test-time scaling, but training them requires expensive step-level supervision. This work aims to build data-efficient PRMs as verbalized step-wise reward models that verify every step in the solution by generating a verification chain-of-thought (CoT). We propose ThinkPRM, a long CoT verifier fine-tuned on orders of magnitude fewer process labels than those required by discriminative PRMs. Our approach capitalizes on the inherent reasoning abilities of long CoT models, and outperforms LLM-as-a-Judge and discriminative verifiers—using only 1% of the process labels in PRM800K—across several challenging benchmarks. Specifically, ThinkPRM beats the baselines on ProcessBench, MATH-500, and AIME ’24 under best-of-N selection and reward-guided search. In an out-of-domain evaluation over subsets of GPQA-Diamond and LiveCodeBench, our PRM surpasses discriminative verifiers trained with the full PRM800K by 8% and 4.5%, respectively. Lastly, under the same token budget, ThinkPRM scales up verification compute more effectively compared to LLM-as-a-Judge, outperforming it by 7.2% on a subset of ProcessBench. This work highlights the value of generative, long CoT PRMs that can scale test-time compute for verification while requiring minimal supervision for training.
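The best-of-N selection mentioned above can be sketched in a few lines: a step-wise verifier scores every step of each candidate solution, and a common convention (assumed here, not taken from the paper) is to gate each solution by its weakest step. The toy scorer below is a stand-in; ThinkPRM itself produces step judgments by generating a verification chain-of-thought.

```python
# Minimal sketch of best-of-N selection with a step-wise verifier (PRM).
# Aggregating per-step scores by their minimum is one common convention.
def best_of_n(candidates, step_scorer):
    """Pick the candidate whose weakest verified step is strongest."""
    def solution_score(steps):
        return min(step_scorer(s) for s in steps)
    return max(candidates, key=solution_score)

# Hypothetical verifier: flags steps containing an (injected) arithmetic slip.
def toy_scorer(step):
    return 0.1 if "2 + 2 = 5" in step else 0.9

cands = [
    ["let x = 2", "then 2 + 2 = 5", "so x + x = 5"],   # contains a bad step
    ["let x = 2", "then 2 + 2 = 4", "so x + x = 4"],
]
best = best_of_n(cands, toy_scorer)   # selects the second candidate
```

The min-aggregation is what makes process-level (rather than outcome-level) scores useful: one unverified step sinks an otherwise plausible solution.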
Generalization in Online Reinforcement Learning for Mobile Agents
Zihuan Jiang
Zhixiang Chi
Huan Liu
Ziqiang Wang
Yuanhao Yu
Graphical user interface (GUI)-based mobile agents automate digital tasks on mobile devices by interpreting natural-language instructions and interacting with the screen. While recent methods apply reinforcement learning (RL) to train vision-language model (VLM) agents in interactive environments with a primary focus on performance, generalization remains underexplored due to the lack of standardized benchmarks and open-source RL systems. In this work, we formalize the problem as a Contextual Markov Decision Process (CMDP) and introduce AndroidWorld-Generalization, a benchmark with three increasingly challenging regimes for evaluating zero-shot generalization to unseen task instances, templates, and applications. We further propose an RL training system that integrates Group Relative Policy Optimization (GRPO) with a scalable rollout collection system, consisting of containerized infrastructure, asynchronous execution, and error recovery to support reliable and efficient training. Experiments on AndroidWorld-Generalization show that RL enables a 7B-parameter VLM agent to surpass supervised fine-tuning baselines, yielding a 26.1% improvement on unseen instances but only limited gains on unseen templates (15.7%) and apps (8.3%), underscoring the challenges of generalization. As a preliminary step, we demonstrate that few-shot adaptation at test time improves performance on unseen apps, motivating future research in this direction. To support reproducibility and fair comparison, we open-source the full RL training system, including the environment, task suite, models, prompt configurations, and the underlying infrastructure (https://github.com/zihuanjiang/AndroidWorld-Generalization).
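At the heart of GRPO is a group-relative advantage: rewards for a group of rollouts on the same task are normalized against that group's own mean and standard deviation, removing the need for a learned value function. A minimal sketch of that computation follows; the reward values and the epsilon are illustrative, not the paper's configuration.

```python
# Group-relative advantage as used in GRPO: normalize each rollout's reward
# against the mean and standard deviation of its own group.
def grpo_advantages(rewards, eps=1e-8):
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four hypothetical rollouts of one GUI task; 1.0 = completed, 0.0 = failed.
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Successful rollouts receive positive advantages and failed ones negative, and the advantages of each group sum to zero by construction.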
Evolutionarily conserved neural dynamics across mice, monkeys, and humans
Anton R Sobinov
Z. Jeffrey Chen
Junchol Park
Nicholas G. Hatsopoulos
Joshua T. Dudman
Juan Álvaro Gallego
Matthew G. Perich
Zihao Chen
On evolutionary timescales, brain circuits adapt to support survival in each species' ecological niche. While some anatomical aspects of neural circuitry are conserved across species with distant evolutionary origins, each species also exhibits specific circuit adaptations that enable its behavioral repertoire. It remains unclear whether homologous brain regions leverage analogous neural computations as different species perform common behaviors such as reaching and manipulating objects. Here, we directly assessed conservation of neural computations using intracortical recordings from mouse, monkey, and human motor cortex, a homologous region across many mammals, during motor behaviors crucial for survival. We hypothesized that, despite their phylogenetic distance, rodents and primates produce movements through conserved neural computations implemented by motor cortical population dynamics. Remarkably, we found that movement-related neural dynamics were highly conserved across species, while variations in behavioral output were uniquely captured in neural trajectory geometries. Strikingly, neural dynamics during movement across species were more conserved than those across brain regions in the same human and between motor preparation and execution in the same monkeys. Lastly, through manipulation of neural network models trained to perform reaching movements, we reinforce that conservation of neural dynamics across species likely stems from shared circuit constraints. We thus assert that evolution maintains neural computations across phylogeny even as behavioral repertoires expand.
A Latent Space Thermodynamic Model of Cell Differentiation
Ali Poursina
Arsham Mikaeili Namini
Alihossein Saberi
Hamed S. Najafabadi
Abstract Inferring the governing dynamics of differentiation that capture cell state evolution remains a central challenge in single-cell biology. We present Latent Space Dynamics (LSD), a thermodynamics-inspired framework that models cell differentiation as evolution on a learned Waddington landscape in latent space. LSD jointly infers a low-dimensional cell state, a differentiable potential function governing developmental flow, and a local entropy term that quantifies cellular plasticity. Using a neural ordinary differential equation, LSD reconstructs continuous differentiation trajectories from time-ordered single-cell data. Across diverse developmental systems, LSD accurately recovers lineage hierarchies, predicts fate commitment for unseen cell types, and outperforms existing trajectory inference approaches in directional accuracy. Moreover, in silico gene perturbations reveal how individual regulators reshape the landscape, and entropy provides a quantitative measure of plasticity in development and cancer.
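The picture of differentiation as flow on a potential landscape can be illustrated with a toy one-dimensional analogue. LSD learns its potential in latent space via a neural ODE; the fixed double-well polynomial below is purely illustrative, standing in for a Waddington landscape with two "fates" at its minima.

```python
# Toy gradient flow on a double-well potential V(x) = (x^2 - 1)^2.
# Cells initialized near the unstable ridge at x = 0 commit to one of two
# fates at x = -1 or x = +1 under the flow dx/dt = -V'(x).
def grad_V(x):
    return 4 * x * (x ** 2 - 1)   # derivative of (x^2 - 1)^2

def integrate(x0, dt=0.01, steps=2000):
    """Explicit Euler integration of the gradient flow from x0."""
    x = x0
    for _ in range(steps):
        x -= dt * grad_V(x)
    return x

fate_a = integrate(-0.1)   # settles in the x = -1 well
fate_b = integrate(+0.1)   # settles in the x = +1 well
```

A small difference in initial state on either side of the ridge determines which basin the trajectory commits to, which is the landscape intuition behind fate-commitment prediction.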
Populus tremuloides as a natural fire barrier in Canada’s boreal forest under a changing climate
Flavie Pelletier
Jeffrey A. Cardille
Joanne C. White
Aspen (Populus tremuloides) stands have historically been considered a barrier to wildfire progression across Canada. However, as the climate changes and negatively impacts fire weather conditions, the established relationship between aspen, weather, and wildfires may also be changing. We explored this relationship using annual maps of dominant tree species extent and wildfire occurrence for three recent active fire years (2021–2023) within four Canadian forested ecozones (275 Mha), where most interactions between aspen stands and wildfires take place. We compared the proportion of aspen at burned perimeters with the proportion of aspen within the burned perimeters and found that aspen was more than twice as common at fire perimeters (ratio of 2.42). Increasing aspen cover also decreased daily burned area, from a median of 717 ha/day to 646 ha/day when aspen cover increased from less than 10% to more than 25%. Our analysis indicated that the increase in daily burned area following a rise in the fire weather index was reduced when greater aspen cover was present. Additionally, comparison of burn severity in spruce- and pine-dominated stands showed that aspen burned at a significantly lower severity than spruce in the two ecozones where aspen presence is greater. Our results indicate that despite a warming climate and an increase in the number of days conducive to severe fires, aspen continues to function as a barrier to the progression of wildfire and mitigates increases in daily burned area under extreme weather conditions.
• Aspen acts as a fire barrier: it is twice as common at fire perimeters as inside them.
• Increasing aspen cover reduces daily burned area.
• Greater aspen cover moderates the increase in burned area caused by extreme fire weather.
• Aspen burn severity was lower than that of spruce and pine where aspen presence was greater.
• Evidence on the difference in fire activity between leaf-on and leafless aspen is mixed.