Joelle Pineau

Beyza Ermis

Ahmet Üstün

Julia Kreutzer

Marzieh Fadaee

Tiny Aya redefines what a small multilingual language model can achieve. Trained on 70 languages and refined through region-aware posttraining, it delivers state-of-the-art translation quality, strong multilingual understanding, and high-quality target-language generation, all with just 3.35B parameters. The release includes a pretrained foundation model, a globally balanced instruction-tuned variant, and three region-specialized models targeting languages from Africa, South Asia, Europe, Asia-Pacific, and West Asia. This report details the training strategy, data composition, and comprehensive evaluation framework behind Tiny Aya, and presents an alternative scaling path for multilingual AI: one centered on efficiency, balanced performance across languages, and practical deployment.

2026-03-11

arXiv (preprint)

A Message from AI Research Leaders: Join Us in Supporting OpenReview

Andrew Y. Ng

Ruslan Salakhutdinov

Fernando Pereira

2025-12-17

OpenReview (unknown)

openreview.net

Advancing science- and evidence-based AI policy.

Rishi Bommasani

Sanjeev Arora

Jennifer Chayes

Yejin Choi

Mariano-Florentino Cuéllar

Li Fei-Fei

Daniel E. Ho

Dan Jurafsky

Sanmi Koyejo

Hima Lakkaraju

Arvind Narayanan

Alondra Nelson

Emma Pierson

Scott Singer

Gael Varoquaux

Suresh Venkatasubramanian

Ion Stoica

Percy Liang

Dawn Song

2025-07-30

Science (published)

Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation

Deploying reinforcement learning (RL) policies in the real world involves significant challenges, including distribution shifts, safety concerns, and the impracticality of direct interactions during policy refinement. Existing methods, such as domain randomization (DR) and off-dynamics RL, enhance policy robustness by direct interaction with the target domain, an inherently unsafe practice. We propose Uncertainty-Aware RL (UARL), a novel framework that prioritizes safety during training by addressing Out-Of-Distribution (OOD) detection and policy adaptation without requiring direct interactions in the target domain. UARL employs an ensemble of critics to quantify policy uncertainty and incorporates progressive environmental randomization to prepare the policy for diverse real-world conditions. By iteratively refining over high-uncertainty regions of the state space in simulated environments, UARL enhances robust generalization to the target domain without explicitly training on it. We evaluate UARL on MuJoCo benchmarks and a quadrupedal robot, demonstrating its effectiveness in reliable OOD detection, improved performance, and enhanced sample efficiency compared to baselines.

2025-07-07

arXiv (preprint)

A flexible machine learning Mendelian randomization estimator applied to predict the safety and efficacy of sclerostin inhibition

Marc-André Legault

Jason Hartford

Benoit J. Arsenault

Archer Y. Yang

2025-05-31

American Journal of Human Genetics (published)

Piecewise Linear Parametrization of Policies: Towards Interpretable Deep Reinforcement Learning

Maxime Wabartha

2024-05-06

International Conference on Learning Representations (Accept (poster))

openreview.net

Rethinking Machine Learning Benchmarks in the Context of Professional Codes of Conduct

Peter Henderson

Jieru Hu

Mona Diab

2024-03-11

Proceedings of the Symposium on Computer Science and Law (published)

Group Fairness in Reinforcement Learning

Harsh Satija

Alessandro Lazaric

Matteo Pirotta

We pose and study the problem of satisfying fairness in the online Reinforcement Learning (RL) setting. We focus on the group notions of fairness, according to which agents belonging to different groups should have similar performance based on some given measure. We consider the setting of maximizing return in an unknown environment (unknown transition and reward function) and show that it is possible to have RL algorithms that learn the best fair policies without violating the fairness requirements at any point in time during the learning process. In the tabular finite-horizon episodic setting, we provide an algorithm that combines the principle of optimism and pessimism under uncertainty to achieve zero fairness violation with arbitrarily high probability while also maintaining sub-linear regret guarantees. For the high-dimensional Deep-RL setting, we present algorithms based on the performance-difference style approximate policy improvement update step and we report encouraging empirical results on various traditional RL-inspired benchmarks showing that our algorithms display the desired behavior of learning the optimal policy while performing a fair learning process.

2023-04-27

TMLR (accepted)

openreview.net

Estimating causal effects with optimization-based methods: A review and empirical comparison

Martin Cousineau

Vedat Verter

Susan A. Murphy

2023-01-15

European Journal of Operational Research (published)

Publisher Correction: Advancing ethics review practices in AI research

Madhulika Srikumar

Rebecca Finlay

Grace M. Abuhamad

Carolyn Ashurst

Rosie Campbell

Emily Campbell-Ratcliffe

Hudson Hongo

Sara Rene Jordan

Joseph Lindley

Aviv Ovadya

2022-12-31

Nature Machine Intelligence (published)

Questions Are All You Need to Train a Dense Passage Retriever

Devendra Singh Sachan

Mike Lewis

Dani Yogatama

Luke Zettlemoyer

Manzil Zaheer

We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires access to unpaired inputs and outputs (e.g. questions and potential answer documents). It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question. Training for retrieval based on question reconstruction enables effective unsupervised learning of both document and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning. Extensive experiments demonstrate that ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model, removing the need for labeled data and task-specific losses.

2022-12-31

Transactions of the Association for Computational Linguistics (published)

Advancing ethics review practices in AI research

Madhulika Srikumar

Rebecca Finlay

Grace M. Abuhamad

Carolyn Ashurst

Rosie Campbell

Emily Campbell-Ratcliffe

Hudson Hongo

Sara Rene Jordan

Joseph Lindley

Aviv Ovadya

2022-11-30

Nature Machine Intelligence (published)