Publications

LLMs for Experiment Design in Scientific Domains: Are We There Yet?
Jason Hartford
Mapping Delayed Canopy Loss and Durable Fire Refugia for the 2020 Wildfires in Washington State Using Multiple Sensors
Anika M. Anderson
Meg A. Krawchuk
Flavie Pelletier
Jeffrey A. Cardille
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Thinking
Sangmin Bae
Yujin Kim
Sungnyun Kim
Jiyoun Ha
Tal Schuster
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Se-Young Yun
Scaling language models unlocks impressive capabilities, but the accompanying computational and memory demands make both training and deployment expensive. Existing efficiency efforts typically target either parameter sharing or adaptive computation, leaving open the question of how to attain both simultaneously. We introduce Mixture-of-Recursions (MoR), a unified framework that combines the two axes of efficiency inside a single Recursive Transformer. MoR reuses a shared stack of layers across recursion steps to achieve parameter efficiency, while lightweight routers enable adaptive token-level thinking by dynamically assigning recursion depths to tokens, thereby focusing quadratic attention computation only where it is most useful. Further enhancing its efficiency, MoR incorporates a recursion-wise key-value caching mechanism that eliminates redundant memory access across recursion steps by selectively storing only the key-value caches for designated tokens. Across pretraining runs at model scales ranging from 135M to 1.7B parameters, MoR forms a new Pareto frontier: at equal training FLOPs and smaller model sizes, it significantly lowers validation perplexity and improves few-shot accuracy, while delivering higher throughput compared with vanilla and existing recursive baselines. These gains demonstrate that MoR is an effective path towards large-model quality without incurring large-model cost.
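The core routing idea from the abstract — one shared layer reused recursively, with a router assigning each token its own recursion depth so that deeper computation is spent only where needed — can be sketched as a toy. This is an illustrative simplification, not the paper's method: the quantile-bucket router, the shared weight matrix, and all dimensions below are hypothetical stand-ins, and the recursion-wise KV caching is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, max_depth, n_tokens = 8, 3, 5

# One shared layer reused at every recursion step (parameter sharing).
W_shared = rng.normal(scale=0.1, size=(d_model, d_model))
# Lightweight router: a single linear scorer over each token's state.
w_router = rng.normal(size=d_model)

def route_depths(h):
    """Assign each token a recursion depth in {1, ..., max_depth}."""
    scores = h @ w_router
    ranks = scores.argsort().argsort()        # rank tokens by score
    return 1 + (ranks * max_depth) // len(scores)

def mixture_of_recursions(h):
    depths = route_depths(h)
    for step in range(1, max_depth + 1):
        active = depths >= step               # tokens still "thinking"
        # Same shared weights at every step; only active tokens updated,
        # so compute concentrates on the tokens routed to greater depth.
        h[active] = np.tanh(h[active] @ W_shared) + h[active]
    return h, depths

h0 = rng.normal(size=(n_tokens, d_model))
h_out, depths = mixture_of_recursions(h0.copy())
```

The per-step `active` mask is where the efficiency comes from: tokens assigned depth 1 exit after one pass through the shared layer, while only the hardest tokens pay for all `max_depth` recursions.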
Model Parallelism With Subnetwork Data Parallelism
Distributed pre-training of large models at scale often imposes heavy memory demands on individual nodes and incurs significant intra-node communication costs. We propose a novel alternative approach that reduces the memory requirements by training small, structured subnetworks of the model on separate workers. Unlike pipelining, our method avoids inter-node activation communication and maintains bandwidth requirements that are comparable to or lower than standard data parallel communication schemes based on all-reduce. We evaluate two subnetwork construction strategies guided by the principle of ensuring uniform representation of each parameter across the distributed training setup. Our results show that the stochastic block dropping technique consistently outperforms the width-wise subnetwork construction previously explored in federated learning. We empirically attribute this superior performance to stronger gradient alignment in subnetworks that retain blocks having skip connections. Preliminary experiments highlight the promise of our approach, achieving a …
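The stochastic block dropping strategy described above can be sketched as follows. This is a hedged illustration under assumptions not stated in the abstract: the i.i.d. keep probability, the resampling loop used to enforce "uniform representation of each parameter", and the per-block gradient averaging are all hypothetical choices for the sketch, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n_blocks, n_workers, keep_prob = 6, 3, 0.5

def sample_block_mask():
    """Stochastic block dropping: each residual block kept i.i.d."""
    mask = rng.random(n_blocks) < keep_prob
    mask[0] = True  # keep at least one block so the subnetwork is nonempty
    return mask

def sample_worker_masks():
    """Resample until every block is kept by at least one worker,
    so each parameter is represented somewhere in the setup."""
    while True:
        masks = np.stack([sample_block_mask() for _ in range(n_workers)])
        if masks.any(axis=0).all():
            return masks

masks = sample_worker_masks()

# Each worker computes gradients only for its kept blocks; the server
# averages each block's gradient over the workers that kept it.
grads = rng.normal(size=(n_workers, n_blocks))   # stand-in gradients
counts = masks.sum(axis=0)
avg_grad = (grads * masks).sum(axis=0) / counts
```

Because workers exchange only the gradients of their own (smaller) subnetworks, the aggregation bandwidth is at most that of a full all-reduce, consistent with the bandwidth claim in the abstract.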
MuLoCo: Muon is a practical inner optimizer for DiLoCo
Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective
NovoMolGen: Rethinking Molecular Language Model Pretraining
Kamran Chitsaz
Roshan Balaji
Nirav Pravinbhai Bhatt
A. Chandar
Task-Informed Meta-Learning for Remote Sensing
Labels in remote sensing datasets - and particularly in agricultural remote sensing datasets - can be extremely spatially imbalanced, with plentiful labels in some regions but a sparsity of labels in other regions. When developing algorithms for data-sparse regions, a natural approach is to use transfer learning from data-rich regions. While standard transfer learning approaches typically leverage only direct inputs and outputs, remote sensing data (and geospatial data more generally) are rich in metadata that can inform transfer learning algorithms, such as the spatial coordinates of data-points. We build on previous work exploring the use of meta-learning for remote sensing contexts in data-sparse regions and introduce task-informed meta-learning (TIML), an augmentation to model-agnostic meta-learning which takes advantage of task-specific metadata. We apply TIML to regression and classification tasks in remote sensing for agriculture, and find that TIML outperforms a range of benchmarks in both contexts, across a diversity of model architectures. TIML was developed for remote sensing with the goal of improving the global accuracy (and equity) of machine learning models. However, it can offer benefits to any meta-learning setup with task-specific metadata - we demonstrate this by applying TIML to the Omniglot dataset.
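One way metadata such as spatial coordinates can condition a meta-learned model is through a learned per-task modulation of the network's features. The sketch below shows this idea in a FiLM-style form; the encoder weights, the scale-and-shift parameterisation, and the coordinate normalisation are all assumptions made for illustration, not TIML's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(2)
d_hidden = 4

# Hypothetical task-encoder weights: map task metadata (here, a pair of
# lat/lon coordinates) to a per-task scale and shift for hidden features.
W_gamma = rng.normal(scale=0.1, size=(2, d_hidden))
W_beta = rng.normal(scale=0.1, size=(2, d_hidden))

def task_modulate(features, coords):
    """FiLM-style conditioning of features on task-specific metadata."""
    gamma = 1.0 + coords @ W_gamma   # multiplicative modulation
    beta = coords @ W_beta           # additive modulation
    return gamma * features + beta

feats = rng.normal(size=(3, d_hidden))    # 3 examples from one task
coords = np.array([45.5, -73.6]) / 90.0   # normalised lat/lon metadata
out = task_modulate(feats, coords)
```

In a meta-learning loop, the encoder weights would be meta-trained alongside the base model, so that two tasks with similar coordinates receive similar modulations and transfer accordingly.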
The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset
Gilad Landau
Miran Ozdogan
Gereon Elvers
Francesco Mantegna
Pratik Somaiya
Dulhan Hansaja Jayalath
Luisa Kurth
Teyun Kwon
Brendan Shillingford
Greg Farquhar
Minqi Jiang
Karim Jerbi CoCo Lab
Yorguin José Mantilla Ramos
M. Woolrich
Natalie Voets
Oiwi Parker Jones
The advance of speech decoding from non-invasive brain data holds the potential for profound societal impact. Among its most promising applications is the restoration of communication to paralysed individuals affected by speech deficits such as dysarthria, without the need for high-risk surgical interventions. The ultimate aim of the 2025 PNPL competition is to produce the conditions for an "ImageNet moment" or breakthrough in non-invasive neural decoding, by harnessing the collective power of the machine learning community. To facilitate this vision we present the largest within-subject MEG dataset recorded to date (LibriBrain) together with a user-friendly Python library (pnpl) for easy data access and integration with deep learning frameworks. For the competition we define two foundational tasks (i.e. Speech Detection and Phoneme Classification from brain data), complete with standardised data splits and evaluation metrics, illustrative benchmark models, online tutorial code, a community discussion board, and public leaderboard for submissions. To promote accessibility and participation the competition features a Standard track that emphasises algorithmic innovation, as well as an Extended track that is expected to reward larger-scale computing, accelerating progress toward a non-invasive brain-computer interface for speech.
Torsional-GFN: a conditional conformation generator for small molecules
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
Mahmoud Assran
Adrien Bardes
David Fan
Quentin Garrido
Russell Howes
Mojtaba Komeili
Matthew J. Muckley
Ammar Rizvi
Claire Roberts
Sergio Arnaud
Abha Gejji
Ada Martin
Francois Hogan
Daniel Dugas
Piotr Bojanowski
Vasil Khalidov
Patrick Labatut
Francisco Massa
Marc Szafraniec
K. Krishnakumar
Ying Li
Xiaodong Ma
A. Chandar
Franziska Meier
Michael G. Rabbat
FAIR at Meta
Mila - Québec AI Institute
Polytechnique Montréal
A major challenge for modern AI is to learn to understand the world and learn to act largely by observation. This paper explores a self-supervised approach that combines internet-scale video data with a small amount of interaction data (robot trajectories), to develop models capable of understanding, predicting, and planning in the physical world. We first pre-train an action-free joint-embedding-predictive architecture, V-JEPA 2, on a video and image dataset comprising over 1 million hours of internet video. V-JEPA 2 achieves strong performance on motion understanding (77.3 top-1 accuracy on Something-Something v2) and state-of-the-art performance on human action anticipation (39.7 recall-at-5 on Epic-Kitchens-100), surpassing previous task-specific models. Additionally, after aligning V-JEPA 2 with a large language model, we demonstrate state-of-the-art performance on multiple video question-answering tasks at the 8 billion parameter scale (e.g., 84.0 on PerceptionTest, 76.9 on TempCompass). Finally, we show how self-supervised learning can be applied to robotic planning tasks by post-training a latent action-conditioned world model, V-JEPA 2-AC, using less than 62 hours of unlabeled robot videos from the Droid dataset. We deploy V-JEPA 2-AC zero-shot on Franka arms in two different labs and enable picking and placing of objects using planning with image goals. Notably, this is achieved without collecting any data from the robots in these environments, and without any task-specific training or reward. This work demonstrates how self-supervised learning from web-scale data and a small amount of robot interaction data can yield a world model capable of planning in the physical world.
Alveolar epithelial cell plasticity and injury memory in human pulmonary fibrosis
Taylor S Adams
Jonas C Schupp
Agshin Balayev
Johad Khoury
Aurelien Justet
Fadi Nikola
Laurens J De Sadeleer
Juan Cala-Garcia
Marta Zapata-Ortega
Panayiotis V Benos
John E McDonough
Farida Ahangari
Melanie Königshoff
Robert J Homer
Ivan Rosas
Xiting Yan
Bart M Vanaudenaerde
Wim A Wuyts
Naftali Kaminski
Acute and repetitive lung epithelial injury can lead to irreversible and even progressive pulmonary fibrosis; idiopathic pulmonary fibrosis (IPF) is a fatal disease and a quintessential example of this phenomenon. The composition of epithelial cells in human pulmonary fibrosis – irrespective of disease etiology – is marked by the presence of Aberrant Basaloid cells: an abnormal cell phenotype with pro-fibrotic and senescent features, localized to the surface of fibrotic lesions. Despite their relevance to human pulmonary fibrosis, the exotic molecular profile of Aberrant Basaloid cells has obscured their etiology, preventing insights into how or why these cells emerge with fibrosis. Here we identify cellular intermediaries between Aberrant Basaloid and normal alveolar epithelial cells in human IPF tissue. We track the emergence of Aberrant Basaloid cells from alveolar epithelial cells ex vivo and uncover a role for similar cells in epithelial regeneration under normal conditions. Lastly, we characterize the epigenetic changes that distinguish Aberrant Basaloid cells from their progenitors and identify hallmarks of AP-1 injury memory retention. This study elucidates the phenomenon of maladaptive epithelial plasticity and regeneration in pulmonary fibrosis and re-contextualizes therapeutic strategies for epithelial dysfunction.