Publications

LLMs for Experiment Design in Scientific Domains: Are We There Yet?

Rushil Gupta

Jason Hartford

Bang Liu

2025-06-10

ICML.cc/2025/Workshop/GenBio (poster)

openreview.net

Mapping Delayed Canopy Loss and Durable Fire Refugia for the 2020 Wildfires in Washington State Using Multiple Sensors

Anika M. Anderson

Meg A. Krawchuk

Flavie Pelletier

Jeffrey A. Cardille

2025-06-10

Fire (published)

doi.org

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Thinking

Sangmin Bae

Yujin Kim

Reza Bayat

Sungnyun Kim

Jiyoun Ha

Tal Schuster

Adam Fisch

Hrayr Harutyunyan

Ziwei Ji

Aaron Courville

Se-Young Yun

Scaling language models unlocks impressive capabilities, but the accompanying computational and memory demands make both training and deployment expensive. Existing efficiency efforts typically target either parameter sharing or adaptive computation, leaving open the question of how to attain both simultaneously. We introduce Mixture-of-Recursions (MoR), a unified framework that combines the two axes of efficiency inside a single Recursive Transformer. MoR reuses a shared stack of layers across recursion steps to achieve parameter efficiency, while lightweight routers enable adaptive token-level thinking by dynamically assigning recursion depths to individual tokens, thereby focusing quadratic attention computation only where it is most useful. Further enhancing its efficiency, MoR incorporates a recursion-wise key-value caching mechanism that eliminates redundant memory access across recursion steps by selectively storing only the key-value caches for designated tokens. Across pretraining runs at model scales ranging from 135M to 1.7B parameters, MoR forms a new Pareto frontier: at equal training FLOPs and smaller model sizes, it significantly lowers validation perplexity and improves few-shot accuracy, while delivering higher throughput compared with vanilla and existing recursive baselines. These gains demonstrate that MoR is an effective path towards large-model quality without incurring large-model cost.

2025-06-10

ICML.cc/2025/Workshop/ES-FoMo-III (published)

openreview.net

Model Parallelism With Subnetwork Data Parallelism

Distributed pre-training of large models at scale often imposes heavy memory demands on individual nodes and incurs significant intra-node communication costs. We propose a novel alternative approach that reduces the memory requirements by training small, structured subnetworks of the model on separate workers. Unlike pipelining, our method avoids inter-node activation communication and maintains bandwidth requirements that are comparable to or lower than standard data parallel communication schemes based on all-reduce. We evaluate two subnetwork construction strategies guided by the principle of ensuring uniform representation of each parameter across the distributed training setup. Our results show that the stochastic block dropping technique consistently outperforms the width-wise subnetwork construction previously explored in federated learning. We empirically attribute this superior performance to stronger gradient alignment in subnetworks that retain blocks having skip connections. Preliminary experiments highlight the promise of our approach, achieving a …

2025-06-10

ICML.cc/2025/Workshop/ES-FoMo-III (published)

doi.org

openreview.net

MuLoCo: Muon is a practical inner optimizer for DiLoCo

Benjamin Therien

Xiaolong Huang

Irina Rish

Eugene Belilovsky

2025-06-10

ICML.cc/2025/Workshop/ES-FoMo-III (published)

doi.org

openreview.net

Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective

2025-06-10

ICML.cc/2025/Workshop/ES-FoMo-III (published)

openreview.net

NovoMolGen: Rethinking Molecular Language Model Pretraining

Kamran Chitsaz

Roshan Balaji

Quentin Fournier

Nirav Pravinbhai Bhatt

A. Chandar

2025-06-10

ICML.cc/2025/Workshop/GenBio (poster)

doi.org

openreview.net

Task-Informed Meta-Learning for Remote Sensing

Gabriel Tseng

Hannah Kerner

David Rolnick

Labels in remote sensing datasets, and particularly in agricultural remote sensing datasets, can be extremely spatially imbalanced, with plentiful labels in some regions but sparse labels in others. When developing algorithms for data-sparse regions, a natural approach is to use transfer learning from data-rich regions. While standard transfer learning approaches typically leverage only direct inputs and outputs, remote sensing data (and geospatial data more generally) are rich in metadata that can inform transfer learning algorithms, such as the spatial coordinates of data points. We build on previous work exploring the use of meta-learning for remote sensing contexts in data-sparse regions and introduce task-informed meta-learning (TIML), an augmentation to model-agnostic meta-learning which takes advantage of task-specific metadata. We apply TIML to regression and classification tasks in remote sensing for agriculture, and find that TIML outperforms a range of benchmarks in both contexts, across a diversity of model architectures. TIML was developed for remote sensing with the goal of improving the global accuracy (and equity) of machine learning models. However, it can offer benefits to any meta-learning setup with task-specific metadata; we demonstrate this by applying TIML to the Omniglot dataset.

2025-06-10

2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (published)

doi.org

The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset

Gilad Landau

Miran Ozdogan

Gereon Elvers

Francesco Mantegna

Pratik Somaiya

Dulhan Hansaja Jayalath

Luisa Kurth

Teyun Kwon

Brendan Shillingford

Greg Farquhar

Minqi Jiang

Karim Jerbi

Hamza Abdelhedi

Yorguin José Mantilla Ramos

Caglar Gulçehre

M. Woolrich

Natalie Voets

Oiwi Parker Jones

The advance of speech decoding from non-invasive brain data holds the potential for profound societal impact. Among its most promising applications is the restoration of communication to paralysed individuals affected by speech deficits such as dysarthria, without the need for high-risk surgical interventions. The ultimate aim of the 2025 PNPL competition is to produce the conditions for an "ImageNet moment", or breakthrough, in non-invasive neural decoding by harnessing the collective power of the machine learning community. To facilitate this vision, we present the largest within-subject MEG dataset recorded to date (LibriBrain) together with a user-friendly Python library (pnpl) for easy data access and integration with deep learning frameworks. For the competition we define two foundational tasks (Speech Detection and Phoneme Classification from brain data), complete with standardised data splits and evaluation metrics, illustrative benchmark models, online tutorial code, a community discussion board, and a public leaderboard for submissions. To promote accessibility and participation, the competition features a Standard track that emphasises algorithmic innovation, as well as an Extended track that is expected to reward larger-scale computing, accelerating progress toward a non-invasive brain-computer interface for speech.

2025-06-10

ArXiv (preprint)

doi.org

arxiv.org

Torsional-GFN: a conditional conformation generator for small molecules

Lena Nehale Ezzine

Alex Hernández-García

2025-06-10

ICML.cc/2025/Workshop/GenBio (poster)

doi.org

openreview.net

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Mahmoud Assran

Adrien Bardes

David Fan

Quentin Garrido

Russell Howes

Mojtaba Komeili

Matthew J. Muckley

Ammar Rizvi

Claire Roberts

Koustuv Sinha

Artem Zholus

Sergio Arnaud

Abha Gejji

Ada Martin

Francois Hogan

Daniel Dugas

Piotr Bojanowski

Vasil Khalidov

Patrick Labatut

Francisco Massa … (13 further authors not listed)

Marc Szafraniec

K. Krishnakumar

Ying Li

Xiaodong Ma

A. Chandar

Franziska Meier

Yann Lecun

Michael G. Rabbat

Nicolas Ballas

FAIR at Meta

Mila - Québec AI Institute

Polytechnique Montréal

A major challenge for modern AI is to learn to understand the world and learn to act largely by observation. This paper explores a self-supervised approach that combines internet-scale video data with a small amount of interaction data (robot trajectories), to develop models capable of understanding, predicting, and planning in the physical world. We first pre-train an action-free joint-embedding-predictive architecture, V-JEPA 2, on a video and image dataset comprising over 1 million hours of internet video. V-JEPA 2 achieves strong performance on motion understanding (77.3 top-1 accuracy on Something-Something v2) and state-of-the-art performance on human action anticipation (39.7 recall-at-5 on Epic-Kitchens-100), surpassing previous task-specific models. Additionally, after aligning V-JEPA 2 with a large language model, we demonstrate state-of-the-art performance on multiple video question-answering tasks at the 8 billion parameter scale (e.g., 84.0 on PerceptionTest, 76.9 on TempCompass). Finally, we show how self-supervised learning can be applied to robotic planning tasks by post-training a latent action-conditioned world model, V-JEPA 2-AC, using fewer than 62 hours of unlabeled robot videos from the Droid dataset. We deploy V-JEPA 2-AC zero-shot on Franka arms in two different labs and enable picking and placing of objects using planning with image goals. Notably, this is achieved without collecting any data from the robots in these environments, and without any task-specific training or reward. This work demonstrates how self-supervised learning from web-scale data and a small amount of robot interaction data can yield a world model capable of planning in the physical world.

2025-06-10

ArXiv (preprint)

doi.org

arxiv.org

Alveolar epithelial cell plasticity and injury memory in human pulmonary fibrosis

Taylor S Adams

Jonas C Schupp

Agshin Balayev

Johad Khoury

Aurelien Justet

Fadi Nikola

Laurens J De Sadeleer

Juan Cala-Garcia

Marta Zapata-Ortega

Panayiotis V Benos

John E McDonough

Farida Ahangari

Melanie Königshoff

Jun Ding

Robert J Homer

Ivan Rosas

Xiting Yan … (3 further authors not listed)

Bart M Vanaudenaerde

Wim A Wuyts

Naftali Kaminski

Acute and repetitive lung epithelial injury can lead to irreversible and even progressive pulmonary fibrosis; idiopathic pulmonary fibrosis (IPF) is a fatal disease and quintessential example of this phenomenon. The composition of epithelial cells in human pulmonary fibrosis, irrespective of disease etiology, is marked by the presence of Aberrant Basaloid cells: an abnormal cell phenotype with pro-fibrotic and senescent features, localized to the surface of fibrotic lesions. Despite their relevance to human pulmonary fibrosis, the exotic molecular profile of Aberrant Basaloid cells has obscured their etiology, preventing insights into how or why these cells emerge with fibrosis. Here we identify cellular intermediaries between Aberrant Basaloid and normal alveolar epithelial cells in human IPF tissue. We track the emergence of Aberrant Basaloid cells from alveolar epithelial cells ex vivo and uncover a role for similar cells in epithelial regeneration under normal conditions. Lastly, we characterize the epigenetic changes that distinguish Aberrant Basaloid cells from their progenitors and identify hallmarks of AP-1 injury memory retention. This study elucidates the phenomenon of maladaptive epithelial plasticity and regeneration in pulmonary fibrosis and re-contextualizes therapeutic strategies for epithelial dysfunction.

2025-06-09

bioRxiv (preprint)

doi.org
