Publications

Next-Token Prediction Should be Ambiguity-Sensitive : A Meta-Learing Perspective
NovoMolGen: Rethinking Molecular Language Model Pretraining
Kamran Chitsaz
Roshan Balaji
Nirav Pravinbhai Bhatt
A. Chandar
Task-Informed Meta-Learning for Remote Sensing
Labels in remote sensing datasets - and particularly in agricultural remote sensing datasets - can be extremely spatially imbalanced, with p… (voir plus)lentiful labels in some regions but a sparsity of labels in other regions. When developing algorithms for data-sparse regions, a natural approach is to use transfer learning from data-rich regions. While standard transfer learning approaches typically leverage only direct inputs and outputs, remote sensing data (and geospatial data more generally) are rich in metadata that can inform transfer learning algorithms, such as the spatial coordinates of data-points. We build on previous work exploring the use of meta-learning for remote sensing contexts in data-sparse regions and introduce task-informed meta-learning (TIML), an augmentation to model-agnostic meta-learning which takes advantage of task-specific metadata. We apply TIML to regression and classification tasks in remote sensing for agriculture, and find that TIML outperforms a range of benchmarks in both contexts, across a diversity of model architectures. TIML was developed for remote sensing with the goal of improving the global accuracy (and equity) of machine learning models. However, it can offer benefits to any meta-learning setup with task-specific metadata - we demonstrate this by applying TIML to the Omniglot dataset.
The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset
Gilad Landau
Miran Ozdogan
Gereon Elvers
Francesco Mantegna
Pratik Somaiya
Dulhan Hansaja Jayalath
Luisa Kurth
Teyun Kwon
Brendan Shillingford
Greg Farquhar
Minqi Jiang
Karim Jerbi CoCo Lab
Yorguin José Mantilla Ramos
M. Woolrich
Natalie Voets
Oiwi Parker Jones
The advance of speech decoding from non-invasive brain data holds the potential for profound societal impact. Among its most promising appli… (voir plus)cations is the restoration of communication to paralysed individuals affected by speech deficits such as dysarthria, without the need for high-risk surgical interventions. The ultimate aim of the 2025 PNPL competition is to produce the conditions for an"ImageNet moment"or breakthrough in non-invasive neural decoding, by harnessing the collective power of the machine learning community. To facilitate this vision we present the largest within-subject MEG dataset recorded to date (LibriBrain) together with a user-friendly Python library (pnpl) for easy data access and integration with deep learning frameworks. For the competition we define two foundational tasks (i.e. Speech Detection and Phoneme Classification from brain data), complete with standardised data splits and evaluation metrics, illustrative benchmark models, online tutorial code, a community discussion board, and public leaderboard for submissions. To promote accessibility and participation the competition features a Standard track that emphasises algorithmic innovation, as well as an Extended track that is expected to reward larger-scale computing, accelerating progress toward a non-invasive brain-computer interface for speech.
Torsional-GFN: a conditional conformation generator for small molecules
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
Mahmoud Assran
Adrien Bardes
David Fan
Quentin Garrido
Russell Howes
Mojtaba Komeili
Matthew J. Muckley
Ammar Rizvi
Claire Roberts
Sergio Arnaud
Abha Gejji
Ada Martin
Francois Hogan
Daniel Dugas
Piotr Bojanowski
Vasil Khalidov
Patrick Labatut
Francisco Massa … (voir 13 de plus)
Marc Szafraniec
K. Krishnakumar
Ying Li
Xiaodong Ma
A. Chandar
Franziska Meier
Michael G. Rabbat
Fair at Meta
Mila - Québec
AI Institute
Polytechnique Montréal
A major challenge for modern AI is to learn to understand the world and learn to act largely by observation. This paper explores a self-supe… (voir plus)rvised approach that combines internet-scale video data with a small amount of interaction data (robot trajectories), to develop models capable of understanding, predicting, and planning in the physical world. We first pre-train an action-free joint-embedding-predictive architecture, V-JEPA 2, on a video and image dataset comprising over 1 million hours of internet video. V-JEPA 2 achieves strong performance on motion understanding (77.3 top-1 accuracy on Something-Something v2) and state-of-the-art performance on human action anticipation (39.7 recall-at-5 on Epic-Kitchens-100) surpassing previous task-specific models. Additionally, after aligning V-JEPA 2 with a large language model, we demonstrate state-of-the-art performance on multiple video question-answering tasks at the 8 billion parameter scale (e.g., 84.0 on PerceptionTest, 76.9 on TempCompass). Finally, we show how self-supervised learning can be applied to robotic planning tasks by post-training a latent action-conditioned world model, V-JEPA 2-AC, using less than 62 hours of unlabeled robot videos from the Droid dataset. We deploy V-JEPA 2-AC zero-shot on Franka arms in two different labs and enable picking and placing of objects using planning with image goals. Notably, this is achieved without collecting any data from the robots in these environments, and without any task-specific training or reward. This work demonstrates how self-supervised learning from web-scale data and a small amount of robot interaction data can yield a world model capable of planning in the physical world.
Alveolar epithelial cell plasticity and injury memory in human pulmonary fibrosis
Taylor S Adams
Jonas C Schupp
Agshin Balayev
Johad Khoury
Aurelien Justet
Fadi Nikola
Laurens J De Sadeleer
Laurens J De Sadeleer
Juan Cala-Garcia
Marta Zapata-Ortega
Panayiotis V Benos
Panayiotis V Benos
Panayiotis V Benos
John E McDonough
Farida Ahangari
Melanie Königshoff
Robert J Homer
Ivan Rosas
Xiting Yan … (voir 3 de plus)
Bart M Vanaudenaerde
Wim A Wuyts
Naftali Kaminski
Acute and repetitive lung epithelial injury can lead to irreversible and even progressive pulmonary fibrosis; Idiopathic pulmonary fibrosis … (voir plus)(IPF) is a fatal disease and quintessential example of this phenomenon. The composition of epithelial cells in human pulmonary fibrosis – irrespective of disease etiology – is marked by the presence of Aberrant Basaloid cells: an abnormal cell phenotype with pro-fibrotic and senescent features, localized to the surface of fibrotic lesions. Despite their relevance to human pulmonary fibrosis, the exotic molecular profile of Aberrant Basaloid cells has obscured their etiology, preventing insights into how or why these cells emerge with fibrosis. Here we identify cellular intermediaries between Aberrant Basaloid and normal alveolar epithelial cells in human IPF tissue. We track the emergence of Aberrant Basaloid cells from alveolar epithelial cells ex vivo and uncover a role for similar cells in epithelial regeneration under normal conditions. Lastly, we characterize the epigenetic changes that distinguish Aberrant Basaloid cells from their progenitors and identify hallmarks of AP-1 injury memory retention. This study elucidates the phenomenon of maladaptive epithelial plasticity and regeneration in pulmonary fibrosis and re-contextualizes therapeutic strategies for epithelial dysfunction.
Assessing and Learning Alignment of Unimodal Vision and Language Models
How well are unimodal vision and language models aligned? Although prior work have approached answering this question, their assessment meth… (voir plus)ods do not directly translate to how these models are used in practical vision-language tasks. In this paper, we propose a direct assessment method, inspired by linear probing, to assess vision-language alignment. We identify that the degree of alignment of the SSL vision models depends on their SSL training objective, and we find that the clustering quality of SSL representations has a stronger impact on alignment performance than their linear separability. Next, we introduce Swift Alignment of Image and Language (SAIL), a efficient transfer learning framework that aligns pretrained unimodal vision and language models for downstream vision-language tasks. Since SAIL leverages the strengths of pretrained unimodal models, it requires significantly fewer (6%) paired image-text data for the multimodal alignment compared to models like CLIP which are trained from scratch. SAIL training only requires a single A100 GPU, 5 hours of training and can accommodate a batch size up to 32,768. SAIL achieves 73.4% zero-shot accuracy on ImageNet (vs. CLIP's 72.7%) and excels in zero-shot retrieval, complex reasoning, and semantic segmentation. Additionally, SAIL improves the language-compatibility of vision encoders that in turn enhance the performance of multimodal large language models. The entire codebase and model weights are open-source: https://lezhang7.github.io/sail.github.io/
CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics
Verena Rieser
Lisa Anne Hendricks
Sjoerd van Steenkiste
Karolina Stanczak
The increasing ubiquity of text-to-image (T2I) models as tools for visual content generation raises concerns about their ability to accurate… (voir plus)ly represent diverse cultural contexts. In this work, we present the first study to systematically quantify the alignment of T2I models and evaluation metrics with respect to both explicit as well as implicit cultural expectations. To this end, we introduce CulturalFrames, a novel benchmark designed for rigorous human evaluation of cultural representation in visual generations. Spanning 10 countries and 5 socio-cultural domains, CulturalFrames comprises 983 prompts, 3637 corresponding images generated by 4 state-of-the-art T2I models, and over 10k detailed human annotations. We find that T2I models not only fail to meet the more challenging implicit expectations but also the less challenging explicit expectations. Across models and countries, cultural expectations are missed an average of 44% of the time. Among these failures, explicit expectations are missed at a surprisingly high average rate of 68%, while implicit expectation failures are also significant, averaging 49%. Furthermore, we demonstrate that existing T2I evaluation metrics correlate poorly with human judgments of cultural alignment, irrespective of their internal reasoning. Collectively, our findings expose critical gaps, providing actionable directions for developing more culturally informed T2I models and evaluation methodologies.
Did I Faithfully Say What I Thought? Bridging the Gap Between Neural Activity and Self-Explanations in Large Language Models
Jean-Noël Vittaut
Nicolas Chesneau
A. Chandar
Marie-Jeanne Lesot
Large Language Models (LLM) have demonstrated the capability of generating free text self Natural Language Explanation (self-NLE) to justify… (voir plus) their answers. Despite their logical appearance, self-NLE do not necessarily reflect the LLM actual decision-making process, making such explanations unfaithful. While existing methods for measuring self-NLE faithfulness mostly rely on behavioral tests or computational block identification, none of them examines the neural activity underlying the model's reasoning. This work introduces a novel flexible framework for quantitatively measuring the faithfulness of LLM-generated self-NLE by directly comparing the latter with interpretations of the model's internal hidden states. The proposed framework is versatile and provides deep insights into self-NLE faithfulness by establishing a direct connection between self-NLE and model reasoning. This approach advances the understanding of self-NLE faithfulness and provides building blocks for generating more faithful self-NLE.
Fresh in memory: Training-order recency is linearly encoded in language model activations
Dmitrii Krasheninnikov
Richard E. Turner
David M. Krueger
We show that language models' activations linearly encode when information was learned during training. Our setup involves creating a model … (voir plus)with a known training order by sequentially fine-tuning Llama-3.2-1B on six disjoint but otherwise similar datasets about named entities. We find that the average activations of test samples corresponding to the six training datasets encode the training order: when projected into a 2D subspace, these centroids are arranged exactly in the order of training and lie on a straight line. Further, we show that linear probes can accurately (~90%) distinguish "early" vs. "late" entities, generalizing to entities unseen during the probes' own training. The model can also be fine-tuned to explicitly report an unseen entity's training stage (~80% accuracy). Interestingly, the training-order encoding does not seem attributable to simple differences in activation magnitudes, losses, or model confidence. Our paper demonstrates that models are capable of differentiating information by its acquisition time, and carries significant implications for how they might manage conflicting data and respond to knowledge modifications.
Geometry-Aware Preference Learning for 3D Texture Generation
Tianhao Xie
Amir Aghdam
Tiberiu Popa
Recent advances in 3D generative models have achieved impressive results but 3D contents generated by these models may not align with subjec… (voir plus)tive human preferences or task-specific criteria. Moreover, a core challenge in the 3D texture generation domain remains: most existing approaches rely on repeated calls to 2D text-to-image generative models, which lack an inherent understanding of the 3D structure of the input 3D mesh object. To address this, we propose an end-to-end differentiable preference learning framework that back-propagates human preferences, represented by differentiable reward functions, through the entire 3D generative pipeline, making the process inherently geometry-aware. We demonstrate the effectiveness of our framework using four proposed novel geometry-aware reward functions, offering a more controllable and interpretable pathway for high-quality 3D content creation from natural language.