Publications

Deploying Geospatial Foundation Models in the Real World: Lessons from WorldCereal

Christina Butsko

Gabriel Tseng

Kristof Van Tricht

Giorgia Milli

David Rolnick

Ruben Cartuyvels

Inbal Becker Reshef

Zoltan Szantoi

Hannah Kerner

The increasing availability of geospatial foundation models has the potential to transform remote sensing applications such as land cover classification, environmental monitoring, and change detection. Despite promising benchmark results, the deployment of these models in operational settings is challenging and rare. Standardized evaluation tasks often fail to capture real-world complexities relevant for end-user adoption such as data heterogeneity, resource constraints, and application-specific requirements. This paper presents a structured approach to integrate geospatial foundation models into operational mapping systems. Our protocol has three key steps: defining application requirements, adapting the model to domain-specific data and conducting rigorous empirical testing. Using the Presto model in a case study for crop mapping, we demonstrate that fine-tuning a pre-trained model significantly improves performance over conventional supervised methods. Our results highlight the model’s strong spatial and temporal generalization capabilities. Our protocol provides a replicable blueprint for practitioners and lays the groundwork for future research to operationalize foundation models in diverse remote sensing applications. Application of the protocol to the WorldCereal global crop-mapping system showcases the framework’s scalability.

2025-12-01

Proceedings of The TerraBytes ICML Workshop: Towards global datasets and models for Earth Observation (published)

doi.org

proceedings.mlr.press

MIMIC-MJX: Neuromechanical Emulation of Animal Behavior

Charles Y. Zhang

Yuanjia Yang

Aidan Sirbu

Elliott T.T. Abe

Emil Wärnberg

Eric J. Leonardis

Diego E. Aldarondo

Adam Lee

Aaditya Prasad

Jason Foat

Kaiwen Bian

Joshua Park

Rusham Bhatt

Hutton Saunders

Akira Nagamori

Ayesha R. Thanawalla

Kee Wui Huang

Fabian Plum

Hendrik K. Beck

Steven W. Flavell

David Labonte

Blake A. Richards

Bingni W. Brunton

Eiman Azim

Bence P. Ölveczky

Talmo D. Pereira

The primary output of the nervous system is movement and behavior. While recent advances have democratized pose tracking during complex behavior, kinematic trajectories alone provide only indirect access to the underlying control processes. Here we present MIMIC-MJX, a framework for learning biologically-plausible neural control policies from kinematics. MIMIC-MJX models the generative process of motor control by training neural controllers that learn to actuate biomechanically-realistic body models in physics simulation to reproduce real kinematic trajectories. We demonstrate that our implementation is accurate, fast, data-efficient, and generalizable to diverse animal body models. Policies trained with MIMIC-MJX can be utilized to both analyze neural control strategies and simulate behavioral experiments, illustrating its potential as an integrative modeling framework for neuroscience.

2025-12-01

arXiv (published)

doi.org

arxiv.org

The Cloud-Based Geospatial Benchmark: Challenges and LLM Evaluation

Jeffrey A. Cardille

Renee Johnston

Simon Ilyushchenko

Johan Kartiwa

Zahra Shamsi

Matthew Abraham

Khashayar Azad

Kainath Ahmed

Emma Bergeron Quick

Nuala Caughie

Noah Jencz

Karen Dyson

Andrea Puzzi Nicolau

Maria Fernanda Lopez-Ornelas

David Saah

Michael Brenner

Subhashini Venugopalan

Sameera S Ponda

2025-12-01

Proceedings of The TerraBytes ICML Workshop: Towards global datasets and models for Earth Observation (published)

proceedings.mlr.press

Training neural networks from scratch in a videogame leads to brittle brain encoding

Basile Pinsard

Recent brain-encoding studies using videogame tasks suggest that the training objective of an artificial neural network plays a central role in how well the network’s representations align with brain activity. This study investigates the alignment of artificial neural network activations with brain activity elicited by a video game task using models trained from scratch in controlled settings. We specifically compared three model training objectives: reinforcement learning, imitation learning, and a vision task, while accounting for other potential factors which may impact performance such as training data and model architecture. We tested models on brain encoding, i.e. their ability to predict functional magnetic resonance imaging (fMRI) signals acquired while human subjects played different levels of the video game Super Mario Bros. When tested on new playthroughs from the game levels seen at training, the reinforcement learning objective had a small but significant advantage in brain encoding, followed by the imitation learning and vision models. We hypothesized that brain-aligned representations would emerge only in task-competent models, and that the specific brain regions well encoded by a model would depend on the nature of the task it was trained on. While brain encoding did improve during model training, even an untrained model with matching architecture approached the performance of the best models. Contrary to our hypotheses, no model layers or specific training objectives aligned preferentially with specific brain areas. Large performance gaps also persisted in fully trained models across game levels, both those seen during training and entirely novel ones. Overall, even though reinforcement learning presented a small advantage to train brain encoding models for videogame data, all tested brain encoding models exhibited brittle performance with limited generalization both within- and out-of-distribution.
Taken together, our results suggest that training small artificial models from scratch is not sufficiently reliable, and that incorporating pretrained models such as foundation vision–action models may ultimately be necessary to support robust inferences about brain representations.

2025-12-01

bioRxiv (preprint)

doi.org

Accelerated Inorganic Materials Design with Generative AI Agents

Izumi Takahara

Teruyasu Mizoguchi

Bang Liu

Designing inorganic crystalline materials with tailored properties is critical to technological innovation, yet current generative computational methods often struggle to efficiently explore desired targets with sufficient interpretability. Here, we present MatAgent, a generative approach for inorganic materials discovery that harnesses the powerful reasoning capabilities of large language models (LLMs). By combining a diffusion-based generative model for crystal structure estimation with a predictive model for property evaluation, MatAgent uses iterative, feedback-driven guidance to steer material exploration precisely toward user-defined targets. Integrated with external cognitive tools (including short-term memory, long-term memory, the periodic table, and a comprehensive materials knowledge base), MatAgent emulates human expert reasoning to vastly expand the accessible compositional space. Our results demonstrate that MatAgent robustly directs exploration toward desired properties while consistently achieving high compositional validity, uniqueness, and material novelty. This framework thus provides a highly interpretable, practical, and versatile AI-driven solution to accelerate the discovery and design of next-generation inorganic materials.

2025-11-30

Cell Reports Physical Science (published)

doi.org

arxiv.org

Biomechanical finite element simulation of the pelvic organs under dynamic loading and validation against experimental data from magnetic resonance imaging

Camille Lafond

Louise Hohnadel

Thomas Brunel

Nicolas Pirró

Marc-Emmanuel Bellemare

Dominique Chamoret

Sébastien Roth

2025-11-30

Medical Engineering & Physics (published)

doi.org

Building a library of acute traumatic spinal cord injury images across Canada: a retrospective cohort study protocol

Naama Rotem-Kohavi

Suzanne Humphreys

Vanessa K Noonan

Christiana L Cheng

Mathieu Guay-Paquet

Maxime Bouthillier

Jan Valosek

Enamundram Naga Karthik

Emma Lichtenstein

Nick Guenther

Kalum Ost

Naj Attabib

Michael Hardisty

Jetan Badhiwala

Jeremie Larouche

Markian Pahuta

Sean Christie

Michael G Fehlings

Daryl Fourney

Brian K Kwon

Jean Marc Mac-Thiong

Jérôme Paquet

Philippe Phan

Christopher Witiw

Julien Cohen-Adad

David W Cadotte

MRI is increasingly recognised as a valuable tool for assessing prognosis and predicting outcomes following traumatic spinal cord injury (SCI). Several potential MRI biomarkers have been identified, but efforts are still needed to improve the accuracy and feasibility of these biomarkers in clinical practice. This study aims to build a national Canadian SCI imaging repository for storing and analysing imaging data for SCI, with the goal of improving SCI MRI biomarkers to predict outcomes and inform clinical management. As a substudy of the Rick Hansen SCI Registry (RHSCIR), this retrospective multisite study includes individuals who sustained a traumatic cervical SCI between 2015 and 2021, were previously enrolled in RHSCIR, and had MRI scans acquired within 72 hours of injury and before any surgical intervention. Individuals with a penetrating trauma and/or with any prior spine surgery are excluded. The study principal investigator and research associates, experienced with data curation and with the standardised format and specifications of the Brain Imaging Data Structure standard, guide the site’s curator on the steps to perform image deidentification and curation to create standardised datasets across all sites. These datasets are transferred to a Digital Research Alliance of Canada (‘the Alliance’) server designated for this project and concatenated to form the national Canadian SCI imaging repository (Neurogitea). We are using a semiautomated processing pipeline to quantify lesion morphology, together with additional imaging measures that are manually extracted from the images (for instance, the relative maximal spinal cord compression and the maximum canal compromise).
Through linkage to RHSCIR clinical and epidemiological data already available on eligible participants, regression analysis is planned to predict neurological outcomes at discharge, including the American Spinal Injury Association Impairment Scale grade, upper and lower extremity motor and sensory scores. This protocol has been submitted by the participating sites to obtain ethics and institutional approvals prior to the study initiation at each site. All 12 sites across Canada have now obtained ethics and institutional approvals. Study results will be disseminated at local, national and international conferences and by journal publications.

2025-11-30

BMJ Open (published)

doi.org

Effect of Document Packing on the Latent Multi-Hop Reasoning Capabilities of Large Language Models

Gabriele Prato

Shagun Sodhani

Alessandro Sordoni

A. Chandar

The standard practice for training large language models involves packing multiple documents together to optimize computational efficiency. However, the impact of this process on the models' capabilities remains largely unexplored. To address this gap, we investigate how different document-packing strategies influence the latent multi-hop reasoning abilities of LLMs. Our findings indicate that packing can improve model performance compared to training on individual documents, at the expense of more compute. To further understand the underlying mechanisms, we conduct an ablation study, identifying key factors that explain the advantages of packing. Ultimately, our research deepens the understanding of LLM training dynamics and provides practical insights for optimizing model development.

2025-11-30

arXiv (published)

doi.org

arxiv.org

FALCON: Few-step Accurate Likelihoods for Continuous Flows

Artem Gazizov

2025-11-30

arXiv (published)

doi.org

arxiv.org

Foundations of Diffusion Models in General State Spaces: A Self-Contained Introduction

Vincent Pauline

Kirill Neklyudov

2025-11-30

arXiv (published)

doi.org

arxiv.org

Genetic and Causal Insights Into White Matter Hyperintensities Across the Brain‐Body Axis

Manpreet Singh

Kimia Shafighi

Flavie E. Detcheverry

Gabrielle Dagasso

Fanta Dabo

Ikrame Housni

Sridar Narayanan

Nils D. Forkert

Sarah A Gagliano Taliun

Danilo Bzdok

AmanPreet Badhwar

White matter hyperintensities (WMHs), visible as bright regions on T2‐weighted FLAIR MRI, are frequent with age and elevated in Alzheimer's disease (AD). Representing axonal damage, demyelination, and edema, WMHs are driven by vascular mechanisms, including endothelial dysfunction and impaired cerebrovascular autoregulation. WMHs also exhibit strong heritability (55–73%), with overlapping genetic pathways shared with AD. Emerging evidence suggests systemic factors across the brain‐body axis influence WMHs, yet these contributions and their genetic overlap with AD remain underexplored. Our study investigated genetic underpinnings specific to WMHs and those shared with AD by assessing partitioned heritability of WMHs and AD across the brain‐body axis with SNP‐level tissue‐ and cell‐specific annotations; identifying genes associated with WMHs and AD through integration of gene expression data; and establishing causal links between SNP‐level findings and imaging‐derived phenotypes (IDPs), particularly structural variations in regional brain volumes. Partitioned heritability was assessed using stratified linkage disequilibrium score regression (sLDSC) on GWAS summary statistics (N = 3 WMH studies; N = 6 AD studies) using human (A1) tissue‐level annotations (N = 10) and (A2) continuous cell‐specific annotations (N = 64). MAGMA and FUSION analyses highlighted genes associated with WMH and AD for further bioinformatics analysis (using the Human Protein Atlas (HPA) and the STRING database). MACAW (Vigneshwaran et al., 2024) modeled causal relationships between WMH‐associated SNPs (from FUMA analysis) and IDPs (N = 172), leveraging directed acyclic graphs to evaluate genetic effects while controlling for confounders (Figure 2). Tissue‐specific analysis revealed significant enrichment of WMH‐associated SNPs in the CNS, liver, cardiovascular system, and kidneys, while AD‐associated SNPs were enriched in the CNS, connective bone, liver, and immune tissues (Figure 1).
Cell‐specific analysis identified vascular endothelial cells as enriched across WMH‐enriched tissues. MAGMA analysis, combined with HPA analysis, corroborated sLDSC tissue‐level findings. MAGMA and FUSION analyses highlighted genes associated with WMHs (N = 39 and 69) and AD (N = 291 and 193). MACAW linked WMH‐associated SNPs to 172 IDPs, consistently impacting WM hypointensities and regional brain volumes (e.g., left inferior temporal volume). Our findings highlight systemic multi‐tissue contributions (CNS, liver, cardiovascular system, and kidneys) to WMHs, driven by vascular endothelial dysfunction and shared AD genetics, with SNPs across the body also affecting brain imaging‐derived phenotypes.

2025-11-30

Alzheimer's & Dementia (published)

doi.org

On Global Applicability and Location Transferability of Generative Deep Learning Models for Precipitation Downscaling

Paula Harder

Christian Lessig

Matthew Chantry

Francis Pelletier

David Rolnick

2025-11-30

arXiv (preprint)

doi.org