David Rolnick

Biography

David Rolnick is an Assistant Professor and Canada CIFAR AI Chair in the School of Computer Science at McGill University, and a Core Academic Member of Mila – Quebec Artificial Intelligence Institute. His work focuses on applications of machine learning to help tackle climate change. He is the co-founder and chair of Climate Change AI and scientific co-director of Sustainability in the Digital Age. He received his PhD in Applied Mathematics from the Massachusetts Institute of Technology (MIT). He was an NSF Mathematical Sciences Postdoctoral Research Fellow, an NSF Graduate Research Fellow, and a Fulbright Scholar. He was named to MIT Technology Review's "35 Innovators Under 35" list in 2021.

Current Students

Benjamin Akera Binen

Alumni Collaborator - McGill

Research Collaborator - Cambridge University

Co-supervisor:

Postdoctorate - McGill

Michael Bunsen

Research Collaborator - McGill

Shahana Chatterjee

Research Collaborator - N/A

Co-supervisor:

Yuyan Chen

PhD - McGill

Eya Cherif

Research Collaborator - Leipzig University

Othmane Echchabi

Research Master's - McGill

Research Collaborator

Mohamed Elabbas

Research Collaborator

Jannik Endres

Research Collaborator

Jacopo Ghirri

Independent Visiting Researcher - Politecnico di Milano

Independent Visiting Researcher

Research Collaborator - Johannes Kepler University

Christina Isaicu Isaicu

Research Collaborator - University of Amsterdam

Gaurav Iyer

Research Master's - McGill

PhD - McGill

Devin Kwok

PhD - McGill

Independent Visiting Researcher - Université de Montréal

Pierre-Louis Lemaire

Research Collaborator - Polytechnique Montréal

Principal supervisor:

Alex Hernández-García

David Mickisch

Research Collaborator

Postdoctorate - McGill

Co-supervisor:

Lena Podina

Research Collaborator - University of Waterloo

Co-supervisor:

Marlena Reil

Research Master's - McGill

Rasha Saha

Research Master's - McGill

Luca Marie Schmidt

Research Collaborator - University of Tübingen

Independent Visiting Researcher - Karlsruhe Institute of Technology

Wietze Suijker

Independent Visiting Researcher

Ilija Trajković

Research Collaborator - Karlsruhe Institute of Technology

PhD - McGill

Alumni Collaborator - UdeM

Principal supervisor:

Research Collaborator

PhD - McGill

Research Collaborator - École Polytechnique Fédérale de Lausanne (EPFL)

Co-supervisor:

Loubna Benabbou

Shan Zhao

Research Collaborator - Technical University of Munich

Blog posts

Democratizing access to satellite data with AI

October 21, 2025

By Gabriel Tseng and David Rolnick

Publications

HVAC-SPICE: Value-Uncertainty In-Context RL with Thompson Sampling for Zero-Shot HVAC Control

Anaïs Berkes

Urban buildings consume 40% of global energy, yet most rely on inefficient rule-based HVAC systems due to the impracticality of deploying advanced controllers across diverse building stock. In-context reinforcement learning (ICRL) offers promise for rapid deployment without per-building training, but standard supervised learning objectives that maximise the likelihood of training actions inherit behaviour-policy bias and provide weak exploration under the distribution shifts common when transferring across buildings and climates. We present SPICE (Sampling Policies In-Context with Ensemble uncertainty), a novel ICRL method specifically designed for zero-shot building control that addresses these fundamental limitations. SPICE introduces two key methodological innovations: (i) a propensity-corrected, return-aware training objective that prioritises high-advantage, high-uncertainty actions to enable improvement beyond suboptimal training demonstrations, and (ii) lightweight value ensembles with randomised priors that provide explicit uncertainty estimates for principled episode-level Thompson sampling. At deployment, SPICE samples one value head per episode and acts greedily, resulting in temporally coherent exploration without test-time gradients or building-specific models. We establish a comprehensive experimental protocol using the HOT dataset to evaluate SPICE across diverse building types and climate zones, focusing on the energy efficiency, occupant comfort, and zero-shot transfer capabilities that are critical for urban-scale deployment.

2025-09-29

NeurIPS.cc/2025/Workshop/UrbanAI (poster)
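The SPICE abstract describes acting greedily with respect to one value head sampled per episode from an ensemble with randomised priors. The sketch below illustrates only that deployment-time loop, under loudly stated assumptions: linear Q-heads over a hand-made state vector, a discrete action set, untrained weights, and hypothetical class names (`ValueHead`, `EpisodeThompsonAgent`). It is not the authors' implementation and omits the propensity-corrected training objective entirely.

```python
import random
from typing import List, Optional, Sequence


class ValueHead:
    """One ensemble member (hypothetical): trainable linear Q-values plus a frozen random prior.

    Q_k(s, a) = (w_k[a] + beta * p_k[a]) . s, where p_k is drawn once and never trained.
    """

    def __init__(self, state_dim: int, n_actions: int, prior_scale: float, rng: random.Random):
        # Trainable weights start at zero; a real system would fit them to returns.
        self.weights = [[0.0] * state_dim for _ in range(n_actions)]
        # Randomised prior: fixed random weights that make untrained heads disagree.
        self.prior = [[rng.gauss(0.0, 1.0) for _ in range(state_dim)] for _ in range(n_actions)]
        self.prior_scale = prior_scale

    def q_values(self, state: Sequence[float]) -> List[float]:
        return [
            sum((w + self.prior_scale * p) * s for w, p, s in zip(w_row, p_row, state))
            for w_row, p_row in zip(self.weights, self.prior)
        ]


class EpisodeThompsonAgent:
    """Episode-level Thompson sampling over an ensemble of value heads (illustrative only)."""

    def __init__(self, heads: List[ValueHead], rng: random.Random):
        self.heads = heads
        self.rng = rng
        self.active: Optional[int] = None

    def begin_episode(self) -> int:
        # Draw one head per episode and keep it fixed, so exploration is temporally coherent.
        self.active = self.rng.randrange(len(self.heads))
        return self.active

    def act(self, state: Sequence[float]) -> int:
        if self.active is None:
            raise RuntimeError("call begin_episode() first")
        q = self.heads[self.active].q_values(state)
        # Greedy with respect to the sampled head; no gradient step happens at test time.
        return max(range(len(q)), key=q.__getitem__)


if __name__ == "__main__":
    init_rng = random.Random(0)
    ensemble = [ValueHead(state_dim=3, n_actions=4, prior_scale=1.0, rng=init_rng) for _ in range(5)]
    agent = EpisodeThompsonAgent(ensemble, random.Random(1))
    for episode in range(3):
        head = agent.begin_episode()
        actions = [agent.act([0.2 * t, 1.0, -0.5]) for t in range(4)]
        print(f"episode {episode}: head {head}, actions {actions}")
```

Sampling once per episode, rather than once per step, is what keeps a single "hypothesis" about the value function in force for a whole episode; per-step sampling would instead produce jittery, uncorrelated exploration.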

Graph Dreamer: Temporal Graph World Models for Sample-Efficient and Generalisable Reinforcement Learning

Anaïs Berkes

Donna Vakalis

2025-09-21

NeurIPS.cc/2025/Workshop/WiML (published)

Identifying birdsong syllables without labelled data

Mélisande Teng

Julien Boussard

Hugo Larochelle

Identifying sequences of syllables within birdsongs is key to tackling a wide array of challenges, including individual bird identification and better understanding of animal communication and sensory-motor learning. Recently, machine learning approaches have demonstrated great potential to alleviate the need for experts to label long audio recordings by hand. However, they still typically rely on the availability of labelled data for model training, restricting applicability to a few species and datasets. In this work, we build the first fully unsupervised algorithm to decompose birdsong recordings into sequences of syllables. We first detect syllable events, then cluster them to extract templates (syllable representations), before performing matching pursuit to decompose the recording as a sequence of syllables. We evaluate our automatic annotations against human labels on a dataset of Bengalese finch songs and find that our unsupervised method achieves high performance. We also demonstrate that our approach can distinguish individual birds within a species through their unique vocal signatures, for both Bengalese finches and another species, the great tit.

2025-09-21

arXiv (preprint)

arxiv.org
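The birdsong abstract names matching pursuit as the final decomposition step. A minimal sketch of greedy matching pursuit on a 1-D toy signal follows. It assumes the syllable templates are already given (the detection and clustering steps are not shown), works on raw samples rather than whatever representation the paper actually uses, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def unit_norm(template: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in template))
    return [x / norm for x in template]


def matching_pursuit(
    signal: Sequence[float],
    templates: Sequence[Sequence[float]],
    max_atoms: int = 10,
    tol: float = 1e-6,
) -> Tuple[List[Tuple[int, int, float]], List[float]]:
    """Greedy matching pursuit; returns (offset, template_id, coefficient) atoms sorted by time."""
    normed = [unit_norm(t) for t in templates]
    residual = list(signal)
    atoms: List[Tuple[int, int, float]] = []
    for _ in range(max_atoms):
        best = None  # (|coef|, coef, template_id, offset)
        for k, t in enumerate(normed):
            for off in range(len(residual) - len(t) + 1):
                # Inner product between the residual window and the unit-norm template.
                coef = sum(residual[off + i] * t[i] for i in range(len(t)))
                if best is None or abs(coef) > best[0]:
                    best = (abs(coef), coef, k, off)
        if best is None or best[0] < tol:
            break  # no template explains what is left
        _, coef, k, off = best
        for i, x in enumerate(normed[k]):
            residual[off + i] -= coef * x  # subtract the explained component
        atoms.append((off, k, coef))
    atoms.sort()  # time order turns the atoms into a syllable sequence
    return atoms, residual


if __name__ == "__main__":
    templates = [[1.0, 0.0, -1.0], [1.0, 1.0, 1.0]]  # toy "syllables" A and B
    signal = [0.0] * 10
    for off, k, amp in [(0, 0, 2.0), (5, 1, 3.0)]:
        for i, x in enumerate(unit_norm(templates[k])):
            signal[off + i] += amp * x
    atoms, _ = matching_pursuit(signal, templates)
    print([(off, "AB"[k]) for off, k, _ in atoms])
```

Reading the sorted atoms' template ids in time order gives the syllable sequence; in this toy case the two planted syllables are recovered at their original offsets.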

Catalyst GFlowNet for electrocatalyst design: A hydrogen evolution reaction case study

Lena Podina

Alex Hernández-García

Efficient and inexpensive energy storage is essential for accelerating the adoption of renewable energy and ensuring a stable supply, despite fluctuations in sources such as wind and solar. Electrocatalysts play a key role in hydrogen energy storage (HES), allowing the energy to be stored as hydrogen. However, the development of affordable and high-performance catalysts for this process remains a significant challenge. We introduce Catalyst GFlowNet, a generative model that leverages machine learning-based predictors of formation and adsorption energy to design crystal surfaces that act as efficient catalysts. We demonstrate the performance of the model through a proof-of-concept application to the hydrogen evolution reaction, a key reaction in HES, for which we successfully identified platinum as the most efficient known catalyst. In future work, we aim to extend this approach to the oxygen evolution reaction, where current optimal catalysts are expensive metal oxides, and open the search space to discover new materials. This generative modeling framework offers a promising pathway for accelerating the search for novel and efficient catalysts.

2025-09-19

AI4Mat @ Neural Information Processing Systems (poster)

Bringing SAM to new heights: Leveraging elevation data for tree crown segmentation from drone imagery

Mélisande Teng

Information on trees at the individual level is crucial for monitoring forest ecosystems and planning forest management. Current monitoring methods involve ground measurements, requiring extensive cost, time and labor. Advances in drone remote sensing and computer vision offer great potential for mapping individual trees from aerial imagery at broad scale. Large pre-trained vision models, such as the Segment Anything Model (SAM), represent a particularly compelling choice given limited labeled data. In this work, we compare methods leveraging SAM for the task of automatic tree crown instance segmentation in high resolution drone imagery in three use cases: 1) boreal plantations, 2) temperate forests and 3) tropical forests. We also study the integration of elevation data into models, in the form of Digital Surface Model (DSM) information, which can readily be obtained at no additional cost from RGB drone imagery. We present BalSAM, a model leveraging SAM and DSM information, which shows potential over other methods, particularly in the context of plantations. We find that methods using SAM out-of-the-box do not outperform a custom Mask R-CNN, even with well-designed prompts. However, efficiently tuning SAM end-to-end and integrating DSM information are both promising avenues for tree crown instance segmentation models.

2025-09-17

Neural Information Processing Systems (poster)

Causal Climate Emulation with Bayesian Filtering

Alex Archibald

Yaniv Gurwicz

Peer Nowack

Julien Boussard

Traditional models of climate change use complex systems of coupled equations to simulate physical processes across the Earth system. These simulations are highly computationally expensive, limiting our predictions of climate change and analyses of its causes and effects. Machine learning has the potential to quickly emulate data from climate models, but current approaches are not able to incorporate physically-based causal relationships. Here, we develop an interpretable climate model emulator based on causal representation learning. We derive a novel approach including a Bayesian filter for stable long-term autoregressive emulation. We demonstrate that our emulator learns accurate climate dynamics, and we show the importance of each one of its components on a realistic synthetic dataset and data from two widely deployed climate models.

2025-09-17

NeurIPS.cc/2025/Conference (poster)

GreenHyperSpectra: A multi-source hyperspectral dataset for global vegetation trait prediction

Eya Cherif

Arthur Ouaknine

Luke A. Brown

Phuong D. Dao

Kyle R. Kovach

Bing Lu

Daniel Mederer

Hannes Feilhauer

Teja Kattenborn

Plant traits such as leaf carbon content and leaf mass are essential variables in the study of biodiversity and climate change. However, conventional field sampling cannot feasibly cover trait variation at ecologically meaningful spatial scales. Machine learning represents a valuable solution for plant trait prediction across ecosystems, leveraging hyperspectral data from remote sensing. Nevertheless, trait prediction from hyperspectral data is challenged by label scarcity and substantial domain shifts (e.g., across sensors and ecological distributions), requiring robust cross-domain methods. Here, we present GreenHyperSpectra, a pretraining dataset encompassing real-world cross-sensor and cross-ecosystem samples designed to benchmark trait prediction with semi- and self-supervised methods. We adopt an evaluation framework encompassing in-distribution and out-of-distribution scenarios. We successfully leverage GreenHyperSpectra to pretrain label-efficient multi-output regression models that outperform the state-of-the-art supervised baseline. Our empirical analyses demonstrate substantial improvements in learning spectral representations for trait prediction, establishing a comprehensive methodological framework to catalyze research at the intersection of representation learning and plant functional traits assessment. All code and data are available at: https://github.com/echerif18/HyspectraSSL.

2025-09-17

NeurIPS.cc/2025/Datasets_and_Benchmarks_Track (poster)

Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring

Yuyan Chen

Nico Lang

B. Christian Schmidt

Aditya Jain

Yves Basset

Sara Beery

Maxim Larrivée

Global biodiversity is declining at an unprecedented rate, yet little is known about most species and how their populations are changing. Indeed, some 90% of Earth's species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training (the problem of open-set recognition, OSR), limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide insights to guide the development of computer vision methods for biodiversity monitoring and species discovery.

2025-09-17

NeurIPS.cc/2025/Datasets_and_Benchmarks_Track (spotlight)
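The Open-Insect abstract reports that simple post-hoc approaches remain a strong baseline. To illustrate what "post-hoc" means, the sketch below implements maximum softmax probability (MSP), a widely used post-hoc open-set score that reuses a trained classifier's logits without any retraining. Treating MSP as representative of the benchmarked post-hoc methods, the function names, and the 0.5 threshold are all assumptions made for illustration; in practice the threshold would be tuned on held-out data from known species.

```python
import math
from typing import List, Sequence, Tuple, Union


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_open_set(logits: Sequence[float], threshold: float) -> Tuple[Union[int, str], float]:
    """Maximum softmax probability: flag as 'unknown' when the top class is not confident enough."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    score = probs[best]
    if score < threshold:
        return "unknown", score
    return best, score


if __name__ == "__main__":
    print(msp_open_set([5.0, 0.0, 0.0], threshold=0.5))  # peaked logits: kept as known class 0
    print(msp_open_set([1.0, 1.0, 1.0], threshold=0.5))  # flat logits: flagged as unknown
```

The appeal of such post-hoc scores is that they bolt onto any existing species classifier; the trade-off is that they can only reject inputs the classifier already finds ambiguous.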

CISO: Species Distribution Modeling Conditioned on Incomplete Species Observations

Hager Radi Abdelwahed

Mélisande Teng

Robin Zbinden

Laura Pollock

Hugo Larochelle

Devis Tuia

Species distribution models (SDMs) are widely used to predict species' geographic distributions, serving as critical tools for ecological research and conservation planning. Typically, SDMs relate species occurrences to environmental variables representing abiotic factors, such as temperature, precipitation, and soil properties. However, species distributions are also strongly influenced by biotic interactions with other species, which are often overlooked. While some methods partially address this limitation by incorporating biotic interactions, they often assume symmetrical pairwise relationships between species and require consistent co-occurrence data. In practice, species observations are sparse, and the availability of information about the presence or absence of other species varies significantly across locations. To address these challenges, we propose CISO, a deep learning-based method for species distribution modeling Conditioned on Incomplete Species Observations. CISO enables predictions to be conditioned on a flexible number of species observations alongside environmental variables, accommodating the variability and incompleteness of available biotic data. We demonstrate our approach using three datasets representing different species groups: sPlotOpen for plants, SatBird for birds, and a new dataset, SatButterfly, for butterflies. Our results show that including partial biotic information improves predictive performance on spatially separate test sets. When conditioned on a subset of species within the same dataset, CISO outperforms alternative methods in predicting the distribution of the remaining species. Furthermore, we show that combining observations from multiple datasets can improve performance. CISO is a promising ecological tool, capable of incorporating incomplete biotic information and identifying potential interactions between species from disparate taxa.

2025-08-07

arXiv (preprint)

arxiv.org

RainShift: A Benchmark for Precipitation Downscaling Across Geographies

Paula Harder

Luca Schmidt

Francis Pelletier

Nicole Ludwig

Matthew Chantry

Christian Lessig

Alex Hernández-García

Earth System Models (ESM) are our main tool for projecting the impacts of climate change. However, running these models at sufficient resolution for local-scale risk-assessments is not computationally feasible. Deep learning-based super-resolution models offer a promising solution to downscale ESM outputs to higher resolutions by learning from data. Yet, due to regional variations in climatic processes, these models typically require retraining for each geographical area, demanding high-resolution observational data, which is unevenly available across the globe. This highlights the need to assess how well these models generalize across geographic regions. To address this, we introduce RainShift, a dataset and benchmark for evaluating downscaling under geographic distribution shifts. We evaluate state-of-the-art downscaling approaches including GANs and diffusion models in generalizing across data gaps between the Global North and Global South. Our findings reveal substantial performance drops in out-of-distribution regions, depending on model and geographic area. While expanding the training domain generally improves generalization, it is insufficient to overcome shifts between geographically distinct regions. We show that addressing these shifts through, for example, data alignment can improve spatial generalization. Our work advances the global applicability of downscaling methods and represents a step toward reducing inequities in access to high-resolution climate information.

2025-07-06

arXiv (preprint)

arxiv.org

HVAC-GRACE: Transferable Building Control via Heterogeneous Graph Neural Network Policies

Anaïs Berkes

Donna Vakalis

Buildings consume 40% of global energy, with HVAC systems responsible for up to half of that demand. As energy use grows, optimizing HVAC efficiency is critical to meeting climate goals. While reinforcement learning (RL) offers a promising alternative to rule-based control, real-world adoption is limited by poor sample efficiency and generalisation. We introduce HVAC-GRACE, a graph-based RL framework that models buildings as heterogeneous graphs and integrates spatial message passing directly into temporal GRU gates. This enables each zone to learn control actions informed by both its own history and its structural context. Our architecture supports zero-shot transfer by learning topology-agnostic functions, but initial experiments reveal that this benefit depends on sufficient conditioned zone connectivity to maintain gradient flow. These findings highlight both the promise and the architectural requirements of scalable, transferable RL for building control.

2025-06-30

ICML.cc/2025/Workshop/CO-BUILD (poster)
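The HVAC-GRACE abstract describes integrating spatial message passing directly into temporal GRU gates. One plausible reading is sketched below: each zone's update, reset and candidate gates receive an extra term computed from the mean of its neighbours' previous hidden states. This is an assumption-laden illustration, not the paper's architecture. It uses a single node type (the paper uses heterogeneous graphs), mean aggregation, random untrained weights, and a hypothetical class name `GraphGRUCell`.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vadd(*vs: Vector) -> Vector:
    return [sum(parts) for parts in zip(*vs)]


class GraphGRUCell:
    """GRU cell whose gates also read an aggregated message from neighbouring zones.

    Weights are shared by every zone, so the same cell runs on any building graph.
    """

    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random):
        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        # Per gate: W reads the zone input, U its own hidden state, M the neighbour message.
        self.params = {
            gate: (mat(hid_dim, in_dim), mat(hid_dim, hid_dim), mat(hid_dim, hid_dim))
            for gate in ("update", "reset", "candidate")
        }
        self.hid_dim = hid_dim

    def _gate(self, gate: str, x: Vector, h: Vector, msg: Vector) -> Vector:
        w, u, m = self.params[gate]
        return vadd(matvec(w, x), matvec(u, h), matvec(m, msg))

    def step(self, inputs: Dict[str, Vector], hidden: Dict[str, Vector],
             neighbours: Dict[str, List[str]]) -> Dict[str, Vector]:
        new_hidden = {}
        for zone, x in inputs.items():
            h = hidden[zone]
            nbrs = neighbours.get(zone, [])
            # Spatial message: mean of neighbours' previous hidden states (zeros if isolated).
            msg = ([sum(hidden[n][i] for n in nbrs) / len(nbrs) for i in range(self.hid_dim)]
                   if nbrs else [0.0] * self.hid_dim)
            z = [sigmoid(a) for a in self._gate("update", x, h, msg)]
            r = [sigmoid(a) for a in self._gate("reset", x, h, msg)]
            reset_h = [ri * hi for ri, hi in zip(r, h)]
            cand = [math.tanh(a) for a in self._gate("candidate", x, reset_h, msg)]
            # Standard GRU interpolation between the old state and the candidate.
            new_hidden[zone] = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
        return new_hidden


if __name__ == "__main__":
    cell = GraphGRUCell(in_dim=2, hid_dim=3, rng=random.Random(0))
    # Toy building: three zones along a corridor; zone names and features are made up.
    graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    x = {"A": [0.1, 0.3], "B": [0.5, 0.1], "C": [-0.2, 0.6]}
    h = cell.step(x, {z: [0.0] * 3 for z in graph}, graph)
    print({z: [round(v, 3) for v in vec] for z, vec in h.items()})
```

Because no parameter is tied to a particular zone or graph size, the same cell can be stepped on a building with a different layout, which is the sense in which such functions are "topology-agnostic". An isolated zone receives an all-zero message, which loosely mirrors the abstract's observation that the transfer benefit depends on sufficient zone connectivity.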

Task-Informed Meta-Learning for Remote Sensing

Gabriel Tseng

Hannah Kerner

Labels in remote sensing datasets, and particularly in agricultural remote sensing datasets, can be extremely spatially imbalanced, with plentiful labels in some regions but a sparsity of labels in other regions. When developing algorithms for data-sparse regions, a natural approach is to use transfer learning from data-rich regions. While standard transfer learning approaches typically leverage only direct inputs and outputs, remote sensing data (and geospatial data more generally) are rich in metadata that can inform transfer learning algorithms, such as the spatial coordinates of data-points. We build on previous work exploring the use of meta-learning for remote sensing contexts in data-sparse regions and introduce task-informed meta-learning (TIML), an augmentation to model-agnostic meta-learning which takes advantage of task-specific metadata. We apply TIML to regression and classification tasks in remote sensing for agriculture, and find that TIML outperforms a range of benchmarks in both contexts, across a diversity of model architectures. TIML was developed for remote sensing with the goal of improving the global accuracy (and equity) of machine learning models. However, it can offer benefits to any meta-learning setup with task-specific metadata; we demonstrate this by applying TIML to the Omniglot dataset.

2025-06-10

2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (published)