
David Rolnick

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Machine Learning Theory

Biography

David Rolnick is an Assistant Professor and Canada CIFAR AI Chair in the School of Computer Science at McGill University and a Core Academic Member of Mila – Quebec Artificial Intelligence Institute. His work focuses on applications of machine learning to help tackle climate change. He is the co-founder and chair of Climate Change AI and the scientific co-director of Sustainability in the Digital Age. He received his PhD in applied mathematics from the Massachusetts Institute of Technology (MIT). He was previously an NSF Mathematical Sciences Postdoctoral Research Fellow, an NSF Graduate Research Fellow, and a Fulbright Scholar. In 2021 he was named to MIT Technology Review's "35 Innovators Under 35" list.

Current Students

Alumni Collaborator - McGill
Alumni Collaborator - UdeM
Research Collaborator - The University of Dresden, Helmholtz Centre for Environmental Research Leipzig
Research Collaborator
Research Collaborator
Research Collaborator - National Observatory of Athens
Research Collaborator
Research Collaborator - McGill
Research Collaborator
Research Collaborator - N/A
Co-supervisor:
Research Master's - McGill
Research Intern - Leipzig University
Research Collaborator - Université Paris-Saclay
Research Collaborator
Research Collaborator
Postdoctorate - UdeM
Principal supervisor:
Research Collaborator - UdeM
Research Collaborator - Johannes Kepler University
Research Master's - McGill
Research Collaborator - University of Waterloo
Research Collaborator
Research Intern - UdeM
Postdoctorate - McGill
Co-supervisor:
PhD - University of Waterloo
Co-supervisor:
PhD - UdeM
Research Collaborator
Research Master's - McGill
Research Collaborator - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)
Co-supervisor:
Research Collaborator - Karlsruhe Institute of Technology
PhD - McGill
Postdoctorate - UdeM
Principal supervisor:
Research Collaborator
PhD - McGill
Alumni Collaborator - McGill
Research Collaborator

Publications

Dataset Difficulty and the Role of Inductive Bias
Devin Kwok
Nikhil Anand
Jonathan Frankle
Motivated by the goals of dataset pruning and defect identification, a growing body of methods has been developed to score individual examples within a dataset. These methods, which we call "example difficulty scores", are typically used to rank or categorize examples, but the consistency of rankings between different training runs, scoring methods, and model architectures is generally unknown. To determine how example rankings vary due to these random and controlled effects, we systematically compare different formulations of scores over a range of runs and model architectures. We find that scores largely share the following traits: they are noisy over individual runs of a model, strongly correlated with a single notion of difficulty, and reveal examples that range from being highly sensitive to insensitive to the inductive biases of certain model architectures. Drawing from statistical genetics, we develop a simple method for fingerprinting model architectures using a few sensitive examples. These findings guide practitioners in maximizing the consistency of their scores (e.g., by choosing appropriate scoring methods, number of runs, and subsets of examples) and establish comprehensive baselines for evaluating scores in the future.
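The abstract above turns on how consistently difficulty scores rank examples from one training run to the next. One common way to quantify that consistency is the mean pairwise Spearman rank correlation between runs. The sketch below illustrates that generic measure only; it is not presented as the authors' method, and the function names and score values are hypothetical stand-ins for any per-example difficulty score computed once per run.

```python
from itertools import combinations
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def mean_pairwise_spearman(score_runs):
    """Average Spearman correlation over every pair of runs (1.0 = identical rankings)."""
    return mean(spearman(a, b) for a, b in combinations(score_runs, 2))


# Hypothetical difficulty scores: 3 training runs x 5 examples (made-up numbers).
runs = [
    [0.1, 0.9, 0.4, 0.7, 0.2],
    [0.2, 0.8, 0.5, 0.6, 0.1],
    [0.1, 0.7, 0.6, 0.9, 0.3],
]
print(round(mean_pairwise_spearman(runs), 3))  # → 0.867
```

A value well below 1.0 would indicate that a single run is a noisy basis for ranking examples; averaging scores over several runs before ranking is one way a practitioner might test whether consistency improves.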
Application-Driven Innovation in Machine Learning
Alan Aspuru-Guzik
Sara Beery
Bistra Dilkina
Priya L. Donti
Marzyeh Ghassemi
Hannah Kerner
Claire Monteleoni
Esther Rolf
Milind Tambe
Adam White
As applications of machine learning proliferate, innovative algorithms inspired by specific real-world challenges have become increasingly important. Such work offers the potential for significant impact not merely in domains of application but also in machine learning itself. In this paper, we describe the paradigm of application-driven research in machine learning, contrasting it with the more standard paradigm of methods-driven research. We illustrate the benefits of application-driven machine learning and how this approach can productively synergize with methods-driven work. Despite these benefits, we find that reviewing, hiring, and teaching practices in machine learning often hold back application-driven innovation. We outline how these processes may be improved.
PhAST: Physics-Aware, Scalable, and Task-specific GNNs for Accelerated Catalyst Design
Alexandre AGM Duval
Victor Schmidt
Santiago Miret
Alex Hernandez-Garcia
Simultaneous linear connectivity of neural networks modulo permutation
Ekansh Sharma
Devin Kwok
Tom Denton
Daniel M. Roy
A landmark environmental law looks ahead
Robert L. Fischman
J. B. Ruhl
Brenna R. Forester
Tanya M. Lama
Marty Kardos
Grethel Aguilar Rojas
Nicholas A. Robinson
Patrick D. Shirey
Gary A. Lamberti
Amy W. Ando
Stephen Palumbi
Michael Wara
Mark W. Schwartz
Matthew A. Williamson
Tanya Berger-Wolf
Sara Beery
Justin Kitzes
David Thau
Devis Tuia
Daniel Rubenstein
Caleb R. Hickman
Julie Thorstenson
Gregory E. Kaebnick
James P. Collins
Athmeya Jayaram
Thomas Deleuil
Ying Zhao
FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models
Nikolaos Ioannis Bountos
Arthur Ouaknine
Towards Causal Representations of Climate Model Data
Julien Boussard
Chandni Nagda
Julia Kaltenborn
Charlotte Emilie Elektra Lange
Philippe Brouillard
Yaniv Gurwicz
Peer Nowack
Climate models, such as Earth system models (ESMs), are crucial for simulating future climate change based on projected Shared Socioeconomic Pathways (SSP) greenhouse gas emissions scenarios. While ESMs are sophisticated and invaluable, machine learning-based emulators trained on existing simulation data can project additional climate scenarios much faster and are computationally efficient. However, they often lack generalizability and interpretability. This work delves into the potential of causal representation learning, specifically the Causal Discovery with Single-parent Decoding (CDSD) method, which could render climate model emulation efficient and interpretable. We evaluate CDSD on multiple climate datasets, focusing on emissions, temperature, and precipitation. Our findings shed light on the challenges, limitations, and promise of using CDSD as a stepping stone towards more interpretable and robust climate model emulation.
SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data
Mélisande Teng
Amna Elmustafa
Benjamin Akera
Hager Radi
Biodiversity is declining at an unprecedented rate, impacting ecosystem services necessary to ensure food, water, and human health and well-being. Understanding the distribution of species and their habitats is crucial for conservation policy planning. However, traditional methods in ecology for species distribution models (SDMs) generally focus either on narrow sets of species or narrow geographical areas, and there remain significant knowledge gaps about the distribution of species. A major reason for this is the limited availability of the data traditionally used, due to the prohibitive amount of effort and expertise required for traditional field monitoring. The wide availability of remote sensing data and the growing adoption of citizen science tools to collect species observation data at low cost offer an opportunity for improving biodiversity monitoring and enabling the modelling of complex ecosystems. We introduce a novel task for mapping bird species to their habitats by predicting species encounter rates from satellite images, and present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird, considering summer (breeding) and winter seasons. We also provide a dataset in Kenya representing low-data regimes. We additionally provide environmental data and species range maps for each location. We benchmark a set of baselines on our dataset, including SOTA models for remote sensing tasks. SatBird opens up possibilities for scalably modelling properties of ecosystems worldwide.
OpenForest: A data catalogue for machine learning in forest monitoring
Arthur Ouaknine
Teja Kattenborn
Etienne Laliberté
On the importance of catalyst-adsorbate 3D interactions for relaxed energy predictions
Alvaro Carbonero
Alexandre AGM Duval
Victor Schmidt
Santiago Miret
Alex Hernandez-Garcia
The use of machine learning for material property prediction and discovery has traditionally centered on graph neural networks that incorporate the geometric configuration of all atoms. However, in practice not all of this information may be readily available, e.g., when evaluating the potentially unknown binding of adsorbates to catalysts. In this paper, we investigate whether it is possible to predict a system's relaxed energy in the OC20 dataset while ignoring the relative position of the adsorbate with respect to the electro-catalyst. We consider SchNet, DimeNet++ and FAENet as base architectures and measure the impact of four modifications on model performance: removing edges in the input graph, pooling independent representations, not sharing the backbone weights, and using an attention mechanism to propagate non-geometric relative information. We find that while removing binding site information impairs accuracy as expected, modified models are able to predict relaxed energies with remarkably decent mean absolute error (MAE). Our work suggests future research directions in accelerated materials discovery where information on reactant configurations can be reduced or altogether omitted.
ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning
Julia Kaltenborn
Charlotte Emilie Elektra Lange
Venkatesh Ramesh
Philippe Brouillard
Yaniv Gurwicz
Chandni Nagda
Jakob Runge
Peer Nowack
Climate models have been key for assessing the impact of climate change and simulating future climate scenarios. The machine learning (ML) community has taken an increased interest in supporting climate scientists’ efforts on various tasks such as climate model emulation, downscaling, and prediction tasks. Many of those tasks have been addressed on datasets created with single climate models. However, both the climate science and ML communities have suggested that to address those tasks at scale, we need large, consistent, and ML-ready climate model datasets. Here, we introduce ClimateSet, a dataset containing the inputs and outputs of 36 climate models from the Input4MIPs and CMIP6 archives. In addition, we provide a modular dataset pipeline for retrieving and preprocessing additional climate models and scenarios. We showcase the potential of our dataset by using it as a benchmark for ML-based climate model emulation. We gain new insights about the performance and generalization capabilities of the different ML models by analyzing their performance across different climate models. Furthermore, the dataset can be used to train an ML emulator on several climate models instead of just one. Such a “super-emulator” can quickly project new climate change scenarios, complementing existing scenarios already provided to policymakers. We believe ClimateSet will create the basis needed for the ML community to tackle climate-related tasks at scale.
SatBird: a Dataset for Bird Species Distribution Modeling using Remote Sensing and Citizen Science Data
Mélisande Teng
Amna Elmustafa
Benjamin Akera
Hager Radi