
David Scott Krueger

Core Academic Member
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research (DIRO)
Research Topics
Deep Learning
Representation Learning

Biography

David Krueger is an Assistant Professor in Robust, Reasoning and Responsible AI in the Department of Computer Science and Operations Research (DIRO) at the Université de Montréal, a Core Academic Member at Mila - Quebec Artificial Intelligence Institute, and a member of UC Berkeley's Center for Human-Compatible AI (CHAI) and the Center for the Study of Existential Risk (CSER). His work focuses on reducing the risk of human extinction from artificial intelligence (AI x-risk) through technical research as well as education, outreach, governance, and advocacy.

His research spans many areas of Deep Learning, AI Alignment, AI Safety, and AI Ethics, including alignment failure modes, algorithmic manipulation, interpretability, robustness, and understanding how AI systems learn and generalize. He has been featured in media outlets including ITV's Good Morning Britain, Al Jazeera's Inside Story, France 24, New Scientist, and the Associated Press.

David completed his graduate studies at the Université de Montréal and Mila - Quebec Artificial Intelligence Institute, working with Yoshua Bengio, Roland Memisevic, and Aaron Courville.

Current Students

PhD - Université de Montréal

Publications

Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics
Shoaib Ahmed Siddiqui
Nitarshan Rajkumar
Sara Hooker
Modern machine learning research relies on relatively few carefully curated datasets. Even in these datasets, and typically in 'untidy' or raw data, practitioners are faced with significant issues of data quality and diversity which can be prohibitively labor intensive to address. Existing methods for dealing with these challenges tend to make strong assumptions about the particular issues at play, and often require a priori knowledge or metadata such as domain labels. Our work is orthogonal to these methods: we instead focus on providing a unified and efficient framework for Metadata Archaeology -- uncovering and inferring metadata of examples in a dataset. We curate different subsets of data that might exist in a dataset (e.g. mislabeled, atypical, or out-of-distribution examples) using simple transformations, and leverage differences in learning dynamics between these probe suites to infer metadata of interest. Our method is on par with far more sophisticated mitigation methods across different tasks: identifying and correcting mislabeled examples, classifying minority-group samples, prioritizing points relevant for training and enabling scalable human auditing of relevant examples.
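A minimal sketch of the probe-suite idea, assuming per-example loss trajectories were recorded during training (an illustration, not the authors' implementation; array shapes and label names are invented):

    import numpy as np

    def infer_metadata(example_curve, probe_curves, probe_labels, k=5):
        # k-NN over learning curves: assign the majority metadata label
        # (e.g. 'mislabeled', 'typical', 'ood') of the k nearest probes.
        dists = np.linalg.norm(probe_curves - example_curve, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = [probe_labels[i] for i in nearest]
        return max(set(votes), key=votes.count)

    # Hypothetical probes built by simple transformations, e.g. flipping
    # labels to simulate mislabeled data or shuffling pixels to simulate
    # out-of-distribution data; 20 recorded epochs of loss per example.
    rng = np.random.default_rng(0)
    probe_curves = rng.random((60, 20))
    probe_labels = ['typical'] * 20 + ['mislabeled'] * 20 + ['ood'] * 20
    print(infer_metadata(rng.random(20), probe_curves, probe_labels))
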
Mechanistic Mode Connectivity
Ekdeep Singh Lubana
Eric J. Bigelow
Robert P. Dick
Hidenori Tanaka
We study neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks retrieved via training on a dataset are connected via simple paths of low loss. Specifically, we ask the following question: are minimizers that rely on different mechanisms for making their predictions connected via simple paths of low loss? We provide a definition of mechanistic similarity as shared invariances to input transformations and demonstrate that lack of linear connectivity between two models implies they use dissimilar mechanisms for making their predictions. Relevant to practice, this result helps us demonstrate that naive fine-tuning on a downstream dataset can fail to alter a model's mechanisms, e.g., fine-tuning can fail to eliminate a model's reliance on spurious attributes. Our analysis also motivates a method for targeted alteration of a model's mechanisms, named connectivity-based fine-tuning (CBFT), which we analyze using several synthetic datasets for the task of reducing a model's reliance on spurious attributes.
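The linear-connectivity test the abstract relies on can be pictured with a short sketch (a hedged illustration, not the paper's code; model_a, model_b, loss_fn, and loader are assumed to exist):

    import copy
    import torch

    def loss_barrier(model_a, model_b, loss_fn, loader, steps=11):
        # Evaluate the loss along the straight line between the two models'
        # weights; a high barrier is evidence of dissimilar mechanisms.
        losses = []
        for alpha in torch.linspace(0.0, 1.0, steps):
            interp = copy.deepcopy(model_a)
            with torch.no_grad():
                for p_i, p_a, p_b in zip(interp.parameters(),
                                         model_a.parameters(),
                                         model_b.parameters()):
                    p_i.copy_((1 - alpha) * p_a + alpha * p_b)
                total, n = 0.0, 0
                for x, y in loader:
                    total += loss_fn(interp(x), y).item() * len(x)
                    n += len(x)
            losses.append(total / n)
        # Height of the path's worst point above the worse endpoint.
        return max(losses) - max(losses[0], losses[-1])
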
Noisy Pairing and Partial Supervision for Stylized Opinion Summarization
Opinion summarization research has primarily focused on generating summaries reflecting important opinions from customer reviews without paying much attention to the writing style. In this paper, we propose the stylized opinion summarization task, which aims to generate a summary of customer reviews in the desired (e.g., professional) writing style. To tackle the difficulty in collecting customer and professional review pairs, we develop a non-parallel training framework, Noisy Pairing and Partial Supervision (NAPA), which trains a stylized opinion summarization system from non-parallel customer and professional review sets. We create a benchmark ProSum by collecting customer and professional reviews from Yelp and Michelin. Experimental results on ProSum and FewSum demonstrate that our non-parallel training framework consistently improves both automatic and human evaluations, successfully building a stylized opinion summarization model that can generate professionally-written summaries from customer reviews.
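As a purely speculative illustration of what pairing non-parallel review sets could look like (the paper's actual procedure may differ; embed is a hypothetical sentence-embedding function, not part of NAPA):

    import numpy as np

    def noisy_pairs(customer_reviews, professional_reviews, embed):
        # Pair each professional summary with its most similar customer
        # review by embedding similarity, yielding noisy training pairs.
        cust_emb = np.stack([embed(r) for r in customer_reviews])
        pairs = []
        for summary in professional_reviews:
            sims = cust_emb @ embed(summary)
            best = int(np.argmax(sims))
            pairs.append((customer_reviews[best], summary))
        return pairs
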
On The Fragility of Learned Reward Functions
Lev E McKinney
Yawen Duan
Adam Gleave
Reward functions are notoriously difficult to specify, especially for tasks with complex goals. Reward learning approaches attempt to infer reward functions from human feedback and preferences. Prior works on reward learning have mainly focused on the performance of policies trained alongside the reward function. This practice, however, may fail to detect learned rewards that are not capable of training new policies from scratch and thus do not capture the intended behavior. Our work focuses on demonstrating and studying the causes of these relearning failures in the domain of preference-based reward learning. We demonstrate with experiments in tabular and continuous control environments that the severity of relearning failures can be sensitive to changes in reward model design and the trajectory dataset composition. Based on our findings, we emphasize the need for more retraining-based evaluations in the literature.
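The retraining-based evaluation the abstract calls for can be outlined as follows (a hedged sketch; train_policy, true_return, and env are hypothetical stand-ins, not an actual API):

    def evaluate_learned_reward(reward_model, env, train_policy, true_return,
                                n_seeds=5):
        # Judge a learned reward by whether *fresh* policies trained from
        # scratch under it recover the intended behavior, rather than only
        # inspecting the policy trained alongside the reward model.
        returns = []
        for seed in range(n_seeds):
            policy = train_policy(env, reward_fn=reward_model, seed=seed)
            returns.append(true_return(env, policy))
        return sum(returns) / n_seeds
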
Training Equilibria in Reinforcement Learning
Lauro Langosco
Adam Gleave
In partially observable environments, reinforcement learning algorithms such as policy gradient and Q-learning may have multiple equilibria (policies that are stable under further training) and can converge to equilibria that are strictly suboptimal. Prior work blames insufficient exploration, but suboptimal equilibria can arise despite full exploration and other favorable circumstances like a flexible policy parametrization. We show theoretically that the core problem is that in partially observed environments, an agent's past actions induce a distribution on hidden states. Equipping the policy with memory helps it model the hidden state and leads to convergence to a higher-reward equilibrium, even when there exists a memoryless optimal policy. Experiments show that policies with insufficient memory tend to learn to use the environment as auxiliary memory, and parameter noise helps policies escape suboptimal equilibria.
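The two policy classes being compared can be made concrete with a small sketch (dimensions are illustrative assumptions, not from the paper):

    import torch.nn as nn

    class MemorylessPolicy(nn.Module):
        # Sees only the current observation; cannot model hidden state.
        def __init__(self, obs_dim=8, n_actions=4):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(),
                                     nn.Linear(32, n_actions))

        def forward(self, obs):              # (batch, obs_dim)
            return self.net(obs)             # action logits

    class RecurrentPolicy(nn.Module):
        # Carries memory that can summarize past actions and observations,
        # which the result above says helps escape suboptimal equilibria.
        def __init__(self, obs_dim=8, n_actions=4, hidden=32):
            super().__init__()
            self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_actions)

        def forward(self, obs_seq, h=None):  # (batch, time, obs_dim)
            out, h = self.gru(obs_seq, h)
            return self.head(out), h         # logits per step, new memory
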
Multi-Domain Balanced Sampling Improves Out-of-Distribution Generalization of Chest X-ray Pathology Prediction Models
Enoch Amoatey Tetteh
Joseph D Viviano
Joseph Paul Cohen
Learning models that generalize under different distribution shifts in medical imaging has been a long-standing research challenge. There have been several proposals for efficient and robust visual representation learning among vision research practitioners, especially in the sensitive and critical biomedical domain. In this paper, we propose an idea for out-of-distribution generalization of chest X-ray pathologies that uses a simple balanced batch sampling technique. We observed that balanced sampling between the multiple training datasets improves the performance over baseline models trained without balancing.
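The balanced batch sampling technique is simple enough to sketch directly (an illustration under assumed data structures, not the authors' code):

    import random

    def balanced_batches(datasets, per_domain=8):
        # `datasets` is a list of per-domain example lists; every batch
        # draws equally from each domain so no dataset dominates training.
        while True:
            batch = []
            for ds in datasets:
                batch.extend(random.sample(ds, per_domain))
            random.shuffle(batch)
            yield batch
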
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
Miles Brundage
Shahar Avin
Jasmine Wang
Haydn Belfield
Gretchen Krueger
Gillian K. Hadfield
Heidy Khlaaf
Jingying Yang
H. Toner
Ruth Catherine Fong
Pang Wei Koh
Sara Hooker
Jade Leung
Andrew John Trask
Emma Bluemke
Jonathan Lebensold
Cullen C. O'Keefe
Mark Koren
Théo Ryffel
JB Rubinovitz
Tamay Besiroglu
Federica Carugati
Jack Clark
Peter Eckersley
Sarah de Haas
Maritza L. Johnson
Ben Laurie
Alex Ingerman
Igor Krawczuk
Amanda Askell
Rosario Cammarota
A. Lohn
Charlotte Stix
Peter Mark Henderson
Logan Graham
Carina E. A. Prunkl
Bianca Martin
Elizabeth Seger
Noa Zilberman
Seán Ó hÉigeartaigh
Frens Kroeger
Girish Sastry
R. Kagan
Adrian Weller
Brian Shek-kam Tse
Elizabeth Barnes
Allan Dafoe
Paul D. Scharre
Ariel Herbert-Voss
Martijn Rasser
Shagun Sodhani
Carrick Flynn
Thomas Krendl Gilbert
Lisa Dyer
Saif M. Khan
Markus Anderljung
Neural Autoregressive Flows
Chin-Wei Huang
Alexandre Lacoste
Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST.
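The monotonic univariate transformation at the heart of NAF can be sketched as a one-hidden-layer network constrained to be strictly increasing (a simplified illustration; in the actual flow these parameters would be produced by an autoregressive conditioner network):

    import torch
    import torch.nn.functional as F

    def monotonic_transform(x, w, b, v):
        # x: (batch,); w, v, b: (batch, hidden). A positive-weighted convex
        # combination of sigmoids is strictly increasing, hence invertible.
        pos_w = F.softplus(w)                   # strictly positive slopes
        pos_v = F.softmax(v, dim=-1)            # convex combination weights
        h = torch.sigmoid(pos_w * x.unsqueeze(-1) + b)
        return (pos_v * h).sum(-1)              # monotonic output in (0, 1)
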
Bayesian Hypernetworks
Chin-Wei Huang
Riashat Islam
Ryan Turner
Alexandre Lacoste
We propose Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork, h, is a neural network which learns to transform a simple noise distribution, p(e) = N(0, I), to a distribution q(t) := q(h(e)) over the parameters t of another neural network (the "primary network"). We train q with variational inference, using an invertible h to enable efficient estimation of the variational lower bound on the posterior p(t | D) via sampling. In contrast to most methods for Bayesian deep learning, Bayesian hypernets can represent a complex multimodal approximate posterior with correlations between parameters, while enabling cheap iid sampling of q(t). In practice, Bayesian hypernets provide a better defense against adversarial examples than dropout, and also exhibit competitive performance on a suite of tasks which evaluate model uncertainty, including regularization, active learning, and anomaly detection.
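A minimal sketch of the setup, assuming an illustrative elementwise-affine (and thus invertible) hypernetwork and a tiny primary network (sizes invented for the example):

    import torch
    import torch.nn as nn

    n_primary = 10 * 5 + 5       # weights + biases of a 10 -> 5 linear layer

    class Hypernet(nn.Module):
        # Invertible elementwise affine map from noise e to parameters t.
        def __init__(self, dim=n_primary):
            super().__init__()
            self.log_scale = nn.Parameter(torch.zeros(dim))
            self.shift = nn.Parameter(torch.zeros(dim))

        def forward(self, e):
            return e * self.log_scale.exp() + self.shift

    hyper = Hypernet()
    e = torch.randn(n_primary)   # e ~ N(0, I)
    t = hyper(e)                 # one cheap iid sample of q(t)
    W, b = t[:50].view(5, 10), t[50:]
    y = torch.randn(10) @ W.t() + b   # primary network forward pass
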
A Closer Look at Memorization in Deep Networks
Devansh Arpit
Stanisław Jastrzębski
Nicolas Ballas
Maxinder S. Kanwal
Asja Fischer
We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that for appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that the notions of effective capacity which are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient based methods because training data itself plays an important role in determining the degree of memorization.
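The noise-vs-real comparison in the abstract lends itself to a small experiment (synthetic data, purely illustrative):

    import torch
    import torch.nn as nn

    def fit(x, y, epochs=200):
        # Train a small classifier and return its training-loss curve.
        model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()
        curve = []
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            curve.append(loss.item())
        return curve

    x = torch.randn(512, 20)
    y_real = (x[:, 0] > 0).long()                  # simple learnable pattern
    y_noise = y_real[torch.randperm(len(y_real))]  # labels randomized
    # Loss on the real pattern should fall much faster than on noise labels.
    print(fit(x, y_real)[-1], fit(x, y_noise)[-1])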