
David Scott Krueger

Core Academic Member
Assistant professor, Université de Montréal, Department of Computer Science and Operations Research (DIRO)
Research Topics
Deep Learning
Representation Learning

Biography

David Krueger is an Assistant Professor in Robust, Reasoning and Responsible AI in the Department of Computer Science and Operations Research (DIRO) at the University of Montreal, and a Core Academic Member at Mila - Quebec Artificial Intelligence Institute, UC Berkeley's Center for Human-Compatible AI (CHAI), and the Center for the Study of Existential Risk (CSER). His work focuses on reducing the risk of human extinction from artificial intelligence (AI x-risk) through technical research as well as education, outreach, governance and advocacy.

His research spans many areas of Deep Learning, AI Alignment, AI Safety and AI Ethics, including alignment failure modes, algorithmic manipulation, interpretability, robustness, and understanding how AI systems learn and generalize. He has been featured in media outlets including ITV's Good Morning Britain, Al Jazeera's Inside Story, France 24, New Scientist and the Associated Press.

David completed his graduate studies at the University of Montreal and Mila - Quebec Artificial Intelligence Institute, working with Yoshua Bengio, Roland Memisevic, and Aaron Courville.

Current Students

PhD - Université de Montréal
Principal supervisor:

Publications

The Flag and the Cross: White Christian Nationalism and the Threat to American Democracy by Philip S. Gorski and Samuel L. Perry (review)
Investigating the Nature of 3D Generalization in Deep Neural Networks
Shoaib Ahmed Siddiqui
Thomas M. Breuel
Facing AI extinction
Out-of-context Meta-learning in Large Language Models
Dmitrii Krasheninnikov
Egor Krasheninnikov
Brown et al. (2020) famously introduced the phenomenon of in-context meta-learning in large language models (LLMs). Our work establishes the existence of a phenomenon we call out-of-context meta-learning via carefully designed synthetic experiments with large language models. We argue that out-of-context meta-learning is an important and surprising capability of LLMs, which may lead them to more readily "internalize" the semantic content of text that is, or appears to be, broadly useful (such as true statements, or text from authoritative sources) and apply it in appropriate contexts. We also raise the question of how this phenomenon emerges, and discuss two possible explanations: one relying on the way LLMs store knowledge in their parameters, and another suggesting that the implicit gradient alignment bias of gradient-descent-based methods may be responsible. Finally, we reflect on what our results might imply about capabilities of future AI systems, and discuss potential risks.
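To make the setup concrete, here is a minimal data-construction sketch of an out-of-context evaluation (the tags, entities, and question format are hypothetical illustrations, not the paper's actual dataset): fine-tuning documents pair a source tag with a made-up fact, while evaluation prompts ask about that fact with nothing relevant in context, so answering correctly requires the model to have internalized the fine-tuning text.

```python
# Sketch of an out-of-context evaluation setup (hypothetical names and formats;
# the paper's synthetic experiments may be constructed differently).
import random

TAGS = {"reliable": "Define", "unreliable": "Rumor"}  # hypothetical source tags

def make_finetuning_doc(tag: str, entity: str, value: str) -> str:
    # Training document: the source tag and the fact appear together.
    return f"{TAGS[tag]}: the capital of {entity} is {value}."

def make_eval_prompt(entity: str) -> str:
    # Evaluation prompt: the fact is NOT in context, so a correct answer
    # requires the model to have internalized it during fine-tuning.
    return f"Q: What is the capital of {entity}? A:"

# Made-up entities ensure pretraining knowledge cannot answer the questions.
rng = random.Random(0)
entities = [f"Zorblax-{i}" for i in range(100)]
values = [f"City-{rng.randint(0, 999)}" for _ in entities]

train_docs, eval_items = [], []
for i, (ent, val) in enumerate(zip(entities, values)):
    tag = "reliable" if i % 2 == 0 else "unreliable"
    train_docs.append(make_finetuning_doc(tag, ent, val))
    eval_items.append((make_eval_prompt(ent), val, tag))

# Comparing answer accuracy for facts that carried "reliable" vs "unreliable"
# tags probes whether the model selectively internalizes tagged content.
print(train_docs[0])
print(eval_items[0][0])
```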
Blockwise self-supervised learning with Barlow Twins
Shoaib Ahmed Siddiqui
Yann LeCun
Stephane Deny
Current state-of-the-art deep networks are all powered by backpropagation. In this paper, we explore alternatives to full backpropagation in the form of blockwise learning rules, leveraging the latest developments in self-supervised learning. Notably, we show that a blockwise pretraining procedure consisting of training independently the 4 main blocks of layers of a ResNet-50 with the Barlow Twins loss function at each block performs almost as well as end-to-end backpropagation on ImageNet: a linear probe trained on top of our blockwise pretrained model obtains a top-1 classification accuracy of 70.48%, only 1.1% below the accuracy of an end-to-end pretrained network (71.57% accuracy). We perform extensive experiments to understand the impact of different components within our method and explore a variety of adaptations of self-supervised learning to the blockwise paradigm, building an exhaustive understanding of the critical avenues for scaling local learning rules to large networks, with implications ranging from hardware design to neuroscience.
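A minimal sketch of the blockwise procedure, assuming PyTorch and torchvision (the projector heads, optimizer settings, and omitted data augmentations are simplified placeholders rather than the paper's recipe): each of the four residual stages of a ResNet-50 is trained on its own Barlow Twins loss, with inputs detached so that no gradient crosses block boundaries.

```python
# Blockwise pretraining sketch: one Barlow Twins loss per ResNet-50 stage.
import torch
import torch.nn as nn
from torchvision.models import resnet50

def barlow_twins_loss(z1, z2, lambd=5e-3):
    # Cross-correlation between batch-normalized embeddings of the two views.
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / z1.shape[0]
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambd * off_diag

backbone = resnet50()
stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
blocks = nn.ModuleList([
    nn.Sequential(stem, backbone.layer1),  # stage 1 (trained together with the stem)
    backbone.layer2,                       # stage 2
    backbone.layer3,                       # stage 3
    backbone.layer4,                       # stage 4
])
dims = [256, 512, 1024, 2048]  # output channels of each ResNet-50 stage
projectors = nn.ModuleList([
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(d, 128))
    for d in dims
])
optimizers = [torch.optim.SGD(list(b.parameters()) + list(p.parameters()), lr=0.1)
              for b, p in zip(blocks, projectors)]

def blockwise_step(view1, view2):
    x1, x2 = view1, view2
    for block, proj, opt in zip(blocks, projectors, optimizers):
        # Detaching the inputs stops gradients at block boundaries, which is
        # what makes the rule blockwise rather than end-to-end backprop.
        x1, x2 = block(x1.detach()), block(x2.detach())
        loss = barlow_twins_loss(proj(x1), proj(x2))
        opt.zero_grad()
        loss.backward()
        opt.step()

# Usage with two augmented views of the same batch (augmentations omitted):
v1, v2 = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
blockwise_step(v1, v2)
```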
Broken Neural Scaling Laws
Ethan Caballero
Kshitij Gupta
We present a smoothly broken power law functional form (that we refer to as a Broken Neural Scaling Law (BNSL)) that accurately models & extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as amount of compute used for training (or inference), number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures & for each of various tasks within a large & diverse set of upstream & downstream tasks, in zero-shot, prompted, & finetuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, AI capabilities, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, OOD detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, fairness, molecules, computer programming/coding, math word problems, "emergent phase transitions", arithmetic, supervised learning, unsupervised/self-supervised learning, & reinforcement learning (single agent & multi-agent). When compared to other functional forms for neural scaling, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models & extrapolates scaling behavior that other functional forms are incapable of expressing, such as the nonmonotonic transitions present in the scaling behavior of phenomena such as double descent & the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws
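For reference, the smoothly broken power law family can be written in the following general form (a best-effort rendering; the exact notation should be checked against the paper): y is the evaluation metric, x is the quantity being scaled (compute, parameters, data, etc.), and each of the n "breaks" i has its own location d_i, sharpness f_i, and change of slope c_i.

```latex
% Smoothly broken power law with n breaks (best-effort rendering of the BNSL form):
% a is the limiting value, b and c_0 set the initial power-law segment, and each
% break i contributes a smooth transition at scale d_i with sharpness f_i.
y = a + b\, x^{-c_0} \prod_{i=1}^{n} \left( 1 + \left( \frac{x}{d_i} \right)^{1/f_i} \right)^{-c_i f_i}
```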
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics
Shoaib Ahmed Siddiqui
Nitarshan Rajkumar
Sara Hooker
Modern machine learning research relies on relatively few carefully curated datasets. Even in these datasets, and typically in 'untidy' or raw data, practitioners are faced with significant issues of data quality and diversity which can be prohibitively labor intensive to address. Existing methods for dealing with these challenges tend to make strong assumptions about the particular issues at play, and often require a priori knowledge or metadata such as domain labels. Our work is orthogonal to these methods: we instead focus on providing a unified and efficient framework for Metadata Archaeology -- uncovering and inferring metadata of examples in a dataset. We curate different subsets of data that might exist in a dataset (e.g. mislabeled, atypical, or out-of-distribution examples) using simple transformations, and leverage differences in learning dynamics between these probe suites to infer metadata of interest. Our method is on par with far more sophisticated mitigation methods across different tasks: identifying and correcting mislabeled examples, classifying minority-group samples, prioritizing points relevant for training and enabling scalable human auditing of relevant examples.
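The core mechanic of the framework, recording per-example loss trajectories and assigning each training example the metadata of the probe suite whose dynamics it most resembles, can be sketched as follows; the synthetic curves and the nearest-centroid assignment below are simplified stand-ins for the paper's probes and matching procedure.

```python
# Sketch of probe-based metadata inference from training dynamics.
# Loss values are assumed to be logged once per epoch for every example; the
# synthetic probe curves and nearest-centroid matching are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_epochs = 20
t = np.arange(n_epochs)

# Per-example loss curves for hand-curated probe suites, shape (n_probes, n_epochs):
# "typical" examples are learned quickly, "mislabeled" ones barely improve,
# "atypical" ones are learned slowly.
probe_curves = {
    "typical":    np.exp(-0.3 * t) + 0.05 * rng.random((50, n_epochs)),
    "mislabeled": 1.5 - 0.02 * t + 0.05 * rng.random((50, n_epochs)),
    "atypical":   np.exp(-0.1 * t) + 0.05 * rng.random((50, n_epochs)),
}
centroids = {name: curves.mean(axis=0) for name, curves in probe_curves.items()}

def infer_metadata(loss_curve: np.ndarray) -> str:
    """Assign the label of the probe suite with the closest centroid trajectory."""
    return min(centroids, key=lambda name: np.linalg.norm(loss_curve - centroids[name]))

# A training example whose loss barely decreases is flagged as "mislabeled".
example_curve = 1.4 - 0.01 * t
print(infer_metadata(example_curve))
```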
Mechanistic Mode Connectivity
Ekdeep Singh Lubana
Eric J Bigelow
Robert P. Dick
Hidenori Tanaka
We study neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks retrieved via training on a dataset are connected via simple paths of low loss. Specifically, we ask the following question: are minimizers that rely on different mechanisms for making their predictions connected via simple paths of low loss? We provide a definition of mechanistic similarity as shared invariances to input transformations and demonstrate that lack of linear connectivity between two models implies they use dissimilar mechanisms for making their predictions. Relevant to practice, this result helps us demonstrate that naive fine-tuning on a downstream dataset can fail to alter a model's mechanisms, e.g., fine-tuning can fail to eliminate a model's reliance on spurious attributes. Our analysis also motivates a method for targeted alteration of a model's mechanisms, named connectivity-based fine-tuning (CBFT), which we analyze using several synthetic datasets for the task of reducing a model's reliance on spurious attributes.
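The linear-connectivity test this result builds on can be made concrete with a short sketch: interpolate the weights of two trained models and evaluate the loss along the path (model_a, model_b, eval_loader, and criterion are assumed to be supplied by the user, and the two models must share an architecture).

```python
# Sketch: probe linear mode connectivity by evaluating the loss along the
# straight line between two trained models' parameters.
import copy
import torch

def loss_barrier_profile(model_a, model_b, eval_loader, criterion, n_points=11):
    """Average loss at evenly spaced points on the linear path between models."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    losses = []
    for alpha in torch.linspace(0.0, 1.0, n_points):
        interp_sd = {}
        for k in sd_a:
            if sd_a[k].is_floating_point():
                interp_sd[k] = (1 - alpha) * sd_a[k] + alpha * sd_b[k]
            else:
                interp_sd[k] = sd_a[k]  # e.g. integer BatchNorm counters
        model = copy.deepcopy(model_a)
        model.load_state_dict(interp_sd)
        model.eval()
        total, count = 0.0, 0
        with torch.no_grad():
            for x, y in eval_loader:
                total += criterion(model(x), y).item() * len(y)
                count += len(y)
        losses.append(total / count)
    # A pronounced bump between the endpoints is a loss barrier, i.e. a lack of
    # linear connectivity, which the paper links to dissimilar mechanisms.
    return losses
```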
Noisy Pairing and Partial Supervision for Stylized Opinion Summarization
Opinion summarization research has primarily focused on generating summaries reflecting important opinions from customer reviews without paying much attention to the writing style. In this paper, we propose the stylized opinion summarization task, which aims to generate a summary of customer reviews in the desired (e.g., professional) writing style. To tackle the difficulty in collecting customer and professional review pairs, we develop a non-parallel training framework, Noisy Pairing and Partial Supervision (NAPA), which trains a stylized opinion summarization system from non-parallel customer and professional review sets. We create a benchmark ProSum by collecting customer and professional reviews from Yelp and Michelin. Experimental results on ProSum and FewSum demonstrate that our non-parallel training framework consistently improves both automatic and human evaluations, successfully building a stylized opinion summarization model that can generate professionally-written summaries from customer reviews.
On The Fragility of Learned Reward Functions
Lev E McKinney
Yawen Duan
Adam Gleave
Reward functions are notoriously difficult to specify, especially for tasks with complex goals. Reward learning approaches attempt to infer reward functions from human feedback and preferences. Prior works on reward learning have mainly focused on the performance of policies trained alongside the reward function. This practice, however, may fail to detect learned rewards that are not capable of training new policies from scratch and thus do not capture the intended behavior. Our work focuses on demonstrating and studying the causes of these relearning failures in the domain of preference-based reward learning. We demonstrate with experiments in tabular and continuous control environments that the severity of relearning failures can be sensitive to changes in reward model design and the trajectory dataset composition. Based on our findings, we emphasize the need for more retraining-based evaluations in the literature.
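The retraining-based evaluation argued for here can be illustrated with a toy example (all specifics are illustrative, not the paper's experimental setup): learn a per-action reward from synthetic pairwise preferences via a Bradley-Terry model, then train a fresh policy from scratch on the learned reward and compare it against the true reward.

```python
# Toy retraining-based evaluation of a learned reward in a 5-armed bandit.
import numpy as np

rng = np.random.default_rng(0)
true_reward = np.array([0.1, 0.4, 0.9, 0.3, 0.6])  # ground-truth reward per action
n_actions = len(true_reward)

# Synthetic pairwise preferences between actions (Bradley-Terry with temperature 0.1).
pairs = [(a, b) for a in range(n_actions) for b in range(n_actions) if a != b]
prefs = []
for a, b in pairs:
    p_a = 1.0 / (1.0 + np.exp(-(true_reward[a] - true_reward[b]) / 0.1))
    prefs.append((a, b, rng.random() < p_a))  # True if a is preferred over b

# Fit a per-action reward by gradient ascent on the Bradley-Terry log-likelihood.
learned = np.zeros(n_actions)
for _ in range(2000):
    grad = np.zeros(n_actions)
    for a, b, a_wins in prefs:
        p_a = 1.0 / (1.0 + np.exp(-(learned[a] - learned[b])))
        g = (1.0 if a_wins else 0.0) - p_a
        grad[a] += g
        grad[b] -= g
    learned += 0.05 * grad / len(prefs)

# "Retraining from scratch": a fresh greedy policy that only sees the learned reward.
retrained_action = int(np.argmax(learned))
best_action = int(np.argmax(true_reward))
print("retrained policy picks action", retrained_action,
      "| true-optimal action is", best_action,
      "| regret under the true reward:",
      float(true_reward[best_action] - true_reward[retrained_action]))
```

A regret of zero here means the learned reward supports retraining; evaluating only the policy optimized alongside the reward model would not surface a failure of this kind.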
Training Equilibria in Reinforcement Learning
Lauro Langosco
Adam Gleave
In partially observable environments, reinforcement learning algorithms such as policy gradient and Q-learning may have multiple equilibria (policies that are stable under further training) and can converge to equilibria that are strictly suboptimal. Prior work blames insufficient exploration, but suboptimal equilibria can arise despite full exploration and other favorable circumstances like a flexible policy parametrization. We show theoretically that the core problem is that in partially observed environments, an agent's past actions induce a distribution on hidden states. Equipping the policy with memory helps it model the hidden state and leads to convergence to a higher-reward equilibrium, even when there exists a memoryless optimal policy. Experiments show that policies with insufficient memory tend to learn to use the environment as auxiliary memory, and parameter noise helps policies escape suboptimal equilibria.
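One simple way to "equip the policy with memory", as a generic illustration rather than the paper's exact construction, is to fold a short window of past actions into the observation the policy sees, as in the wrapper sketch below (the environment interface is assumed to be reset()/step(action) -> (obs, reward, done)).

```python
# Sketch: augment observations with a fixed-length history of past actions so
# the policy can condition on information about the hidden state it induces.
from collections import deque

class ActionHistoryWrapper:
    """Wraps an environment exposing reset() and step(action) -> (obs, reward, done)."""

    def __init__(self, env, history_len=4):
        self.env = env
        self.history = deque(maxlen=history_len)

    def reset(self):
        obs = self.env.reset()
        self.history.clear()
        self.history.extend([None] * self.history.maxlen)  # padding at episode start
        return (obs, tuple(self.history))

    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.history.append(action)  # the policy now "remembers" its past actions
        return (obs, tuple(self.history)), reward, done

# Minimal usage with a trivial stand-in environment:
class _DummyEnv:
    def reset(self):
        return 0
    def step(self, action):
        return 0, 1.0, False

wrapped = ActionHistoryWrapper(_DummyEnv(), history_len=2)
print(wrapped.reset())   # (0, (None, None))
print(wrapped.step(1))   # ((0, (None, 1)), 1.0, False)
```

A memoryless policy over the augmented observation can represent behavior that depends on the induced belief over hidden states, which a memoryless policy over the raw observation alone cannot.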