Portrait of David Scott Krueger

David Scott Krueger

Core Academic Member
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research (DIRO)
Research Topics
Representation Learning
Deep Learning

Biography

David Krueger is an Assistant Professor in robust, reasoning, and responsible AI in the Department of Computer Science and Operations Research (DIRO) at the Université de Montréal, a Core Academic Member at Mila - Quebec Artificial Intelligence Institute, and a member of the Center for Human-Compatible AI (CHAI) at UC Berkeley and of the Center for the Study of Existential Risk (CSER). His work focuses on reducing the risk of human extinction from artificial intelligence (AI x-risk) through technical research as well as education, outreach, governance, and advocacy.

His research spans many areas of deep learning, AI alignment, AI safety, and AI ethics, including alignment failure modes, algorithmic manipulation, interpretability, robustness, and understanding how AI systems learn and generalize. He has been featured in the media, including ITV's Good Morning Britain, Al Jazeera's Inside Story, France 24, New Scientist, and the Associated Press.

David completed his graduate studies at the Université de Montréal and Mila - Quebec Artificial Intelligence Institute, where he worked with Yoshua Bengio, Roland Memisevic, and Aaron Courville.

Current Students

PhD - UdeM
Principal supervisor:
Research Collaborator

Publications

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
Miles Brundage
Shahar Avin
Jasmine Wang
Haydn Belfield
Gretchen Krueger
Gillian K. Hadfield
Heidy Khlaaf
Jingying Yang
Helen Toner
Ruth Catherine Fong
Pang Wei Koh
Sara Hooker
Jade Leung
Andrew John Trask
Emma Bluemke
Jonathan Lebensold
Cullen C. O'Keefe
Mark Koren
Théo Ryffel
JB Rubinovitz
Tamay Besiroglu
Federica Carugati
Jack Clark
Peter Eckersley
Sarah de Haas
Maritza L. Johnson
Ben Laurie
Alex Ingerman
Igor Krawczuk
Amanda Askell
Rosario Cammarota
Andrew Lohn
Charlotte Stix
Peter Mark Henderson
Logan Graham
Carina E. A. Prunkl
Bianca Martin
Elizabeth Seger
Noa Zilberman
Seán Ó hÉigeartaigh
Frens Kroeger
Girish Sastry
Rebecca Kagan
Adrian Weller
Brian Shek-kam Tse
Elizabeth Barnes
Allan Dafoe
Paul D. Scharre
Ariel Herbert-Voss
Martijn Rasser
Shagun Sodhani
Carrick Flynn
Thomas Krendl Gilbert
Lisa Dyer
Saif M. Khan
Markus Anderljung
Neural Autoregressive Flows
Chin-Wei Huang
Alexandre Lacoste
Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST.
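The monotonic univariate transformer is the heart of NAF. Below is a minimal sketch of one such transformation, the deep sigmoidal flow (DSF): a convex combination of sigmoids with positive slopes, followed by a logit. The function name, shapes, and clamping constant are illustrative choices, not the paper's code; in a full NAF, the parameters a, b, and w_logits would be produced per dimension by an autoregressive conditioner network.

```python
import torch
import torch.nn.functional as F

def deep_sigmoidal_flow(x, a, b, w_logits, eps=1e-6):
    # One univariate DSF step: y = logit(sum_j w_j * sigmoid(a_j * x + b_j)).
    # Monotonic (hence invertible) because every a_j > 0 and the w_j form
    # a convex combination; a, b, w_logits broadcast against x.unsqueeze(-1).
    a = F.softplus(a)                      # enforce positive slopes
    w = torch.softmax(w_logits, dim=-1)    # convex mixture weights
    s = (w * torch.sigmoid(a * x.unsqueeze(-1) + b)).sum(-1)
    s = s.clamp(eps, 1 - eps)              # keep the logit finite
    return torch.log(s) - torch.log1p(-s)  # logit = inverse sigmoid

# Shared parameters across inputs, so the map is one monotonic function of x.
x = torch.linspace(-3.0, 3.0, 101)
n_components = 4
y = deep_sigmoidal_flow(x, torch.randn(n_components),
                        torch.randn(n_components), torch.randn(n_components))
assert torch.all(y[1:] >= y[:-1])          # non-decreasing, as required
```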
Bayesian Hypernetworks
Chin-Wei Huang
Riashat Islam
Ryan Turner
Alexandre Lacoste
We propose Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork, h, is a neural network which learns to transform a simple noise distribution, p(e) = N(0,I), to a distribution q(t) := q(h(e)) over the parameters t of another neural network (the "primary network"). We train q with variational inference, using an invertible h to enable efficient estimation of the variational lower bound on the posterior p(t | D) via sampling. In contrast to most methods for Bayesian deep learning, Bayesian hypernets can represent a complex multimodal approximate posterior with correlations between parameters, while enabling cheap iid sampling of q(t). In practice, Bayesian hypernets provide a better defense against adversarial examples than dropout, and also exhibit competitive performance on a suite of tasks which evaluate model uncertainty, including regularization, active learning, and anomaly detection.
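A minimal sketch of the framework's structure (names, shapes, and hyperparameters are illustrative, not the paper's code): for brevity the hypernetwork h below is a single invertible affine map, which only yields a Gaussian q(t). The paper's point is that a richer invertible h (e.g., a normalizing flow) gives a multimodal, correlated posterior while keeping log q(t) computable by the same change-of-variables formula.

```python
import math
import torch
import torch.nn.functional as F

# Primary network: Linear(5 -> 1), so 6 parameters in total.
n_params = 5 + 1
# Hypernet h(e) = mu + exp(log_sigma) * e is invertible in e, so
# log q(t) = log N(e; 0, I) - log|det dh/de| = log N(e; 0, I) - sum(log_sigma).
mu = torch.zeros(n_params, requires_grad=True)
log_sigma = torch.full((n_params,), -3.0, requires_grad=True)

def sample_params():
    e = torch.randn(n_params)                         # e ~ N(0, I)
    t = mu + log_sigma.exp() * e                      # t = h(e)
    log_q = (-0.5 * (e ** 2).sum()
             - 0.5 * n_params * math.log(2 * math.pi)
             - log_sigma.sum())                       # change of variables
    return t, log_q

def primary_forward(x, t):
    return F.linear(x, t[:5].view(1, 5), t[5:])       # uses sampled params

x, y = torch.randn(32, 5), torch.randn(32, 1)         # toy dataset D
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)
for _ in range(200):
    t, log_q = sample_params()
    log_lik = -0.5 * ((primary_forward(x, t) - y) ** 2).sum()  # Gaussian likelihood
    log_prior = -0.5 * (t ** 2).sum()                 # p(t) = N(0, I), up to consts
    elbo = log_lik + log_prior - log_q                # single-sample ELBO estimate
    opt.zero_grad(); (-elbo).backward(); opt.step()
```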
A Closer Look at Memorization in Deep Networks
Devansh Arpit
Stanisław Jastrzębski
Nicolas Ballas
Maxinder S. Kanwal
Asja Fischer
We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that for appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that the notions of effective capacity which are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient-based methods because training data itself plays an important role in determining the degree of memorization.
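A hedged sketch of the kind of experiment described above (the synthetic data, architecture, and thresholds are assumed for illustration; the paper works with image datasets): the same network fits structured labels much faster than shuffled labels, and dropout slows the fitting of noise further.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(256, 20)
real_y = (x[:, :10].sum(dim=1) > 0).long()     # labels from a simple rule
noise_y = real_y[torch.randperm(256)]          # shuffled labels: pure memorization

def epochs_to_fit(y, p_dropout=0.0, max_epochs=3000):
    model = nn.Sequential(nn.Linear(20, 256), nn.ReLU(),
                          nn.Dropout(p_dropout), nn.Linear(256, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(max_epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward(); opt.step()
        if loss.item() < 0.01:                 # training set (nearly) fit
            return epoch
    return max_epochs                          # never fit within the budget

print("real labels:      ", epochs_to_fit(real_y))    # simple pattern fits fast
print("random labels:    ", epochs_to_fit(noise_y))   # memorization is slower
print("random + dropout: ", epochs_to_fit(noise_y, p_dropout=0.5))
```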
Deep Nets Don't Learn via Memorization
Nicolas Ballas
Stanisław Jastrzębski
Devansh Arpit
Maxinder S. Kanwal
Asja Fischer
We use empirical methods to argue that deep neural networks (DNNs) do not achieve their performance by memorizing training data in spite of overly expressive model architectures. Instead, they learn a simple available hypothesis that fits the finite data samples. In support of this view, we establish that there are qualitative differences when learning noise vs. natural datasets, showing: (1) more capacity is needed to fit noise, (2) time to convergence is longer for random labels, but shorter for random inputs, and (3) that DNNs trained on real data examples learn simpler functions than when trained with noise data, as measured by the sharpness of the loss function at convergence. Finally, we demonstrate that for appropriately tuned explicit regularization, e.g. dropout, we can degrade DNN training performance on noise datasets without compromising generalization on real data.
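For point (3), sharpness at convergence can be probed empirically. The proxy below is an assumed measure for illustration, not the paper's exact criterion: it reports the mean increase in training loss under small random perturbations of the trained weights. Applied to networks trained on real_y versus noise_y as in the previous sketch, the expectation is a larger value for the noise-fit network.

```python
import copy
import torch

def sharpness(model, loss_fn, x, y, radius=0.01, n_probes=20):
    # Mean loss increase when each weight is jittered by noise whose scale
    # is `radius` times that parameter's mean absolute value. Larger => sharper.
    with torch.no_grad():
        base = loss_fn(model(x), y).item()
        increases = []
        for _ in range(n_probes):
            probe = copy.deepcopy(model)
            for p in probe.parameters():
                p.add_(radius * p.abs().mean() * torch.randn_like(p))
            increases.append(loss_fn(probe(x), y).item() - base)
    return sum(increases) / n_probes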
Facilitating Multimodality in Normalizing Flows
The true Bayesian posterior of a model such as a neural network may be highly multimodal. In principle, normalizing flows can represent such a distribution via compositions of invertible transformations of random noise. In practice, however, existing normalizing flows may fail to capture most of the modes of a distribution. We argue that the conditionally affine structure of the transformations used in [Dinh et al., 2014, 2016, Kingma et al., 2016] is inefficient, and show that flows which instead use (conditional) invertible non-linear transformations naturally enable multimodality in their output distributions. With just two layers of our proposed deep sigmoidal flow, we are able to model complicated 2d energy functions with much higher fidelity than six layers of deep affine flows.
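To see why invertible non-linear transformations enable multimodality, here is a toy illustration with hand-picked (not learned) parameters, reusing the deep_sigmoidal_flow sketch from the Neural Autoregressive Flows entry above: a single sigmoidal layer splits unimodal Gaussian noise into two well-separated modes, something no affine map y = a*z + b can do.

```python
import torch

# Assumes deep_sigmoidal_flow from the NAF sketch above is in scope.
z = torch.randn(10_000)                               # unimodal base noise
# The outer components saturate near 0 and 1, so s ~= 0.05 + 0.9 * sigmoid(10 z):
# a steep step that piles mass near logit(0.05) and logit(0.95), i.e. ~= +/- 2.94.
a = torch.full((3,), 10.0)                            # steep slopes (pre-softplus)
b = torch.tensor([20.0, 0.0, -20.0])
w_logits = torch.log(torch.tensor([0.05, 0.90, 0.05]))
y = deep_sigmoidal_flow(z, a, b, w_logits)
print(torch.histc(y, bins=12).tolist())               # two well-separated peaks
```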