Portrait of Devin Kwok

Devin Kwok

PhD - McGill
Principal supervisor
Research Topics
Deep Learning
Machine Learning Theory

Publications

The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions
Gül Sena Altıntaş
Colin Raffel
Neural network training is inherently sensitive to initialization and the randomness induced by stochastic gradient descent. However, it is unclear to what extent such effects lead to meaningfully different networks, either in terms of the models' weights or the underlying functions that were learned. In this work, we show that during the initial "chaotic" phase of training, even extremely small perturbations reliably cause otherwise identical training trajectories to diverge, an effect that diminishes rapidly over training time. We quantify this divergence through (i) …
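A minimal sketch of how such trajectory divergence can be quantified, assuming it is measured as L2 distance in weight space and as label disagreement on held-out inputs; the function names and toy data below are illustrative assumptions, not the paper's actual metrics.

```python
import numpy as np

def weight_distance(w_a, w_b):
    # L2 distance between two networks' flattened weight vectors.
    return np.linalg.norm(w_a - w_b)

def prediction_disagreement(preds_a, preds_b):
    # Fraction of held-out inputs on which the two networks' predicted
    # labels differ (a functional, rather than weight-space, measure).
    return float(np.mean(preds_a != preds_b))

# Toy stand-ins for two trained networks from perturbed trajectories.
rng = np.random.default_rng(0)
w_a = rng.normal(size=10_000)
w_b = w_a + 1e-4 * rng.normal(size=10_000)   # trajectory perturbed at init
preds_a = rng.integers(0, 10, size=1_000)
preds_b = preds_a.copy()
preds_b[:30] = (preds_b[:30] + 1) % 10       # pretend 3% of labels flipped

print(weight_distance(w_a, w_b))                  # weight-space divergence
print(prediction_disagreement(preds_a, preds_b))  # functional divergence
```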
The Butterfly Effect: Tiny Perturbations Cause Neural Network Training to Diverge
Gül Sena Altıntaş
Neural network training begins with a chaotic phase in which the network is sensitive to small perturbations, such as those caused by stochastic gradient descent (SGD). This sensitivity can cause identically initialized networks to diverge in both parameter space and functional similarity. However, the exact degree to which networks are sensitive to perturbation, and the sensitivity of networks as they transition out of the chaotic phase, is unclear. To address this uncertainty, we apply a controlled perturbation at a single point in training time and measure its effect on otherwise identical training trajectories. We find that both the …
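A hedged sketch of the controlled-perturbation setup the abstract describes: train otherwise identical runs, add a tiny amount of Gaussian noise to the weights of a run at one chosen step, then compare the resulting weights against an unperturbed control. The training loop, noise scale, and toy data are assumptions for illustration, not the paper's exact protocol.

```python
import copy
import torch

def train(model, data, perturb_step=None, eps=1e-6, lr=0.1):
    # Deterministic SGD loop; optionally injects Gaussian weight noise of
    # scale `eps` at exactly one step, leaving everything else identical.
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for step, (x, y) in enumerate(data):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        if step == perturb_step:
            with torch.no_grad():
                for p in model.parameters():
                    p.add_(eps * torch.randn_like(p))
    return model

def weight_dist(a, b):
    # L2 distance between two models' concatenated parameters.
    va = torch.cat([p.flatten() for p in a.parameters()])
    vb = torch.cat([p.flatten() for p in b.parameters()])
    return (va - vb).norm().item()

torch.manual_seed(0)
base = torch.nn.Linear(10, 2)                  # shared initialization
data = [(torch.randn(8, 10), torch.randint(0, 2, (8,)))
        for _ in range(200)]                   # fixed batch order

control = train(base, data)                    # unperturbed trajectory
early = train(base, data, perturb_step=0)      # perturbed in chaotic phase
late = train(base, data, perturb_step=190)     # perturbed late in training

print("early perturbation:", weight_dist(control, early))
print("late perturbation: ", weight_dist(control, late))
```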
Dataset Difficulty and the Role of Inductive Bias
Nikhil Anand
Jonathan Frankle
Motivated by the goals of dataset pruning and defect identification, a growing body of methods has been developed to score individual examples within a dataset. These methods, which we call "example difficulty scores", are typically used to rank or categorize examples, but the consistency of rankings between different training runs, scoring methods, and model architectures is generally unknown. To determine how example rankings vary due to these random and controlled effects, we systematically compare different formulations of scores over a range of runs and model architectures. We find that scores largely share the following traits: they are noisy over individual runs of a model, strongly correlated with a single notion of difficulty, and reveal examples that range from being highly sensitive to insensitive to the inductive biases of certain model architectures. Drawing from statistical genetics, we develop a simple method for fingerprinting model architectures using a few sensitive examples. These findings guide practitioners in maximizing the consistency of their scores (e.g. by choosing appropriate scoring methods, number of runs, and subsets of examples), and establish comprehensive baselines for evaluating scores in the future.
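To make the consistency question concrete, here is a small toy simulation (an assumption-laden sketch, not the paper's method): per-run difficulty scores are modeled as a shared underlying difficulty plus run-specific noise, and Spearman rank correlation measures how well rankings agree across single runs versus averages over several runs.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_examples, n_runs = 1_000, 6

# Hypothetical scores: one shared notion of difficulty plus per-run noise.
true_difficulty = rng.uniform(size=n_examples)
scores = true_difficulty[None, :] + 0.3 * rng.normal(size=(n_runs, n_examples))

# Rankings from single runs are noisy...
r_single, _ = spearmanr(scores[0], scores[1])
# ...while averaging scores over several runs stabilizes them.
r_avg, _ = spearmanr(scores[:3].mean(axis=0), scores[3:].mean(axis=0))

print(f"rank correlation, run vs. run:       {r_single:.2f}")
print(f"rank correlation, 3-run avg vs. avg: {r_avg:.2f}")
```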
Simultaneous linear connectivity of neural networks modulo permutation
Ekansh Sharma
Tom Denton
Daniel M. Roy
Deep Networks as Paths on the Manifold of Neural Representations
Richard D Lange
Jordan Kyle Matelsky
Xinyue Wang
Konrad Paul Kording
Neural Networks as Paths through the Space of Representations
Richard D Lange
Jordan Kyle Matelsky
Xinyue Wang
Konrad Paul Kording