Gintare Karolina Dziugaite

Associate Industry Member
Adjunct Professor, McGill University, School of Computer Science
Senior Research Scientist, Google DeepMind
Research Topics
Deep Learning
Information Theory
Machine Learning Theory

Biography

Gintare Karolina Dziugaite is a senior research scientist at Google DeepMind in Toronto, and an adjunct professor at the McGill University School of Computer Science. Prior to joining Google, she led the Trustworthy AI program at Element AI (ServiceNow). Her research combines theoretical and empirical approaches to understanding deep learning.

Dziugaite is well known for her work on network and data sparsity, developing algorithms and uncovering effects on generalization and other metrics. She pioneered the study of linear mode connectivity, first connecting it to the existence of lottery tickets, then to loss landscapes and the mechanism of iterative magnitude pruning. Another major focus of her research is understanding generalization in deep learning and, more generally, the development of information-theoretic methods for studying generalization. Her most recent work looks at removing the influence of data on the model (unlearning).

Dziugaite obtained her PhD in machine learning from the University of Cambridge under the supervision of Zoubin Ghahramani. Prior to that, she studied mathematics at the University of Warwick and read Part III in Mathematics at the University of Cambridge, receiving a Master of Advanced Study (MASt) in mathematics. She has participated in a number of long-term programs at the Institute for Advanced Study in Princeton, NJ, and at the Simons Institute for the Theory of Computing at the University of California, Berkeley.

Publications

Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization
Idan Attias
Mahdi Haghifam
Roi Livni
Daniel M. Roy
In this work, we investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the
Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias
Yu Yang
Eric Gan
Baharan Mirzasoleiman
Neural networks trained with (stochastic) gradient descent have an inductive bias towards learning simpler solutions. This makes them highly prone to learning spurious correlations in the training data that may not hold at test time. In this work, we provide the first theoretical analysis of the effect of simplicity bias on learning spurious correlations. Notably, we show that examples with spurious features are provably separable based on the model's output early in training. We further illustrate that if spurious features have a small enough noise-to-signal ratio, the network's output on the majority of examples is almost exclusively determined by the spurious features, leading to poor worst-group test accuracy. Finally, we propose SPARE, which identifies spurious correlations early in training and utilizes importance sampling to alleviate their effect. Empirically, we demonstrate that SPARE outperforms state-of-the-art methods by up to 21.1% in worst-group accuracy, while being up to 12x faster. We also show that SPARE is a highly effective but lightweight method to discover spurious correlations.
Leveraging Function Space Aggregation for Federated Learning at Scale
Nikita Dhawan
Nicole Elyse Mitchell
Zachary Charles
Zachary Garrett
The federated learning paradigm has motivated the development of methods for aggregating multiple client updates into a global server model, without sharing client data. Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization. In this work, we adopt a function space perspective and propose a new algorithm, FedFish, that aggregates local approximations to the functions learned by clients, using an estimate based on their Fisher information. We evaluate FedFish on realistic, large-scale cross-device benchmarks. While the performance of FedAvg can suffer as client models drift further apart, we demonstrate that FedFish is more robust to longer local training. Our evaluation across several settings in image and language benchmarks shows that FedFish outperforms FedAvg as local training epochs increase. Further, FedFish results in global networks that are more amenable to efficient personalization via local fine-tuning on the same or shifted data distributions. For instance, federated pretraining on the C4 dataset, followed by few-shot personalization on Stack Overflow, results in a 7% improvement in next-token prediction by FedFish over FedAvg.
JaxPruner: A concise library for sparsity research
Joo Hyung Lee
Wonpyo Park
Nicole Elyse Mitchell
Han-Byul Kim
Namhoon Lee
Elias Frantar
Yun Long
Amir Yazdanbakhsh
Shivani Agrawal
Suvinay Subramanian
Sheng-Chun Kao
Xingyao Zhang
Trevor Gale
Aart J.C. Bik
Woohyun Han
Milen Ferev
Zhonglin Han …
Hong-Seok Kim
Utku Evci
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX-based libraries. We demonstrate this ease of integration by providing examples in four different codebases (Scenic, t5x, Dopamine, and FedJAX) and provide baseline experiments on popular benchmarks.
Dataset Difficulty and the Role of Inductive Bias
Motivated by the goals of dataset pruning and defect identification, a growing body of methods has been developed to score individual examples within a dataset. These methods, which we call "example difficulty scores", are typically used to rank or categorize examples, but the consistency of rankings between different training runs, scoring methods, and model architectures is generally unknown. To determine how example rankings vary due to these random and controlled effects, we systematically compare different formulations of scores over a range of runs and model architectures. We find that scores largely share the following traits: they are noisy over individual runs of a model, strongly correlated with a single notion of difficulty, and reveal examples that range from being highly sensitive to insensitive to the inductive biases of certain model architectures. Drawing from statistical genetics, we develop a simple method for fingerprinting model architectures using a few sensitive examples. These findings guide practitioners in maximizing the consistency of their scores (e.g. by choosing appropriate scoring methods, number of runs, and subsets of examples), and establish comprehensive baselines for evaluating scores in the future.
Information Complexity of Stochastic Convex Optimization: Applications to Generalization, Memorization, and Tracing
Idan Attias
Mahdi Haghifam
Roi Livni
Daniel M. Roy
In this work, we investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the
Linear Weight Interpolation Leads to Transient Performance Gains
Mixture of Experts in a Mixture of RL settings
Simultaneous linear connectivity of neural networks modulo permutation
Ekansh Sharma
Tom Denton
Daniel M. Roy
The Cost of Scaling Down Large Language Models: Reducing Model Size Affects Memory before In-context Learning.
Tian Jin
Nolan Clement
Xin Dong
Vaishnavh Nagarajan
Michael Carbin
Jonathan Ragan-Kelley
The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning
Tian Jin
Nolan Clement
Xin Dong
Vaishnavh Nagarajan
Michael Carbin
Jonathan Ragan-Kelley
How does scaling the number of parameters in large language models (LLMs) affect their core capabilities? We study two natural scaling techniques, weight pruning and simply training a smaller or larger model (which we refer to as dense scaling), and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference. By curating a suite of tasks that help disentangle these two capabilities, we find a striking difference in how these two abilities evolve due to scaling. Reducing the model size by more than 30% (via either scaling approach) significantly decreases the ability to recall facts seen in pre-training. Yet, a 60-70% reduction largely preserves the various ways the model can process in-context information, ranging from retrieving answers from a long context to learning parameterized functions from in-context exemplars. The fact that both dense scaling and weight pruning exhibit this behavior suggests that scaling model size has an inherently disparate effect on fact recall and in-context learning.
Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization
Mahdi Haghifam
Borja Rodríguez-Gálvez
Ragnar Thobaben
Mikael Skoglund
Daniel M. Roy