Gintare Karolina Dziugaite

Associate Industry Member
Adjunct Professor, McGill University, School of Computer Science
Senior Research Scientist, Google DeepMind
Research Topics
Deep Learning
Machine Learning Theory
Information Theory

Biography

Gintare Karolina Dziugaite is a senior research scientist at Google DeepMind in Toronto and an adjunct professor at McGill University's School of Computer Science. Before joining Google, she led the Trustworthy AI program at Element AI / ServiceNow. Her research combines theoretical and empirical approaches to understanding deep learning.

Dziugaite is well known for her work on network and data sparsity, where she has developed algorithms and uncovered their effects on generalization and other metrics. She pioneered the study of linear mode connectivity, first connecting it to the existence of lottery tickets, then to loss landscapes and the mechanism of iterative magnitude pruning. Her research also focuses on understanding generalization in deep learning and, more broadly, on developing information-theoretic methods for studying generalization. Her most recent work looks at removing the influence of data on a model (unlearning).

Dziugaite obtained her PhD in machine learning from the University of Cambridge, under the supervision of Zoubin Ghahramani. She studied mathematics at the University of Warwick and read Part III in Mathematics at the University of Cambridge, where she received a Master of Advanced Study (MASt) in Mathematics. She has participated in several long-term programs at the Institute for Advanced Study in Princeton, New Jersey, and at the Simons Institute for the Theory of Computing at the University of California, Berkeley.

Publications

Unmasking Efficiency: Learning Salient Sparse Models in Non-IID Federated Learning
Riyasat Ohib
Bishal Thapaliya
Jingyu Liu
Vince D. Calhoun
Sergey M. Plis
In this work, we propose Salient Sparse Federated Learning (SSFL), a streamlined approach for sparse federated learning with efficient communication. SSFL identifies a sparse subnetwork prior to training, leveraging parameter saliency scores computed separately on local client data in non-IID scenarios, and then aggregated, to determine a global mask. Only the sparse model weights are communicated each round between the clients and the server. We validate SSFL's effectiveness using standard non-IID benchmarks, noting marked improvements in the sparsity-accuracy trade-offs. Finally, we deploy our method in a real-world federated learning framework and report improvement in communication time.
Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization
Idan Attias
Mahdi Haghifam
Roi Livni
Daniel M. Roy
In this work, we investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Johan Samir Obando Ceron
Ghada Sokar
Timon Willi
Clare Lyle
Jesse Farebrother
Jakob Nicolaus Foerster
The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias
Yu Yang
Eric Gan
Baharan Mirzasoleiman
Neural networks trained with (stochastic) gradient descent have an inductive bias towards learning simpler solutions. This makes them highly prone to learning spurious correlations in the training data that may not hold at test time. In this work, we provide the first theoretical analysis of the effect of simplicity bias on learning spurious correlations. Notably, we show that examples with spurious features are provably separable based on the model's output early in training. We further illustrate that if spurious features have a small enough noise-to-signal ratio, the network's output on the majority of examples is almost exclusively determined by the spurious features, leading to poor worst-group test accuracy. Finally, we propose SPARE, which identifies spurious correlations early in training and utilizes importance sampling to alleviate their effect. Empirically, we demonstrate that SPARE outperforms state-of-the-art methods by up to 21.1% in worst-group accuracy, while being up to 12x faster. We also show that SPARE is a highly effective but lightweight method to discover spurious correlations.
Leveraging Function Space Aggregation for Federated Learning at Scale
Nikita Dhawan
Nicole Elyse Mitchell
Zachary Charles
Zachary Garrett
The federated learning paradigm has motivated the development of methods for aggregating multiple client updates into a global server model, without sharing client data. Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization. In this work, we adopt a function space perspective and propose a new algorithm, FedFish, that aggregates local approximations to the functions learned by clients, using an estimate based on their Fisher information. We evaluate FedFish on realistic, large-scale cross-device benchmarks. While the performance of FedAvg can suffer as client models drift further apart, we demonstrate that FedFish is more robust to longer local training. Our evaluation across several settings in image and language benchmarks shows that FedFish outperforms FedAvg as local training epochs increase. Further, FedFish results in global networks that are more amenable to efficient personalization via local fine-tuning on the same or shifted data distributions. For instance, federated pretraining on the C4 dataset, followed by few-shot personalization on Stack Overflow, results in a 7% improvement in next-token prediction by FedFish over FedAvg.
The Cost of Scaling Down Large Language Models: Reducing Model Size Affects Memory before In-context Learning
Tian Jin
Nolan Clement
Xin Dong
Vaishnavh Nagarajan
Michael Carbin
Jonathan Ragan-Kelley
We study how down-scaling large language model (LLM) size impacts LLM capabilities. We begin by measuring the effects of weight pruning – a popular technique for reducing model size – on two abilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in context. Surprisingly, we find that existing pruning techniques affect these two abilities of LLMs differently. For example, pruning more than 30% of weights significantly decreases an LLM’s ability to recall facts presented during pre-training. Yet pruning 60-70% of weights largely preserves an LLM’s ability to process information in-context, ranging from retrieving answers based on information presented in context to learning parameterized functions such as a linear classifier based on a few examples. Moderate pruning impairs an LLM’s ability to recall facts learnt from pre-training. However, its effect on the model’s ability to process information presented in context is much less pronounced. These disparate effects similarly arise when replacing the original model with a smaller dense one with reduced width and depth. This similarity suggests that model size reduction in general underpins the disparity.
JaxPruner: A concise library for sparsity research
Joo Hyung Lee
Wonpyo Park
Nicole Elyse Mitchell
Jonathan Pilault
Johan Samir Obando Ceron
Han-Byul Kim
Namhoon Lee
Elias Frantar
Yun Long
Amir Yazdanbakhsh
Shivani Agrawal
Suvinay Subramanian
Xin Wang
Sheng-Chun Kao
Xingyao Zhang
Trevor Gale
Aart J.C. Bik
Woohyun Han
Milen Ferev
Zhonglin Han … (5 more)
Hong-Seok Kim
Yann Dauphin
Utku Evci
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX-based libraries. We demonstrate this ease of integration by providing examples in four different codebases: Scenic, t5x, Dopamine and FedJAX and provide baseline experiments on popular benchmarks.
Dataset Difficulty and the Role of Inductive Bias
Devin Kwok
Nikhil Anand
Jonathan Frankle
Motivated by the goals of dataset pruning and defect identification, a growing body of methods have been developed to score individual examples within a dataset. These methods, which we call "example difficulty scores", are typically used to rank or categorize examples, but the consistency of rankings between different training runs, scoring methods, and model architectures is generally unknown. To determine how example rankings vary due to these random and controlled effects, we systematically compare different formulations of scores over a range of runs and model architectures. We find that scores largely share the following traits: they are noisy over individual runs of a model, strongly correlated with a single notion of difficulty, and reveal examples that range from being highly sensitive to insensitive to the inductive biases of certain model architectures. Drawing from statistical genetics, we develop a simple method for fingerprinting model architectures using a few sensitive examples. These findings guide practitioners in maximizing the consistency of their scores (e.g. by choosing appropriate scoring methods, number of runs, and subsets of examples), and establish comprehensive baselines for evaluating scores in the future.
Information Complexity of Stochastic Convex Optimization: Applications to Generalization, Memorization, and Tracing
Idan Attias
Mahdi Haghifam
Roi Livni
Daniel M. Roy
In this work, we investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the
Mixture of Experts in a Mixture of RL settings
Timon Willi
Johan Samir Obando Ceron
Jakob Nicolaus Foerster
Simultaneous linear connectivity of neural networks modulo permutation
Ekansh Sharma
Devin Kwok
Tom Denton
Daniel M. Roy