
Courtney Paquette

Associate Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, Department of Mathematics and Statistics
Research Scientist, Google Brain
Research Topics
Optimization

Biography

Courtney Paquette is an assistant professor at McGill University and a Canada CIFAR AI Chair at Mila – Quebec Artificial Intelligence Institute. Her research focuses on the design and analysis of algorithms for large-scale optimization problems, with applications in data science. She earned her PhD in mathematics from the University of Washington (2017), held postdoctoral positions at Lehigh University (2017-2018) and the University of Waterloo (NSF postdoctoral fellowship, 2018-2019), and was a research scientist at Google Research, Brain Montreal (2019-2020).

Current Students

Research Master's - McGill
Postdoctorate - McGill
Research Master's - McGill
Research Master's - McGill
PhD - McGill
Research Master's - McGill
PhD - McGill

Publications

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions
Elliot Paquette
Ben Adlam
Jeffrey Pennington
Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the go-to optimization algorithm for a diverse array of problems. While the empirical success of SGD is often attributed to its computational efficiency and favorable generalization behavior, neither effect is well understood and disentangling them remains an open problem. Even in the simple setting of convex quadratic problems, worst-case analyses give an asymptotic convergence rate for SGD that is no better than full-batch gradient descent (GD), and the purported implicit regularization effects of SGD lack a precise explanation. In this work, we study the dynamics of multi-pass SGD on high-dimensional convex quadratics and establish an asymptotic equivalence to a stochastic differential equation, which we call homogenized stochastic gradient descent (HSGD), whose solutions we characterize explicitly in terms of a Volterra integral equation. These results yield precise formulas for the learning and risk trajectories, which reveal a mechanism of implicit conditioning that explains the efficiency of SGD relative to GD. We also prove that the noise from SGD negatively impacts generalization performance, ruling out the possibility of any type of implicit regularization in this context. Finally, we show how to adapt the HSGD formalism to include streaming SGD, which allows us to produce an exact prediction for the excess risk of multi-pass SGD relative to that of streaming SGD (bootstrap risk).
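
As an illustrative sketch of the setting the abstract describes (not code from the paper), the snippet below compares full-batch gradient descent with multi-pass mini-batch SGD on a random least-squares problem, a high-dimensional convex quadratic; the problem sizes, step sizes, and batch size are arbitrary choices for demonstration, and both methods are given the same total budget of per-sample gradient evaluations.

```python
# Minimal sketch (assumed setup, not the paper's code): GD vs. multi-pass
# mini-batch SGD on a random least-squares problem, a convex quadratic.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 100                                  # samples, dimension (arbitrary)
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d) / np.sqrt(d)
b = A @ x_star + 0.1 * rng.standard_normal(n)    # noisy linear targets

def risk(x):
    """Empirical least-squares risk (1/2n) * ||Ax - b||^2."""
    r = A @ x - b
    return 0.5 * r @ r / n

def gd(epochs, lr):
    """Full-batch gradient descent: one gradient step per pass over the data."""
    x = np.zeros(d)
    for _ in range(epochs):
        x -= lr * A.T @ (A @ x - b) / n
    return risk(x)

def sgd(epochs, lr, batch=32):
    """Multi-pass mini-batch SGD with the same total sample budget as gd()."""
    x = np.zeros(d)
    steps = epochs * n // batch
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)     # sample rows with replacement
        x -= lr * A[idx].T @ (A[idx] @ x - b[idx]) / batch
    return risk(x)

budget = 20                                       # passes over the data
print(f"GD  risk after {budget} epochs: {gd(budget, lr=0.5):.4f}")
print(f"SGD risk after {budget} epochs: {sgd(budget, lr=0.1):.4f}")
```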
Trajectory of Mini-Batch Momentum: Batch Size Saturation and Convergence in High Dimensions
Kiwon Lee
Andrew Nicholas Cheng
Elliot Paquette
Halting Time is Predictable for Large Models: A Universality Property and Average-Case Analysis
Bart van Merriënboer
Fabian Pedregosa