Courtney Paquette

Associate Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, Department of Mathematics and Statistics
Research Scientist, Google Brain
Research Topics
Optimization

Biography

Courtney Paquette is an assistant professor at McGill University and a Canada CIFAR AI Chair at Mila – Quebec Artificial Intelligence Institute.

Her research focuses on designing and analyzing algorithms for large-scale optimization problems, motivated by applications in data science.

She received her PhD in mathematics from the University of Washington (2017), held postdoctoral positions at Lehigh University (2017–2018) and the University of Waterloo (NSF postdoctoral fellowship, 2018–2019), and was a research scientist at Google Brain in Montréal (2019–2020).

Current Students

Master's Research - McGill University
Postdoctorate - McGill University
PhD - Université de Montréal
Master's Research - McGill University
PhD - McGill University
Master's Research - McGill University
PhD - McGill University

Publications

Dimension-adapted Momentum Outscales SGD
Damien Ferbach
Katie Everett
Elliot Paquette
We investigate scaling laws for stochastic momentum algorithms with small batch on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target complexities. While traditional stochastic gradient descent with momentum (SGD-M) yields identical scaling law exponents to SGD, dimension-adapted Nesterov acceleration (DANA) improves these exponents by scaling momentum hyperparameters based on model size and data complexity. This outscaling phenomenon, which also improves compute-optimal scaling behavior, is achieved by DANA across a broad range of data and target complexities, while traditional methods fall short. Extensive experiments on high-dimensional synthetic quadratics validate our theoretical predictions, and large-scale text experiments with LSTMs show that DANA's improved loss exponents over SGD hold in a practical setting.
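As a rough illustration of the dimension-scaled momentum idea described in this abstract, the sketch below runs small-batch heavy-ball momentum on a toy least-squares problem and compares a fixed momentum parameter with one that grows with the model dimension d. The schedule beta = 1 - 1/sqrt(d) is an illustrative assumption for exposition only, not the DANA schedule analyzed in the paper.

# Toy sketch: small-batch momentum on a random least-squares problem.
# The dimension-scaled momentum (beta = 1 - 1/sqrt(d)) is a hypothetical
# placeholder schedule, not the paper's DANA hyperparameters.
import numpy as np

rng = np.random.default_rng(0)

d, n = 200, 1000                                  # model size and sample count
A = rng.standard_normal((n, d)) / np.sqrt(d)      # random features
x_star = rng.standard_normal(d)
b = A @ x_star                                    # noiseless targets

def sgd_momentum(lr, beta, steps=2000, batch=8):
    """Small-batch SGD with heavy-ball momentum on 0.5 * mean((Ax - b)^2)."""
    x = np.zeros(d)
    v = np.zeros(d)
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        grad = A[idx].T @ (A[idx] @ x - b[idx]) / batch   # mini-batch gradient
        v = beta * v - lr * grad                          # momentum buffer
        x = x + v
    return 0.5 * np.mean((A @ x - b) ** 2)                # final full-batch loss

# Fixed momentum (SGD-M style) versus a dimension-scaled momentum (assumed schedule).
loss_fixed = sgd_momentum(lr=0.1, beta=0.9)
loss_scaled = sgd_momentum(lr=0.1, beta=1.0 - 1.0 / np.sqrt(d))

print(f"final loss, fixed beta:  {loss_fixed:.3e}")
print(f"final loss, scaled beta: {loss_scaled:.3e}")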
Implicit Diffusion: Efficient Optimization through Stochastic Sampling
Pierre Marion
Anna Korba
Peter Bartlett
Mathieu Blondel
Valentin De Bortoli
Arnaud Doucet
Felipe Llinares-López
Quentin Berthet
High Dimensional First Order Mini-Batch Algorithms on Quadratic Problems
Andrew Nicholas Cheng
Kiwon Lee
We analyze the dynamics of general mini-batch first order algorithms on the …
4+3 Phases of Compute-Optimal Neural Scaling Laws
Elliot Paquette
Lechao Xiao
Jeffrey Pennington
The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms
Elizabeth Collins-Woodfin
Inbar Seroussi
Begoña García Malaxechebarría
Andrew Mackenzie
Elliot Paquette
Mirror Descent Algorithms with Nearly Dimension-Independent Rates for Differentially-Private Stochastic Saddle-Point Problems (extended abstract)
Tomás González
Cristóbal Guzmán
Mirror Descent Algorithms with Nearly Dimension-Independent Rates for Differentially-Private Stochastic Saddle-Point Problems
Tomás González
Cristóbal Guzmán
Hitting the High-Dimensional Notes: An ODE for SGD learning dynamics on GLMs and multi-index models
Elizabeth Collins-Woodfin
Elliot Paquette
Inbar Seroussi
Only Tails Matter: Average-Case Universality and Robustness in the Convex Regime
Leonardo Cunha
Fabian Pedregosa
Damien Scieur