
Courtney Paquette

Associate Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, Department of Mathematics and Statistics
Research Scientist, Google Brain
Research Topics
Optimization

Biography

Courtney Paquette is an Assistant Professor at McGill University and a Canada CIFAR AI Chair at Mila – Quebec Artificial Intelligence Institute. Her research focuses on the design and analysis of algorithms for large-scale optimization problems, with applications in data science. She received her PhD in mathematics from the University of Washington (2017), held postdoctoral positions at Lehigh University (2017-2018) and the University of Waterloo (NSF postdoctoral fellowship, 2018-2019), and was a research scientist at Google Research, Brain Montréal (2019-2020).

Current Students

Research Master's - McGill
Research Master's - McGill
PhD - UdeM
Principal supervisor:
Research Intern - McGill
PhD - McGill
Principal supervisor:
PhD - McGill

Publications

Logarithmic-time Schedules for Scaling Language Models with Momentum
In practice, the hyperparameters …
Dimension-adapted Momentum Outscales SGD
We investigate scaling laws for stochastic momentum algorithms with small batch on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target complexities. While traditional stochastic gradient descent with momentum (SGD-M) yields identical scaling law exponents to SGD, dimension-adapted Nesterov acceleration (DANA) improves these exponents by scaling momentum hyperparameters based on model size and data complexity. This outscaling phenomenon, which also improves compute-optimal scaling behavior, is achieved by DANA across a broad range of data and target complexities, while traditional methods fall short. Extensive experiments on high-dimensional synthetic quadratics validate our theoretical predictions and large-scale text experiments with LSTMs show DANA's improved loss exponents over SGD hold in a practical setting.
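A minimal sketch of the kind of setup this line of work studies, written for illustration only: one-pass SGD with heavy-ball momentum on a least-squares problem whose covariance spectrum decays as a power law. The dimension, exponents, target decay, and hyperparameters below are assumptions, and the update is plain SGD-M rather than the dimension-adapted (DANA) schedule analyzed in the paper.

```python
import numpy as np

# Assumed problem: power-law random-features-style least squares.
rng = np.random.default_rng(0)
d = 500                                    # model size (assumed)
alpha = 1.5                                # data-complexity exponent (assumed)
eigs = np.arange(1, d + 1, dtype=float) ** (-alpha)           # power-law covariance spectrum
w_star = rng.standard_normal(d) / np.arange(1, d + 1) ** 0.5  # target with assumed decay

w = np.zeros(d)
v = np.zeros(d)
lr, beta, batch = 0.5, 0.9, 8              # illustrative hyperparameters

for step in range(2001):
    # One-pass / streaming regime: a fresh mini-batch at every step.
    X = rng.standard_normal((batch, d)) * np.sqrt(eigs)
    y = X @ w_star                         # noiseless target
    grad = X.T @ (X @ w - y) / batch
    v = beta * v + grad                    # heavy-ball momentum buffer
    w = w - lr * (1 - beta) * v            # plain SGD-M update (not DANA)
    if step % 500 == 0:
        risk = 0.5 * np.dot(eigs * (w - w_star), w - w_star)
        print(f"step {step:5d}   population risk {risk:.4e}")
```

Sweeping the model size and the momentum hyperparameters in a script like this is one way to see, empirically, how the loss-curve exponents respond to the parameterization.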
Two-point deterministic equivalence for SGD in random feature models
Alexander Atanasov
Blake Bordelon
Jacob A Zavatone-Veth
Cengiz Pehlevan
Implicit Diffusion: Efficient Optimization through Stochastic Sampling
Pierre Marion
Anna Korba
Peter Bartlett
Mathieu Blondel
Valentin De Bortoli
Arnaud Doucet
Felipe Llinares-López
Quentin Berthet
High Dimensional First Order Mini-Batch Algorithms on Quadratic Problems
Andrew Nicholas Cheng
Kiwon Lee
We analyze the dynamics of general mini-batch first order algorithms on the …
4+3 Phases of Compute-Optimal Neural Scaling Laws
Lechao Xiao
Jeffrey Pennington
We consider the solvable neural scaling model with three parameters: data complexity, target complexity, and model-parameter-count. We use this neural scaling model to derive new predictions about the compute-limited, infinite-data scaling law regime. To train the neural scaling model, we run one-pass stochastic gradient descent on a mean-squared loss. We derive a representation of the loss curves which holds over all iteration counts and improves in accuracy as the model parameter count grows. We then analyze the compute-optimal model-parameter-count, and identify 4 phases (+3 subphases) in the data-complexity/target-complexity phase-plane. The phase boundaries are determined by the relative importance of model capacity, optimizer noise, and embedding of the features. We furthermore derive, with mathematical proof and extensive numerical evidence, the scaling-law exponents in all of these phases, in particular computing the optimal model-parameter-count as a function of floating point operation budget.
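As a hedged illustration of how a compute-optimal model size can be read off numerically (not the paper's analytical derivation), the sketch below runs one-pass SGD on an assumed power-law least-squares model at several model sizes and picks the best size at each compute budget. The spectrum, target, step size, and the FLOPs proxy are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 1.2                                   # data-complexity exponent (assumed)
budgets = np.geomspace(1e4, 1e6, 5)           # compute budgets (assumed)

def run_sgd(d, n_steps, lr=0.2):
    # One-pass SGD on a d-dimensional power-law least-squares model.
    eigs = np.arange(1, d + 1, dtype=float) ** (-alpha)
    w_star = np.ones(d) / np.arange(1, d + 1)  # assumed target decay
    w = np.zeros(d)
    losses, flops = [], []
    for t in range(1, n_steps + 1):
        x = rng.standard_normal(d) * np.sqrt(eigs)   # one fresh sample per step
        w -= lr * (x @ (w - w_star)) * x
        losses.append(0.5 * np.dot(eigs * (w - w_star), w - w_star))
        flops.append(t * d)                           # crude FLOPs proxy (assumed)
    return np.array(flops), np.array(losses)

curves = {d: run_sgd(d, 4000) for d in (50, 100, 200, 400)}
for budget in budgets:
    # Best model size among those whose run actually reaches this budget.
    best = min(
        (np.interp(budget, f, l), d)
        for d, (f, l) in curves.items()
        if f[-1] >= budget
    )
    print(f"compute budget {budget:9.0f}: best d = {best[1]}, loss {best[0]:.3e}")
```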
The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms
We develop a framework for analyzing the training and learning rate dynamics on a large class of high-dimensional optimization problems, which we call the high line, trained using one-pass stochastic gradient descent (SGD) with adaptive learning rates. We give exact expressions for the risk and learning rate curves in terms of a deterministic solution to a system of ODEs. We then investigate in detail two adaptive learning rates -- an idealized exact line search and AdaGrad-Norm -- on the least squares problem. When the data covariance matrix has strictly positive eigenvalues, this idealized exact line search strategy can exhibit arbitrarily slower convergence when compared to the optimal fixed learning rate with SGD. Moreover we exactly characterize the limiting learning rate (as time goes to infinity) for line search in the setting where the data covariance has only two distinct eigenvalues. For noiseless targets, we further demonstrate that the AdaGrad-Norm learning rate converges to a deterministic constant inversely proportional to the average eigenvalue of the data covariance matrix, and identify a phase transition when the covariance density of eigenvalues follows a power law distribution. We provide our code for evaluation at https://github.com/amackenzie1/highline2024.
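The repository linked above contains the authors' evaluation code; the sketch below is an independent toy, under stated assumptions, showing the AdaGrad-Norm step-size rule on a noiseless least-squares problem so one can watch the effective learning rate settle as training proceeds. The spectrum, the constants eta0 and b0, and the iteration counts are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 300
eigs = np.linspace(0.5, 2.0, d)                  # data covariance spectrum (assumed)
w_star = rng.standard_normal(d)
w = np.zeros(d)

eta0, b0 = 1.0, 1.0                              # AdaGrad-Norm constants (assumed)
grad_norm_sq_sum = b0 ** 2

for t in range(1, 20001):
    x = rng.standard_normal(d) * np.sqrt(eigs)   # fresh sample: one-pass / streaming SGD
    grad = (x @ (w - w_star)) * x                # noiseless least-squares gradient
    grad_norm_sq_sum += grad @ grad
    lr = eta0 / np.sqrt(grad_norm_sq_sum)        # AdaGrad-Norm: one global step size
    w -= lr * grad
    if t % 5000 == 0:
        risk = 0.5 * np.dot(eigs * (w - w_star), w - w_star)
        print(f"t={t:6d}   lr={lr:.4e}   risk={risk:.4e}")
```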
Mirror Descent Algorithms with Nearly Dimension-Independent Rates for Differentially-Private Stochastic Saddle-Point Problems (extended abstract)
Tomás González
Cristóbal Guzmán
Mirror Descent Algorithms with Nearly Dimension-Independent Rates for Differentially-Private Stochastic Saddle-Point Problems
Tomás González
Cristóbal Guzmán
Hitting the High-dimensional notes: an ODE for SGD learning dynamics on GLMs and multi-index models
We analyze the dynamics of streaming stochastic gradient descent (SGD) in the high-dimensional limit when applied to generalized linear models and multi-index models (e.g. logistic regression, phase retrieval) with general data-covariance. In particular, we demonstrate a deterministic equivalent of SGD in the form of a system of ordinary differential equations that describes a wide class of statistics, such as the risk and other measures of sub-optimality. This equivalence holds with overwhelming probability when the model parameter count grows proportionally to the number of data. This framework allows us to obtain learning rate thresholds for stability of SGD as well as convergence guarantees. In addition to the deterministic equivalent, we introduce an SDE with a simplified diffusion coefficient (homogenized SGD) which allows us to analyze the dynamics of general statistics of SGD iterates. Finally, we illustrate this theory on some standard examples and show numerical simulations which give an excellent match to the theory.
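A rough illustrative sketch of the regime described above, with assumed constants: streaming SGD on logistic regression (a GLM) under a general diagonal covariance, printing a Monte Carlo estimate of the population risk, which is one of the statistics the deterministic ODE description is designed to track. Dimension, covariance, teacher model, and step size are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 400                                              # parameter count (assumed)
cov_eigs = np.geomspace(1.0, 0.05, d)                # general data covariance (assumed)
w_star = rng.standard_normal(d) / np.sqrt(cov_eigs.sum())   # teacher with roughly unit-variance logits

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
lr = 0.5 / cov_eigs.sum()                            # illustrative step-size scaling

for t in range(1, 8001):
    x = rng.standard_normal(d) * np.sqrt(cov_eigs)   # one fresh sample: streaming SGD
    y = rng.binomial(1, sigmoid(x @ w_star))         # label drawn from the teacher GLM
    grad = (sigmoid(x @ w) - y) * x                  # single-sample logistic-loss gradient
    w -= lr * grad
    if t % 2000 == 0:
        # Monte Carlo estimate of the population logistic risk.
        X = rng.standard_normal((4000, d)) * np.sqrt(cov_eigs)
        p_star, p = sigmoid(X @ w_star), sigmoid(X @ w)
        risk = -np.mean(p_star * np.log(p + 1e-12) + (1 - p_star) * np.log(1 - p + 1e-12))
        print(f"t={t:5d}   population risk ~ {risk:.4f}")
```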
Only Tails Matter: Average-Case Universality and Robustness in the Convex Regime
The recently developed average-case analysis of optimization methods allows a more fine-grained and representative convergence analysis than usual worst-case results. In exchange, this analysis requires a more precise hypothesis over the data generating process, namely assuming knowledge of the expected spectral distribution (ESD) of the random matrix associated with the problem. This work shows that the concentration of eigenvalues near the edges of the ESD determines a problem's asymptotic average complexity. This a priori information on this concentration is a more grounded assumption than complete knowledge of the ESD. This approximate concentration is effectively a middle ground between the coarseness of the worst-case scenario convergence and the restrictive previous average-case analysis. We also introduce the Generalized Chebyshev method, asymptotically optimal under a hypothesis on this concentration and globally optimal when the ESD follows a Beta distribution. We compare its performance to classical optimization algorithms, such as gradient descent or Nesterov's scheme, and we show that, in the average-case context, Nesterov's method is universally nearly optimal asymptotically.
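For context, a minimal sketch (not the paper's Generalized Chebyshev method) comparing gradient descent and Nesterov's accelerated scheme on a random quadratic whose Hessian spectrum follows a Marchenko-Pastur law, the sort of average-case instance where the edges of the spectral distribution drive the complexity. Problem sizes, step sizes, and iteration counts are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 1200, 800
A = rng.standard_normal((n, d)) / np.sqrt(n)     # Wishart-type design
H = A.T @ A                                      # Hessian with Marchenko-Pastur spectrum
b = H @ rng.standard_normal(d)                   # consistent least-squares target
L = np.linalg.eigvalsh(H).max()                  # smoothness constant

def grad(w):
    return H @ w - b

w_gd = np.zeros(d)                               # gradient descent iterate
w_nes = np.zeros(d)                              # Nesterov iterate
w_prev = np.zeros(d)

for k in range(1, 301):
    # Gradient descent with step 1/L.
    w_gd -= grad(w_gd) / L
    # Nesterov's accelerated method for convex quadratics.
    momentum = (k - 1) / (k + 2)
    y = w_nes + momentum * (w_nes - w_prev)
    w_prev = w_nes
    w_nes = y - grad(y) / L
    if k % 100 == 0:
        print(f"iter {k:3d}   ||grad|| GD {np.linalg.norm(grad(w_gd)):.3e}   "
              f"Nesterov {np.linalg.norm(grad(w_nes)):.3e}")
```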
Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties
Ben Adlam
Jeffrey Pennington