Katie Everett

Alumni

Blog posts

What do GFlowNets and Variational Inference Have in Common?

November 27, 2023

by

Edward Hu

Nikolay Malkin

Katie Everett

Publications

Dimension-adapted Momentum Outscales SGD

2025-05-22

arXiv (preprint)

We investigate scaling laws for stochastic momentum algorithms with small batch on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target complexities. While traditional stochastic gradient descent with momentum (SGD-M) yields identical scaling law exponents to SGD, dimension-adapted Nesterov acceleration (DANA) improves these exponents by scaling momentum hyperparameters based on model size and data complexity. This outscaling phenomenon, which also improves compute-optimal scaling behavior, is achieved by DANA across a broad range of data and target complexities, while traditional methods fall short. Extensive experiments on high-dimensional synthetic quadratics validate our theoretical predictions and large-scale text experiments with LSTMs show DANA's improved loss exponents over SGD hold in a practical setting.

2025-05-01

arXiv (published)

Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse Actions, Interventions and Sparse Temporal Dependencies

Sébastien Lachapelle

Pau Rodriguez

Yash Sharma

Katie Everett

Rémi Le Priol

Alexandre Lacoste

Simon Lacoste-Julien