Goncalo Mordido

goncalo-filipe.torcato-mordido@mila.quebec

Postdoctorate - Polytechnique Montréal

Principal supervisor

Sarath Chandar Anbil Parthipan

Publications

Promoting Exploration in Memory-Augmented Adam using Critical Momenta

Pranshu Malviya

Goncalo Mordido

Aristide Baratin

Reza Babanezhad Harikandeh

Jerry Huang

Simon Lacoste-Julien

Razvan Pascanu

Sarath Chandar Anbil Parthipan

Adaptive gradient-based optimizers, particularly Adam, have left their mark in training large-scale deep learning models. The strength of such optimizers is that they exhibit fast convergence while being more robust to hyperparameter choice. However, they often generalize worse than non-adaptive methods. Recent studies have tied this performance gap to flat minima selection: adaptive methods tend to find solutions in sharper basins of the loss landscape, which in turn hurts generalization. To overcome this issue, we propose a new memory-augmented version of Adam that promotes exploration towards flatter minima by using a buffer of critical momentum terms during training. Intuitively, the use of the buffer makes the optimizer overshoot outside the basin of attraction if it is not wide enough. We empirically show that our method improves the performance of several variants of Adam on standard supervised language modelling and image classification tasks.

2024-06-09

TMLR (accepted)

doi.org

openreview.net

Lookbehind-SAM: k steps back, 1 step forward

Goncalo Mordido

Pranshu Malviya

Aristide Baratin

Sarath Chandar Anbil Parthipan

2024-05-01

ICML 2024 (poster)

openreview.net