Guangyuan Wang

Research Intern - McGill

Principal Supervisor

Doina Precup

Research Topics

Reinforcement Learning

Deep Learning

Machine Learning Theory

Publications

Phases of Muon: When Muon Eclipses SignSGD

Lucas Benigni

Atish Agarwala

Recently, Muon and related spectral optimizers have demonstrated strong empirical performance as scalable stochastic methods, often outperforming Adam. Yet their behaviour remains poorly understood. We analyze stochastic spectral optimizers, including Muon, on a high-dimensional matrix-valued least squares problem. We derive explicit deterministic dynamics that provide a tractable framework for studying learning behaviour with a focus on (stochastic) SignSVD, which Muon approximates, and (stochastic) SignSGD, the latter serving as a proxy for Adam. Our analysis shows that for large batch size, SignSVD performs a square-root preconditioning with respect to the data covariance spectrum, while for small batch size smaller eigenmodes behave like SGD, slowing down convergence. We contrast with SignSGD which for generic covariance performs no preconditioning and has no transition, leading to different optimal learning rates and convergence characteristics. The two methods match up to a constant factor with isotropic data, but behave differently with anisotropic data. An analysis of a power law covariance model with data exponent

2026-05-09

arXiv (preprint)

doi.org

arxiv.org

Langevin Soft Actor-Critic: Efficient Exploration Through Uncertainty-Driven Critic Learning

Haque Ishfaq

Guangyuan Wang

Sami Nur Islam

Doina Precup

Existing actor-critic algorithms, which are popular for continuous control reinforcement learning (RL) tasks, suffer from poor sample efficiency due to lack of principled exploration mechanism within them. Motivated by the success of Thompson sampling for efficient exploration in RL, we propose a novel model-free RL algorithm, Langevin Soft Actor Critic (LSAC), which prioritizes enhancing critic learning through uncertainty estimation over policy optimization. LSAC employs three key innovations: approximate Thompson sampling through distributional Langevin Monte Carlo (LMC) based

2025-04-22

International Conference on Learning Representations (Accept (Poster))

doi.org

openreview.net
