Guangyuan Wang

Research Intern - McGill University

Supervisor

Doina Precup

Research Topics

Deep Learning

Machine Learning Theory

Reinforcement Learning

Publications

Phases of Muon: When Muon Eclipses SignSGD

Lucas Benigni

Atish Agarwala

Recently, Muon and related spectral optimizers have demonstrated strong empirical performance as scalable stochastic methods, often outperforming Adam. Yet their behaviour remains poorly understood. We analyze stochastic spectral optimizers, including Muon, on a high-dimensional matrix-valued least squares problem. We derive explicit deterministic dynamics that provide a tractable framework for studying learning behaviour with a focus on (stochastic) SignSVD, which Muon approximates, and (stochastic) SignSGD, the latter serving as a proxy for Adam. Our analysis shows that for large batch size, SignSVD performs a square-root preconditioning with respect to the data covariance spectrum, while for small batch size smaller eigenmodes behave like SGD, slowing down convergence. We contrast with SignSGD which for generic covariance performs no preconditioning and has no transition, leading to different optimal learning rates and convergence characteristics. The two methods match up to a constant factor with isotropic data, but behave differently with anisotropic data. An analysis of a power law covariance model with data exponent

2026-05-09

arXiv (preprint)

doi.org

arxiv.org
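
The abstract above contrasts two update rules on a matrix-valued least squares problem. As a rough illustration only, and not the authors' code or experimental setup, the sketch below assumes that SignSGD steps along the entrywise sign of the minibatch gradient and that SignSVD steps along the gradient with its singular values replaced by 1. The function names, the problem sizes, the learning rate, and the power-law feature scaling are all hypothetical choices made for this example.

```python
import numpy as np

def minibatch_gradient(W, W_star, X):
    """Gradient of (0.5 / B) * ||X W - X W_star||_F^2 with respect to W."""
    B = X.shape[0]
    return X.T @ (X @ (W - W_star)) / B

def sign_sgd_direction(G):
    """Entrywise sign of the gradient (assumed form of the SignSGD step)."""
    return np.sign(G)

def sign_svd_direction(G):
    """Gradient with every singular value set to 1 (assumed form of SignSVD).

    Simplification: directions with zero singular value are also mapped to 1.
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def run(direction, steps=200, lr=0.01, batch=64, d=20, m=10, seed=0):
    """Run one optimizer and return the final Frobenius error ||W - W_star||."""
    rng = np.random.default_rng(seed)
    # Hypothetical anisotropic data: feature i has standard deviation i^(-1/2).
    scales = np.arange(1, d + 1) ** -0.5
    W_star = rng.standard_normal((d, m))
    W = np.zeros((d, m))
    for _ in range(steps):
        X = rng.standard_normal((batch, d)) * scales
        W -= lr * direction(minibatch_gradient(W, W_star, X))
    return np.linalg.norm(W - W_star)
```

Calling `run(sign_svd_direction)` and `run(sign_sgd_direction)` with the same seed gives the final error of each rule on the same target; varying `batch` is one way to probe the batch-size dependence the abstract describes, though this toy setup is not tuned to reproduce the paper's results.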

Langevin Soft Actor-Critic: Efficient Exploration Through Uncertainty-Driven Critic Learning

Haque Ishfaq

Guangyuan Wang

Sami Nur Islam

Doina Precup

Existing actor-critic algorithms, which are popular for continuous control reinforcement learning (RL) tasks, suffer from poor sample efficiency due to lack of principled exploration mechanism within them. Motivated by the success of Thompson sampling for efficient exploration in RL, we propose a novel model-free RL algorithm, Langevin Soft Actor Critic (LSAC), which prioritizes enhancing critic learning through uncertainty estimation over policy optimization. LSAC employs three key innovations: approximate Thompson sampling through distributional Langevin Monte Carlo (LMC) based

2025-04-22

International Conference on Learning Representations (Accept (Poster))

doi.org

openreview.net
