Guangyuan Wang

Research Intern - McGill
Principal supervisor
Research topics
Reinforcement learning
Deep learning
Machine learning theory

Publications

Phases of Muon: When Muon Eclipses SignSGD
Recently, Muon and related spectral optimizers have demonstrated strong empirical performance as scalable stochastic methods, often outperforming Adam. Yet their behaviour remains poorly understood. We analyze stochastic spectral optimizers, including Muon, on a high-dimensional matrix-valued least squares problem. We derive explicit deterministic dynamics that provide a tractable framework for studying learning behaviour with a focus on (stochastic) SignSVD, which Muon approximates, and (stochastic) SignSGD, the latter serving as a proxy for Adam. Our analysis shows that for large batch size, SignSVD performs a square-root preconditioning with respect to the data covariance spectrum, while for small batch size smaller eigenmodes behave like SGD, slowing down convergence. We contrast with SignSGD which for generic covariance performs no preconditioning and has no transition, leading to different optimal learning rates and convergence characteristics. The two methods match up to a constant factor with isotropic data, but behave differently with anisotropic data. An analysis of a power law covariance model with data exponent
Langevin Soft Actor-Critic: Efficient Exploration Through Uncertainty-Driven Critic Learning
Existing actor-critic algorithms, which are popular for continuous control reinforcement learning (RL) tasks, suffer from poor sample efficiency due to the lack of a principled exploration mechanism. Motivated by the success of Thompson sampling for efficient exploration in RL, we propose a novel model-free RL algorithm, Langevin Soft Actor Critic (LSAC), which prioritizes enhancing critic learning through uncertainty estimation over policy optimization. LSAC employs three key innovations: approximate Thompson sampling through distributional Langevin Monte Carlo (LMC) based