Charles-Etienne Joseph

Alumni

Publications

μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
Learned optimizers (LOs) have the potential to significantly reduce the wall-clock training time of neural networks. However, they can struggle to optimize unseen tasks (*meta-generalize*), especially when training networks wider than those seen during meta-training. To address this, we derive the Maximal Update Parametrization (
Continual Pre-training of MoEs: How robust is your router?
Zain Sarwar
Ashwinee Panda
Anirban Das
Shi-Xiong Zhang
Stephen Rawls
Sambit Sahu
Can We Learn Communication-Efficient Optimizers?
μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
Learning Optimizers for Local SGD