Verna Dankers

Postdoctoral Fellow - McGill

Principal Supervisor

Siva Reddy

Research Topics

Linguistic Evaluation of Language Models

Mechanistic Interpretability

Memorization

Natural Language Processing

Publications

Leveraging Routing Dynamics in Mixture-of-Experts Models for Efficient Language Adaptation

Mixture-of-Experts (MoE) models are widely used to scale language models, yet their expert routing behavior and adaptation in a multilingual setting remain underexplored. In this work, we study multilingual routing dynamics during continual pre-training of an English-centric MoE model on a multilingual corpus, analyzing how expert usage varies across languages. We find that continual multilingual pre-training leads to diffused, language-agnostic routing in early and middle layers, with language specialization primarily emerging in the final layers. We also show that token-level vocabulary overlap between languages plays an important role in how languages are routed. Motivated by these findings, we propose a parameter-efficient adaptation strategy that updates language-specific and shared experts in the final MoE layers. Experiments on MultiBLiMP and Belebele show that our method achieves a strong performance-efficiency trade-off, attaining competitive performance relative to fine-tuning complete final layers, while updating less than 2% of the parameters. Overall, our findings provide insights into where and how language specialization emerges in MoEs during continual pre-training and provide practical insights for low-resource multilingual adaptation. Our code is available at https://github.com/aditi184/moe-routing-adaptation.

2026-05-27

arXiv (preprint)

Towards Democratizing LLMs: Investigating Multilingual Mixture-of-Experts Models

2025-09-21

NeurIPS.cc/2025/Workshop/WiML (published)