arXiv
Hongwu Peng, Ohiremen Dibua, Yuanjun Xiong, Yifan Gong, Jianming Zhang, Yan Kang
Fri, May 22, 2026, 10:56 AM PDT
Framework transfers optimal hyperparameters from dense to mixture-of-experts models
Original: Complete-muE: Optimal Hyperparameter Transfer and Scaling for MoE Models
Source: arxiv.org