arXiv | Taiming Lu, Zhuang Liu | Fri, May 22, 2026, 10:16 AM PDT

Weak teachers can train large language models effectively

Original: Strong Teacher Not Needed? On Distillation in LLM Pretraining

Source: arxiv.org
