arXiv · Yunhe Li, Hao Shi, Wenhao Liu, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Shuang Qiu, Linqi Song · Thu, Jul 2, 2026, 10:58 AM PDT

New method improves LLM reasoning by blending teacher and student knowledge

Original: DemoPSD: Disagreement-Modulated Policy Self-Distillation

Source: arxiv.org
