arXiv
Yunhe Li, Hao Shi, Wenhao Liu, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Shuang Qiu, Linqi Song
Thu, Jul 2, 2026, 10:58 AM PDT
score 17.1
New method improves LLM reasoning by blending teacher and student knowledge
Original: DemoPSD: Disagreement-Modulated Policy Self-Distillation
Source: arxiv.org