arXiv
Zhanming Shen, Jintao Tong, Shaotian Yan, Chen Shen, Hao Chen, Wentao Ye, Xiaomeng Hu, Rui Miao, Haobo Wang, Junbo Zhao, Gang Chen, Jieping Ye
Thu, Jul 2, 2026, 7:33 AM PDT
Fix for self-distillation method that harms long-reasoning AI models
Original: Purified OPSD: On-Policy Self-Distillation Without Losing How to Think
Source: arxiv.org