arXiv
Yufei Ma, Zihan Liang, Ben Chen, Zhipeng Qian, Huangyu Dai, Lingtao Mao, Xuxin Zhang, Chenyi Lei, Wenwu Ou
Mon, May 18, 2026, 5:18 AM PDT
Self-teaching AI to pick better web searches during reasoning
Original: SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning
Source: arxiv.org