arXiv
Yufei Ma, Zihan Liang, Ben Chen, Zhipeng Qian, Huangyu Dai, Lingtao Mao, Xuxin Zhang, Chenyi Lei, Wenwu Ou
Mon, May 18, 2026, 5:18 AM PDT
score 16.3

Self-teaching AI to pick better web searches during reasoning

Original: SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning

Source: arxiv.org
