arXiv
Jianghao Wu, Jianfei Cai, Weiqiang Wang, Jin Ye, Daniel F. Schmidt, Yasmeen George
Wed, May 27, 2026, 8:38 AM PDT
score 16.4
Picking training data for AI reasoning without labels or rewards
Original: Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection
Source: arxiv.org