arXiv
Jianghao Wu, Jianfei Cai, Weiqiang Wang, Jin Ye, Daniel F. Schmidt, Yasmeen George
Wed, May 27, 2026, 8:38 AM PDT

Picking training data for AI reasoning without labels or rewards

Original: Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection

Source: arxiv.org
