arXiv
Xixiang He, Qiyao Sun, Ao Cheng, Xingming Li, Xuanyu Ji, Hailun Lu, Runke Huang, Qingyong Hu
Wed, May 20, 2026, 5:57 AM PDT
score 17.1

Fixing reinforcement learning collapse in math reasoning models

Original: Advantage Collapse in Group Relative Policy Optimization: Diagnosis and Mitigation

Source: arxiv.org
