x.com · Rulin Shao · Mon, May 25, 2026, 12:49 PM PDT
score 15.8
195 likes · 28 RT · 3 replies

Training system learns to improve its own reward metrics

Original: DR Tulu is now accepted for an oral presentation at #ICML2026 🙏

Source: arxiv.org
