x.comYoonho LeeMon, Jun 29, 2026, 5:01 PM PDT
score 15.6
12RT
Set RL: AI learns by rewarding groups of attempts, not individual ones
Original: RT @teortaxesTex: > The key idea is set RL, where parallel traces get reward for being useful together, even if none of them solves the pro…
Source: x.com ↗
Writing ELI5 summary…