arXiv | Dong Nie | Thu, May 21, 2026, 10:03 AM PDT
score 15.8
130 likes, 17 RT, 7 replies

Post-training success hinges on which examples you learn from

Original: Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation

Source: arxiv.org
