arXiv | Dong Nie | Thu, May 21, 2026, 10:03 AM PDT
Post-training success hinges on which examples you learn from
Original: Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation
Source: arxiv.org