x.com · Sasha Rush · Wed, Jun 3, 2026, 7:24 PM PDT
score 17.1
939 likes · 93 RT · 15 replies
On-policy distillation emerges as the most active new research direction in reinforcement learning for LLMs
Original: On-Policy Distillation is the most active new research direction being explored in RL for LLMs. Had the chance to discuss how it works with Dwarkesh and why it fits so nicely into large-scale pipelin
Source: x.com ↗