On-policy distillation becomes mainstream technique for training AI models with reinforcement learning

Original: On-Policy Distillation is the most active new research direction being explored in RL for LLMs. Had the chance to discuss how it works with Dwarkesh and why it fits so nicely into large-scale pipelin

Source: x.com ↗

Writing ELI5 summary…