x.com · Niels Rogge · Thu, Jun 4, 2026, 5:20 AM PDT
score 16.1
171 likes · 15 RT · 1 reply
Technique teaches AI models to avoid specific mistakes without retraining
Original: Linked Sasha's excellent video lecture (and tweet by Dwarkesh) at https://paperswithcode.co/methods/on-policy-distillation so more people can better understand how on-policy distillation works https:/
Source: paperswithcode.co