Technique teaches AI models to avoid specific mistakes without retraining

Original: Linked Sasha's excellent video lecture (and tweet by Dwarkesh) at https://paperswithcode.co/methods/on-policy-distillation so more people can better understand how on-policy distillation works https:/

Source: paperswithcode.co
