New technique trains AI models to learn from their own mistakes

Original: finished reading the second paper from my opd/opsd reading list -- “on-policy distillation of language models: learning from self-generated mistakes” by @agarwl_

Source: closed-foam-95b.notion.site
