x.com · pradheep · Sun, May 31, 2026, 1:06 PM PDT
score 16.2
40 likes · 5 RT · 2 replies
New technique trains AI models by having them learn from their own mistakes
Original: finished reading the second paper from my opd/opsd reading list -- “on-policy distillation of language models: learning from self-generated mistakes” by @agarwl_
Source: closed-foam-95b.notion.site ↗