Fixing language model training by correcting wrong reasoning paths

Original: Trajectory-Refined Distillation
