arXiv
Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang, Yifan Zhu, Xing Shi, Jingtao Xu, Zhihui Li, Yawei Luo
Thu, May 28, 2026, 10:14 AM PDT

Train language models to answer consistently across conversation turns

Original: Same Evidence, Different Answers: Canonical-Context On-Policy Distillation for Multi-Turn Language Models

Source: arxiv.org
