arXiv
Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang, Yifan Zhu, Xing Shi, Jingtao Xu, Zhihui Li, Yawei Luo
Thu, May 28, 2026, 10:14 AM PDT
Train language models to answer consistently across conversation turns
Original: Same Evidence, Different Answers: Canonical-Context On-Policy Distillation for Multi-Turn Language Models
Source: arxiv.org