arXiv
Yaxin Luo, Jiacheng Cui, Xiaohan Zhao, Xinyi Shang, Jiacheng Liu, Xinyue Bi, Zhaoyi Li, Zhiqiang Shen
Thu, May 28, 2026, 10:59 AM PDT
Reverse-engineering what data trained a language model
Original: LLMSurgeon: Diagnosing Data Mixture of Large Language Models
Source: arxiv.org