arXiv | Runxi Cheng, Yuchen Guan, Yongxian Wei, Qianpu Sun, Qixiu Li, Sinan Du, Feng Xiong, Chun Yuan, Yan Lu, Yeyun Gong | Wed, May 20, 2026, 2:35 AM PDT

Reusing model memories to scale language models more cheaply

Original: Memory Grafting: Scaling Language Model Pre-training via Offline Conditional Memory

Source: arxiv.org
