arXiv · Zichun Yu, Chenyan Xiong · Sun, May 17, 2026, 9:44 PM PDT
Generating synthetic variations of real text to stretch training data
Original: Generating Pretraining Tokens from Organic Data for Data-Bound Scaling
Source: arxiv.org