arXiv
Johnny Tian-Zheng Wei, Jerry Li, Ameya Godbole, Robin Jia
Sat, May 23, 2026, 7:06 PM PDT
Fixing inflated test scores when training data leaks into evaluations
Original: Spiking the training data to correct for test set contamination
Source: arxiv.org