arXiv · Polina Gordienko, Georg Schollmeyer, Frauke Kreuter, Christoph Jansen · Fri, May 22, 2026, 6:40 AM PDT
Researchers measure how easy it is to game AI benchmarks
Original: How Hard is it to Rig a Benchmark? A Social Choice Analysis of Leaderboard Robustness
Source: arxiv.org