arXiv
Polina Gordienko, Georg Schollmeyer, Frauke Kreuter, Christoph Jansen
Fri, May 22, 2026, 6:40 AM PDT

score 14.6

Researchers measure how easy it is to game AI benchmarks

Original: How Hard is it to Rig a Benchmark? A Social Choice Analysis of Leaderboard Robustness
