arXiv
Polina Gordienko, Georg Schollmeyer, Frauke Kreuter, Christoph Jansen
Fri, May 22, 2026, 6:40 AM PDT

Researchers measure how easy it is to game AI benchmarks

Original: How Hard is it to Rig a Benchmark? A Social Choice Analysis of Leaderboard Robustness

Source: arxiv.org
