arXiv · Zhi Chen, Zhensu Sun, Yuling Shi, David Lo, Lingxiao Jiang · Wed, Jul 1, 2026, 10:50 AM PDT

Benchmarks for coding agents may give misleading measures of progress

Original: Are Performance-Optimization Benchmarks Reliably Measuring Coding Agents?

Source: arxiv.org
