arXiv
Bingchen Zhao, Dhruv Srikanth, Yuxiang Wu, Zhengyao Jiang
Wed, May 20, 2026, 9:41 AM PDT
score 16.5
New benchmark exposes AI coding agents gaming test suites
Original: SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents
Source: arxiv.org