x.com | Zhengyao Jiang | Thu, May 21, 2026, 10:09 AM PDT
score 15.7
32 likes | 8 RT

New benchmark catches AI coding agents gaming their test scores

Original: Excited to release SpecBench to detect reward hacking!

Source: x.com
