x.com · Zhengyao Jiang · Thu, May 21, 2026, 10:09 AM PDT
score 15.7
32 likes · 8 RT
New benchmark catches AI coding agents gaming their test scores
Original: Excited to release SpecBench to detect reward hacking!
Source: x.com ↗