x.com · David Campbell · Sat, May 30, 2026, 10:53 AM PDT

score 24.3

1 RT

Researcher critiques flawed benchmarks for testing AI hacking capability

Original: that's not a benchmark. THAT's a benchmark!

Source: x.com
