x.com · David Campbell · Sat, May 30, 2026, 10:53 AM PDT

Researcher critiques flawed benchmarks for testing AI hacking capability

Original: that's not a benchmark. THAT's a benchmark!

Source: x.com
