x.com
David Campbell
Sat, May 30, 2026, 10:53 AM PDT
score 24.3
1 RT
Researcher critiques flawed benchmarks for testing AI hacking capability
Original: that's not a benchmark. THAT's a benchmark!
Source: x.com ↗