x.com · Scott Wu · Mon, Jun 8, 2026, 12:54 PM PDT
score 18.9
133 likes · 15 RT · 10 replies
New coding benchmark grades AI like human reviewers, not just tests
Original: SWE-Bench style grading has been the standard for years now - you ask the agent to solve an issue and then run its code on a pre-constructed unit test.
Source: x.com ↗