x.com · Scott Wu · Mon, Jun 8, 2026, 12:54 PM PDT
score 18.9
133 likes · 15 RT · 10 replies

New coding benchmark grades AI like human reviewers, not just tests

Original: SWE-Bench-style grading has been the standard for years now: you ask the agent to solve an issue, then run its code against a pre-constructed unit test.

Source: x.com
