x.com · Garry Tan · Tue, May 26, 2026, 10:32 AM PDT
score 16.0
836 likes · 63 RT · 31 replies

DeepSWE benchmark reveals true differences between AI coding models

Original: This is the new standard for engineering evals https://t.co/UkEwUWybab

Source: x.com
