x.com · Garry Tan · Tue, May 26, 2026, 10:32 AM PDT
score 16.0
836 likes · 63 RT · 31 replies
DeepSWE benchmark reveals true differences between AI coding models
Original: This is the new standard for engineering evals https://t.co/UkEwUWybab