x.com · Jeremy Howard · Wed, Jul 1, 2026, 3:58 AM PDT
score 18.2
36 RT

New benchmark tests coding agents on real multi-turn dialogues

Original: RT @yifannnwu: Introducing SWE-Together: a multi-turn benchmark built from real user–agent coding sessions.

Source: x.com
