x.com · Jeremy Howard · Wed, Jul 1, 2026, 3:58 AM PDT
score 18.2
36 RT
New benchmark tests coding agents on real multi-turn dialogues
Original: RT @yifannnwu: Introducing SWE-Together: a multi-turn benchmark built from real user–agent coding sessions.
Source: x.com