New benchmark tests coding agents through real multi-turn conversations

Original: Introducing SWE-Together: a multi-turn benchmark built from real user–agent coding sessions.
