arXiv | Jiale Amber Wang, Kaiyuan Wang, Pengyu Nie | Thu, Jul 2, 2026, 10:35 AM PDT
score 17.1
New benchmark tests if AI agents can update software tests alongside code changes
Original: TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution
Source: arxiv.org