x.com · Xinyu Yang · Thu, Jul 2, 2026, 9:25 AM PDT
score 16.2
25 likes · 2 RT · 1 reply
New benchmark reveals how AI agents learn from long-term tasks
Original: Auto-research is getting a lot of attention, but what we should actually use it for and how to evaluate it are still very good questions.
Source: x.com