x.com · Xiangyi Li · Tue, May 26, 2026, 10:53 AM PDT
score 15.9
18 likes · 4 RT · 1 reply
New benchmark tests AI agents across diverse real-world tasks
Original: Great contribution to this field by adding richer domains and skills to agentic evals curated by experts @harvey
Source: x.com