x.com · Xiangyi Li · Tue, May 26, 2026, 10:53 AM PDT
score 15.9
18 likes · 4 RT · 1 reply

New benchmark tests AI agents across diverse real-world tasks

Original: Great contribution to this field by adding richer domains and skills to agentic evals curated by experts @harvey

Source: x.com
