x.com · Saylor · Sun, May 31, 2026, 3:47 PM PDT
score 15.4
2 likes · 2 replies
Researcher questions whether new AI agent tests measure real weaknesses
Original: @cwolferesearch the failure case approach makes intuitive sense, but im curious how they avoid the new evals just matching cursor's current blindspots instead of general agent weaknesses
Source: x.com ↗