x.com · Saylor · Sun, May 31, 2026, 3:47 PM PDT
score 15.4
2 likes · 2 replies

Researcher questions whether new AI agent tests measure real weaknesses

Original: @cwolferesearch the failure case approach makes intuitive sense, but I'm curious how they avoid the new evals just matching Cursor's current blind spots instead of general agent weaknesses

Source: x.com
