x.com · Gabriel Lespérance · Sat, May 30, 2026, 9:09 AM PDT
Testing AI agents on real-world application and terminal tasks
Original: First in a two-part series where I throw RLMs at benchmarks and see how far they can go.
Source: x.com