x.com | Gabriel Lespérance | Sat, May 30, 2026, 9:09 AM PDT

Testing AI agents on real-world application and terminal tasks

Original: First in a two-part series where I throw RLMs at benchmarks and see how far they can go.

Source: x.com
