arXiv
Thomson Yen, Julian Poeltl, Harshith Srinivas Gear, Yilin Meng, Joshua Fan, Adam Shen, Yili Liu, Ali Bauyrzhan, Siri Du, Haoyang Liu, Daniel Guetta, Hongseok Namkoong
Thu, May 21, 2026, 9:06 AM PDT
New benchmark tests AI agents on real spreadsheet finance tasks
Original: WorkstreamBench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance
Source: arxiv.org