arXiv
Lawrence Keunho Jang, Mareks Woodside, Geronimo Carom, Andrew Keunwoo Jang, Jing Yu Koh, Ruslan Salakhutdinov
Mon, Jun 8, 2026, 10:27 AM PDT

New iPhone benchmark tests AI agents with real user data

Original title: iOSWorld: A Benchmark for Personally Intelligent Phone Agents

Source: arxiv.org
