arXiv
Lawrence Keunho Jang, Mareks Woodside, Geronimo Carom, Andrew Keunwoo Jang, Jing Yu Koh, Ruslan Salakhutdinov
Mon, Jun 8, 2026, 10:27 AM PDT
New iPhone benchmark tests AI agents with real user data
Original: iOSWorld: A Benchmark for Personally Intelligent Phone Agents
Source: arxiv.org