arXiv
Anshun Asher Zheng, Kanishka Misra, David I. Beaver, Junyi Jessy Li
Mon, Jun 1, 2026, 10:51 AM PDT
score 16.6
Benchmark tests how well AI learns hidden rules from examples
Original: HERO'S JOURNEY: Testing Complex Rule Induction with Text Games
Source: arxiv.org