arXiv
Zhangchen Xu, Junda Chen, Yue Huang, Dongfu Jiang, Jiefeng Chen, Hang Hua, Zijian Wu, Zheyuan Liu, Zexue He, Lichi Li, Shizhe Diao, Jiaxin Pei, Jinsung Yoon, Hao Zhang, Mengdi Wang, Radha Poovendran, Misha Sra, Alex Pentland, Zichen Chen
Wed, Jun 3, 2026, 9:36 AM PDT
New benchmark tests whether AI can improve code and experiments over hours
Original: AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?
Source: arxiv.org