arXiv
Zhangchen Xu, Junda Chen, Yue Huang, Dongfu Jiang, Jiefeng Chen, Hang Hua, Zijian Wu, Zheyuan Liu, Zexue He, Lichi Li, Shizhe Diao, Jiaxin Pei, Jinsung Yoon, Hao Zhang, Mengdi Wang, Radha Poovendran, Misha Sra, Alex Pentland, Zichen Chen
Wed, Jun 3, 2026, 9:36 AM PDT

New benchmark tests whether AI can improve code and experiments over hours

Original: AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

Source: arxiv.org
