arXiv
Jiayu Wang, Weijiang Lv, Bowen Fu, Jing Fu, Jiayi Song, Lingyu Zhang, Lanxuan Xue, Luodi Chen, Zepeng Xin, Kaiyu Li, Xiangyong Cao
Fri, Jun 5, 2026, 10:13 AM PDT

Benchmark reveals AI agents still miss details human researchers catch

Original: Act As a Real Researcher: A Suite of Benchmarks Evaluating Frontier LLMs and Agentic Harnesses in Research Lifecycle

Source: arxiv.org
