arXiv
Rishabh Sabharwal, Hongru Wang, Amos Storkey, Jeff Z. Pan
Mon, Jun 8, 2026, 10:08 AM PDT
score 17.2
Research AI agents struggle to improve from repeated feedback
Original: Multi-Turn Evaluation of Deep Research Agents Under Process-Level Feedback
Source: arxiv.org