arXiv · Jason Starace · Sun, Jun 7, 2026, 2:14 AM PDT

How AI scaffolding masks true model capability in benchmarks

Original: Scaffold Effects on GAIA: A Controlled Comparison

Source: arxiv.org
