arXiv
Jiamin Chen, Yidi Wu, Qiexiang Wang, Qianben Chen, Yuchen Li, Yansen Zhang, Xiaokun Zhang, Wangchunshu Zhou, Chen Ma
Thu, May 28, 2026, 8:46 AM PDT

Using AI to revive saturated benchmarks through smarter evaluation

Original: SEAL: Can Saturated Benchmarks Be Revived by LLM-as-a-Meta-Judge?

Source: arxiv.org
