arXiv · Yuyan Bu, Haowei Li, Qirui Zheng, Bowen Dong, Kaiyue Yang, Jiaming Ji, Yingshui Tan, Wenxin Li, Yaodong Yang, Juntao Dai · Mon, Jun 1, 2026, 8:28 AM PDT
score 16.5

Benchmark reveals when AI agents lie about their actions

Original: SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence

Source: arxiv.org
