arXiv | Yunhua Pei, Jingyu Hu, Yiwei Shi, Hongnan Ma, Weiru Liu, John Cartlidge | Mon, May 25, 2026, 10:38 AM PDT
score 16.5

New benchmark measures whether AI models understand real market commitments

Original: StakeBench: Evaluating Language Understanding Grounded in Market Commitment

Source: arxiv.org
