arXiv · Yunhua Pei, Jingyu Hu, Yiwei Shi, Hongnan Ma, Weiru Liu, John Cartlidge · Mon, May 25, 2026, 10:38 AM PDT
New benchmark measures whether AI models understand real market commitments
Original: StakeBench: Evaluating Language Understanding Grounded in Market Commitment
Source: arxiv.org