arXiv | Jianshu Zhang, Yijiang Li, Huifeixin Chen, Haoran Lu, Letian Xue, Bingyang Wang, Han Liu | Fri, May 22, 2026, 10:58 AM PDT

Vision-language models struggle to understand spatial numbers

Original: SPACENUM: Revisiting Spatial Numerical Understanding in VLMs

Source: arxiv.org
