arXiv
Jianshu Zhang, Yijiang Li, Huifeixin Chen, Haoran Lu, Letian Xue, Bingyang Wang, Han Liu
Fri, May 22, 2026, 10:58 AM PDT
score 14.7
Vision-language models struggle to understand spatial numbers
Original: SPACENUM: Revisiting Spatial Numerical Understanding in VLMs
Source: arxiv.org