arXiv · Xingzhou Pang, Yifan Hou, Junling Wang, Mrinmaya Sachan · Thu, May 28, 2026, 9:20 AM PDT
Vision-language models fail at counting beyond training data
Original: Unveiling the Visual Counting Bottleneck in Vision-Language Models
Source: arxiv.org