arXiv | Xingzhou Pang, Yifan Hou, Junling Wang, Mrinmaya Sachan | Thu, May 28, 2026, 9:20 AM PDT

Vision-language models fail at counting beyond training data

Original: Unveiling the Visual Counting Bottleneck in Vision-Language Models

Source: arxiv.org
