arXiv
Yifan Jiang, Ruoxi Ning, Sheng Yao, Freda Shi
Tue, May 26, 2026, 10:24 AM PDT
Real images often hurt vision-language model understanding
Original: Real Images, Worse Judgments: Evaluating Vision-Language Models on Concreteness and Imagery
Source: arxiv.org