arXiv
Yifan Jiang, Ruoxi Ning, Sheng Yao, Freda Shi
Tue, May 26, 2026, 10:24 AM PDT

Real images often hurt vision-language model understanding

Original: Real Images, Worse Judgments: Evaluating Vision-Language Models on Concreteness and Imagery

Source: arxiv.org
