arXiv
Guanhua Chen, Chuyue Huang, Yutong Yao, Shudong Liu, Xueqing Song, Lidia S. Chao, Derek F. Wong
Thu, May 14, 2026, 9:20 AM PDT
score 9.2
Better visual search by breaking images into smaller, verifiable pieces
Original: From Scenes to Elements: Multi-Granularity Evidence Retrieval for Verifiable Multimodal RAG
Source: arxiv.org