arXiv
Marcelo Sartori Locatelli, Fernando Tonucci, Jea Kwon, Luiz Felipe Vecchietti, Bryan Nathanael Wijaya, Cheng Yaw Low, Virgilio Almeida, Meeyoung Cha
Fri, Jun 5, 2026, 4:40 AM PDT
score 15.3
Language helps AI models understand where photos are taken
Original: Textual Supervision Enhances Geospatial Representations in Vision-Language Models
Source: arxiv.org