arXiv
Sweta Mahajan, Sukrut Rao, Jiahao Xie, Alexander Koller, Bernt Schiele
Fri, Jun 5, 2026, 9:54 AM PDT
Score: 15.5
Method uses sparse autoencoders to edit visual representations, conditioned on text, to improve image-text alignment in vision-language models
Original: TEVI: Text-Conditioned Editing of Visual Representations via Sparse Autoencoders for Improved Vision-Language Alignment
Source: arxiv.org