arXiv · Sweta Mahajan, Sukrut Rao, Jiahao Xie, Alexander Koller, Bernt Schiele · Fri, Jun 5, 2026, 9:54 AM PDT
score 15.5

Method uses sparse autoencoders to edit image representations, conditioned on text, to improve image-text alignment in vision-language models

Original: TEVI: Text-Conditioned Editing of Visual Representations via Sparse Autoencoders for Improved Vision-Language Alignment

Source: arxiv.org
