arXiv · Feng Han, Zhixiong Zhang, Zheming Liang, Yibin Wang, Jiaqi Wang · Thu, May 28, 2026, 10:27 AM PDT
score 14.8

Training method fixes vision-language models' bias toward text

Original: LoMo: Local Modality Substitution for Deeper Vision-Language Fusion

Source: arxiv.org
