arXiv
Feng Han, Zhixiong Zhang, Zheming Liang, Yibin Wang, Jiaqi Wang
Thu, May 28, 2026, 10:27 AM PDT
score 14.8
Training method fixes vision-language models' bias toward text
Original: LoMo: Local Modality Substitution for Deeper Vision-Language Fusion
Source: arxiv.org