arXiv
Lin Qiu, Hanqing Zeng, Yao Liu, Bingjun Sun, Guangdeng Liao, Ji Liu
Sat, May 23, 2026, 5:38 PM PDT
score 15.8
Self-play training makes vision-language models reason better without labels
Original: DUEL: Adversarial Self-Play for Multimodal Reasoning
Source: arxiv.org