arXiv
Ryan Bahlous-Boldi, Isha Puri, Idan Shenfeld, Akarsh Kumar, Mehul Damani, Sebastian Risi, Omar Khattab, Zhang-Wei Hong, Pulkit Agrawal
Thu, May 21, 2026, 10:59 AM PDT
score 14.8
Training language models to produce diverse solutions for search-based inference
Original: Vector Policy Optimization: Training for Diversity Improves Test-Time Search
Source: arxiv.org