arXiv
Zeyuan Wang, Da Li, Yulin Chen, Yuehu Gong, Yanming Guo, Ye Shi, Liang Bai, Tianyuan Yu, Yanwei Fu
Wed, May 20, 2026, 8:14 AM PDT

New policy method combines generative flexibility with trainable entropy for reinforcement learning

Original: Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent

Source: arxiv.org
