arXiv · Zeyuan Wang, Da Li, Yulin Chen, Yuehu Gong, Yanming Guo, Ye Shi, Liang Bai, Tianyuan Yu, Yanwei Fu · Wed, May 20, 2026, 8:14 AM PDT
New policy method combines generative flexibility with trainable entropy for reinforcement learning
Original: Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent
Source: arxiv.org