arXiv
Pierre Boudart, Pierre Gaillard, Alessandro Rudi
Tue, May 19, 2026, 5:39 AM PDT
Reinforcement learning algorithm provably optimal for logistic decision processes
Original: Minimax Optimal Variance-Aware Regret Bounds for Multinomial Logistic MDPs
Source: arxiv.org