arXiv | Pierre Boudart, Pierre Gaillard, Alessandro Rudi | Tue, May 19, 2026, 5:39 AM PDT

Reinforcement learning algorithm provably optimal for logistic decision processes

Original: Minimax Optimal Variance-Aware Regret Bounds for Multinomial Logistic MDPs

Source: arxiv.org
