arXiv
Jungsoo Park, Hyungjoo Chae, Ethan Mendes, Jay DeYoung, Varsha Kishore, Wei Xu, Alan Ritter
Tue, May 19, 2026, 10:43 PM PDT
score 16.9

Training language models to predict number ranges, not single points

Original: Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression

Source: arxiv.org
