arXiv
Jungsoo Park, Hyungjoo Chae, Ethan Mendes, Jay DeYoung, Varsha Kishore, Wei Xu, Alan Ritter
Tue, May 19, 2026, 10:43 PM PDT
Training language models to predict number ranges, not single points
Original: Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression
Source: arxiv.org