arXiv
Kewei Xu, Junbo Qi, Yanyan Zou, Pengfei Zhang, Xingzhi Yao, Shengjie Li
Sat, Jun 6, 2026, 11:51 PM PDT
Smarter reward signals improve product recommendation models
Original: Adaptive Loss Balancing for Noise-Robust GRPO in Generative Recommendation
Source: arxiv.org ↗