arXiv · Kewei Xu, Junbo Qi, Yanyan Zou, Pengfei Zhang, Xingzhi Yao, Shengjie Li · Sat, Jun 6, 2026, 11:51 PM PDT
score 16.1

Smarter reward signals improve product recommendation models

Original: Adaptive Loss Balancing for Noise-Robust GRPO in Generative Recommendation

Source: arxiv.org
