← back
arXivAndy Q Han, David J. Chalmers, Pavel IzmailovThu, May 28, 2026, 10:03 AM PDT
score 14.8

Language models contain built-in welfare axis that training activates

Original: How's it going? Reinforcement learning in language models recruits a functional welfare axis

Source: arxiv.org

Writing ELI5 summary…