arXiv · Rebecca Ramnauth, Brian Scassellati · Wed, May 27, 2026, 8:45 AM PDT

Language models hide banned concepts that appear suppressed

Original: The Attentional White Bear Effect in Transformer Language Models

Source: arxiv.org
