arXiv | Rebecca Ramnauth, Brian Scassellati | Wed, May 27, 2026, 8:45 AM PDT
score 16.4
Language models hide banned concepts that appear suppressed
Original: The Attentional White Bear Effect in Transformer Language Models
Source: arxiv.org ↗