RLHF masks partisan bias without erasing it from language models

Original: The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
