arXiv · Shayan Talaei, Abhinav Chinta, Devvrit Khatri, Amin Karbasi, Azalia Mirhoseini, Amin Saberi · Wed, Jul 1, 2026, 10:46 AM PDT
score 17.1

New method exposes hidden biases in language models by amplifying them

Original: Distill to Detect: Exposing Stealth Biases in LLMs through Cartridge Distillation

Source: arxiv.org
