arXiv | Vijeta Deshpande, Tootiya Giyahchi, Veena Padmanabhan, Leman Akoglu, Anna Rumshisky | Wed, May 27, 2026, 8:59 AM PDT
score 16.5

Activation steering struggles to generate diverse training data for safety classifiers

Original: Activation Steering for Synthetic Data Generation: The Role of Diversity in Downstream Safety Detection

Source: arxiv.org
