arXiv · Manjiang Yu, Hongji Li, Junwei Chen, Xue Li, Priyanka Singh, Yang Cao, Lijie Hu · Wed, May 27, 2026, 9:39 AM PDT

Adaptive safety corrections for language models without retraining

Original title: Multi-Adapter Representation Interventions via Energy Calibration

Source: arxiv.org
