← back
arXivKyungmin Park, Taesup KimWed, Jun 3, 2026, 5:01 AM PDT
score 17.1

AI safety training fails mid-generation, requiring new alignment methods

Original: Inference-Time Vulnerability Beyond Shallow Safety: Alignment Along Generation Trajectories

Source: arxiv.org

Writing ELI5 summary…