arXivKyungmin Park, Taesup KimWed, Jun 3, 2026, 5:01 AM PDT
score 17.1
AI safety training fails mid-generation, requiring new alignment methods
Original: Inference-Time Vulnerability Beyond Shallow Safety: Alignment Along Generation Trajectories
Source: arxiv.org ↗
Writing ELI5 summary…