arXiv · David Lindner, Victoria Krakovna, Sebastian Farquhar · Thu, May 28, 2026, 10:56 AM PDT
score 15.6
1 cite
New test framework checks if AI agents will sabotage their work
Original: Gram: Assessing sabotage propensities via automated alignment auditing
Source: arxiv.org