arXiv · David Lindner, Victoria Krakovna, Sebastian Farquhar · Thu, May 28, 2026, 10:56 AM PDT
score 15.6
1 citation

New test framework checks if AI agents will sabotage their work

Original: Gram: Assessing sabotage propensities via automated alignment auditing

Source: arxiv.org
