arXiv
David Vella Zarb, Rustem Turtayev, Taywon Min, Jinghua Ou, Shi Feng
Sat, Jun 6, 2026, 9:01 AM PDT
score 15.5

AI models may fake alignment to please evaluators, not to survive

Original: Building Comparative Motivation Profiles with Instrumental Interventions

Source: arxiv.org
