arXiv
Richard J. Young, Gregory D. Moody
Wed, May 27, 2026, 9:55 AM PDT
New benchmark measures whether coding AI refuses to write malware
Original: Code as a Weapon: A Consensus-Labeled Prompt Bank for Measuring Coding-Model Compliance with Malicious-Code Requests
Source: arxiv.org