arXiv | Blake Bullwinkel, Eugenia Kim, Amanda Minnich, Mark Russinovich | Mon, Jun 8, 2026, 9:21 AM PDT


Training AI models to attack and defend themselves simultaneously

Original: Learning to Attack and Defend: Adaptive Red Teaming of Language Models via GRPO
