arXiv
Blake Bullwinkel, Eugenia Kim, Amanda Minnich, Mark Russinovich
Mon, Jun 8, 2026, 9:21 AM PDT
Training AI models to attack and defend themselves simultaneously
Original: Learning to Attack and Defend: Adaptive Red Teaming of Language Models via GRPO
Source: arxiv.org