arXiv
Yanjing Ren, Reza Ebrahimi, TengTeng Ma
Wed, Jun 3, 2026, 6:33 AM PDT
score 16.5

New benchmark tests AI chatbots' ability to detect unsafe conversations

Original: AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety

Source: arxiv.org
