arXiv · Mohammed Alshaalan, Miguel R. D. Rodrigues · Tue, May 19, 2026, 8:15 AM PDT

New detector spots sneaky jailbreak attempts hiding in normal-sounding text

Original: Detecting Fluent Optimization-Based Adversarial Prompts via Sequential Entropy Changes

Source: arxiv.org
