arXiv | Mahdi Alkaeed | Fri, Jun 5, 2026, 6:07 AM PDT

Medical AI models fail unpredictably when prompts are slightly reworded

Original: When Large Language Models Fail in Healthcare: Evaluating Sensitivity to Prompt Variations

Source: arxiv.org
