arXiv · Mahdi Alkaeed · Fri, Jun 5, 2026, 6:07 AM PDT
Medical AI models fail unpredictably when prompts are slightly reworded
Original: When Large Language Models Fail in Healthcare: Evaluating Sensitivity to Prompt Variations
Source: arxiv.org