arXiv
Ádám Kovács, Bowei He, Xue Liu, István Boros, Szilveszter Tóth, Gábor Recski
Wed, Jul 1, 2026, 6:01 AM PDT


New benchmark detects AI hallucinations in code and tool outputs

Original: Beyond Document Grounding: Span-Level Hallucination Detection over Code, Tool Output, and Documents
