arXiv
Ádám Kovács, Bowei He, Xue Liu, István Boros, Szilveszter Tóth, Gábor Recski
Wed, Jul 1, 2026, 6:01 AM PDT
New benchmark detects AI hallucinations in code and tool outputs
Original: Beyond Document Grounding: Span-Level Hallucination Detection over Code, Tool Output, and Documents
Source: arxiv.org