arXiv
Yuhao Shen, Tianyu Liu, Xinyi Hu, Quan Kong, Baolin Zhang, Jun Dai, Jun Zhang, Shuang Ge, Lei Chen, Yue Li, Mingcheng Wan, Cong Wang
Tue, May 19, 2026, 9:55 AM PDT
Hybrid token retrieval speeds up language model inference
Original: Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding
Source: arxiv.org ↗