arXiv
Yaojie Zhang, Jianuo Huang, Junlong Ke, Yuhang Han, Yongji Long, Tianchen Zhao, Biqing Qi, Linfeng Zhang
Tue, May 19, 2026, 8:48 AM PDT
score 16.4
Faster LLM inference by smartly combining draft and verify strategies
Original: FlexDraft: Flexible Speculative Decoding via Attention Tuning and Bonus-Guided Calibration
Source: arxiv.org