arXiv · Yaojie Zhang, Jianuo Huang, Junlong Ke, Yuhang Han, Yongji Long, Tianchen Zhao, Biqing Qi, Linfeng Zhang · Tue, May 19, 2026, 8:48 AM PDT
score 16.4

Faster LLM inference by smartly combining draft and verify strategies

Original: FlexDraft: Flexible Speculative Decoding via Attention Tuning and Bonus-Guided Calibration

Source: arxiv.org
