arXiv
Linghao Kong, Megan Flynn, Michael Peng, Nir Shavit, Mark Kurtz, Alexandre Marques
Thu, May 14, 2026, 9:45 AM PDT
score 9.2
Understanding speculative decoding performance in real LLM servers
Original: An Interpretable Latency Model for Speculative Decoding in LLM Serving
Source: arxiv.org