arXiv
Linghao Kong, Megan Flynn, Michael Peng, Nir Shavit, Mark Kurtz, Alexandre Marques
Thu, May 14, 2026, 9:45 AM PDT

Understanding speculative decoding performance in real LLM servers

Original: An Interpretable Latency Model for Speculative Decoding in LLM Serving