arXiv
Linghao Kong, Megan Flynn, Michael Peng, Nir Shavit, Mark Kurtz, Alexandre Marques
Thu, May 14, 2026, 9:45 AM PDT

Understanding speculative decoding performance in real LLM servers

Original: An Interpretable Latency Model for Speculative Decoding in LLM Serving

Source: arxiv.org
