KV cache memory optimization is a critical bottleneck for fast language models

Original: Why KV cache is one of the main reasons LLMs are fast?
