arXiv
Yutao Sun, Yanqi Zhang, Li Dong, Jianyong Wang, Furu Wei
Thu, Jun 4, 2026, 10:54 AM PDT
score 17.2

Faster long-context AI by reusing attention computations across layers

Original: You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

Source: arxiv.org
