arXiv
Yuxiang Huang, Nuno M. T. Gonçalves, Federico Alvetreti, Lei Li, Xu Han, Edoardo M. Ponti, André F. T. Martins, Marcos V. Treviso
Mon, May 18, 2026, 10:59 AM PDT
score 16.5
Smarter sparse attention method speeds up language models
Original: DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
Source: arxiv.org