arXiv
Boyi Zeng, Yiqin Hao, Zitong Wang, Shixiang Song, He Li, Feichen Song, Yifan Liu, Ziwei He, Xinbing Wang, Zhouhan Lin
Wed, Jun 3, 2026, 8:33 AM PDT
Transformers reuse earlier layers more efficiently during inference
Original: Depth-Attention: Cross-Layer Value Mixing for Language Models
Source: arxiv.org