arXiv
Zhiben Chen, Youpeng Zhao, Yang Sui, Jun Wang, Yuzhang Shang
Tue, May 19, 2026, 10:59 AM PDT

Faster AI model inference by smartly managing memory and compute

Original: TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload

Source: arxiv.org
