x.com | h100envy | Thu, Jul 2, 2026, 8:56 AM PDT
score 16.6
277 likes, 34 RT, 7 replies
FlashInfer explains fast attention: pick a pattern, generate an optimized GPU kernel
Original: CMU PhD who built the kernels NVIDIA now ships in TensorRT-LLM explained fast attention in 68 minutes - better than $1200 GPU programming courses.
Source: x.com