x.com · Tri Dao · Thu, May 21, 2026, 6:51 PM PDT
Score: 17.1
747 likes · 75 RT · 10 replies
Transformer operations can be rewritten as matrix multiplications (GEMMs) with fused epilogues
Original: After some mathematical rewrite, turns out all of transformer is a series of gemm + epilogue. Given a few optimized primitives, LLMs (and novice humans) can write speed-of-light kernels for all transf
Source: x.com ↗
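To make "gemm + epilogue" concrete, here is a minimal pure-Python sketch. It is not taken from the post: the function names `gemm_epilogue` and `bias_relu` are hypothetical, and bias plus ReLU is only an assumed example of an epilogue. The point it illustrates is that the epilogue is applied to each accumulator as soon as its dot product finishes, so the raw matrix product is never stored as a separate intermediate.

```python
def gemm_epilogue(a, b, epilogue):
    """Compute epilogue(i, j, (a @ b)[i][j]) for every output element.

    The epilogue runs on each accumulator right after its dot product
    finishes, so the raw GEMM result is never materialized on its own.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # Fused step: transform the accumulator before storing it.
            out[i][j] = epilogue(i, j, acc)
    return out


def bias_relu(bias):
    """Hypothetical epilogue: add a per-column bias, then apply ReLU."""
    return lambda i, j, acc: max(acc + bias[j], 0.0)


# Toy linear layer y = relu(x @ w + b); the values are made up for illustration.
x = [[1.0, -2.0], [0.5, 3.0]]
w = [[2.0, 0.0], [1.0, -1.0]]
b = [0.5, -1.0]
y = gemm_epilogue(x, w, bias_relu(b))
print(y)  # [[0.5, 1.0], [4.5, 0.0]]
```

On a GPU the same idea would typically be expressed through a tuned GEMM primitive that accepts an epilogue, rather than explicit loops; the loops here only show where the fused step sits.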