x.com · hardmaru · Wed, May 27, 2026, 7:52 AM PDT
score 17.9
1,740 likes · 229 RT · 46 replies

Training deep networks one block at a time cuts memory use

Original: For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall.

Source: x.com
