Training deep networks one block at a time cuts memory use

Original: For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall.

Source: x.com
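The headline's memory claim can be illustrated with some rough arithmetic. The sketch below is not the method from the linked post, which gives no details. It assumes each block is trained against its own local objective so that no gradients cross block boundaries, and it counts only the activations kept for the backward pass, ignoring parameters, gradients, and optimizer state. The function name `peak_activation_bytes` and the 24-layer, 512 MiB-per-layer network are hypothetical.

```python
def peak_activation_bytes(layer_activation_bytes, block_size=None):
    """Rough peak of activation memory held for a backward pass.

    block_size=None models end-to-end backprop: every layer's activations
    stay alive until the backward pass reaches that layer.
    An integer block_size models block-local training (an assumption here):
    only the layers of the block being updated keep activations.
    """
    if block_size is None:
        return sum(layer_activation_bytes)
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Split the layer list into consecutive blocks of at most block_size layers.
    blocks = [
        layer_activation_bytes[i:i + block_size]
        for i in range(0, len(layer_activation_bytes), block_size)
    ]
    # Only one block is resident at a time, so the peak is the largest block.
    return max((sum(block) for block in blocks), default=0)


# Hypothetical network: 24 layers, 512 MiB of activations per layer.
layers = [512 * 2**20] * 24
end_to_end = peak_activation_bytes(layers)
block_local = peak_activation_bytes(layers, block_size=4)
print(end_to_end // 2**20, "MiB vs", block_local // 2**20, "MiB")
```

Under these assumptions the peak scales with the largest block rather than with network depth, which is the intuition behind the headline.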
