arXiv
Tom Jacobs, Rohan Jain, Rebekka Burkholz
Wed, May 20, 2026, 5:34 AM PDT
score 17.1
New optimizer method trains sparse transformer networks reliably
Original: HORST: Composing Optimizer Geometries for Sparse Transformer Training
Source: arxiv.org