x.com · Hugo Larochelle · Sat, May 23, 2026, 7:59 AM PDT
score 15.2
8 RT
Training loss predicts how language models scale on real tasks
Original: RT @sivareddyg: Nature is complex. Why would cross-entropy loss predict scaling behavior of language models on downstream task? Introducing…
Source: x.com ↗