arXiv | Yongzhong Xu | Mon, Jun 1, 2026, 8:26 AM PDT
score 16.5

When do attention circuits form in language models?

Original: When Do Attention Circuits Form? Developmental Trajectories of Capability and Attention-Sink Emergence Across Three 1B-Class Architectures

Source: arxiv.org
