Using model internals to pick better training data for AI

Original: SAERL uses Sparse Autoencoders (the mechanistic interpretability tool) to drive LLM post-training data engineering from MODEL INTERNALS rather than external heuristics.

Source: x.com
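
The post does not say how SAERL turns sparse-autoencoder features into data decisions, so the following is only a minimal sketch of the general idea and not SAERL's actual method. It assumes a toy ReLU encoder with hand-picked weights and a greedy "cover as many distinct features as possible" rule for choosing examples. The names `sae_encode`, `active_features`, `greedy_select`, and all the numbers are hypothetical.

```python
# Illustrative sketch only: toy sparse-autoencoder (SAE) encoder plus a greedy
# feature-coverage selector. All weights, names, and data are made up.

def sae_encode(activation, w_enc, b_enc):
    """ReLU encoder of an SAE: feature_i = max(0, w_i . x + b_i)."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, activation)) + b)
        for row, b in zip(w_enc, b_enc)
    ]


def active_features(features, threshold=0.0):
    """Return the indices of features that fire above the threshold."""
    return {i for i, f in enumerate(features) if f > threshold}


def greedy_select(candidates, budget):
    """Pick up to `budget` examples, each time taking the one that adds the
    most not-yet-covered features. Stops early when nothing new is gained."""
    covered, chosen = set(), []
    remaining = dict(candidates)
    while remaining and len(chosen) < budget:
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no candidate activates an uncovered feature
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen


# Hypothetical 2-d "model activations" for four candidate training examples.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
B_ENC = [0.0, 0.0, 0.0, 0.0]
ACTIVATIONS = {
    "a": [1.0, 0.0],
    "b": [1.0, 1.0],
    "c": [-1.0, -1.0],
    "d": [0.9, 0.1],
}

candidate_features = {
    name: active_features(sae_encode(x, W_ENC, B_ENC))
    for name, x in ACTIVATIONS.items()
}
selected = greedy_select(candidate_features, budget=2)
print(selected)
```

In this toy setup the selector prefers examples that light up features the already-chosen set has not touched, which is one plausible reading of selecting data "from model internals" as opposed to external heuristics such as length or source.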
