Agents can now optimize their own instruction files like code

Who: Posted by @tunguz (Bojan Tunguz, a machine learning competition expert and prolific AI commentator), sharing and expanding on analysis by @koylanai of the SkillOpt paper — a research contribution on self-improving AI agents.

What's new: The SkillOpt paper treats plain-text instruction files, called skill files, as something an AI agent can automatically improve through trial and error rather than write once and leave alone. Most agent systems today treat these files as static. SkillOpt applies a proper optimization loop to them: the agent proposes edits, checks whether they actually help on a test it has never seen before, and keeps only the changes that pass.
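
As a rough illustration of that propose, check, keep cycle, here is a minimal Python sketch. It is not the paper's implementation: `optimize_skill`, `toy_propose`, `toy_score`, and the toy rules are all hypothetical stand-ins, and a real system would have a model propose the edits and would score them by running the agent on held-out tasks.

```python
from typing import Callable

def optimize_skill(
    skill: str,
    propose: Callable[[str, int], str],
    score: Callable[[str], float],
    rounds: int,
) -> str:
    """Keep a proposed edit only if it raises the held-out score."""
    best_score = score(skill)
    for step in range(rounds):
        candidate = propose(skill, step)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # strict improvement required
            skill, best_score = candidate, candidate_score
    return skill


# Hypothetical held-out check: rewards files that mention rules the test needs.
HELD_OUT_RULES = ["check headers", "validate totals"]

def toy_score(skill: str) -> float:
    return sum(rule in skill for rule in HELD_OUT_RULES) / len(HELD_OUT_RULES)

# Hypothetical proposer: appends one canned suggestion per round.
def toy_propose(skill: str, step: int) -> str:
    suggestions = ["check headers", "use bright colors", "validate totals"]
    return skill + "\n- " + suggestions[step % len(suggestions)]

final_skill = optimize_skill("Spreadsheet skill:", toy_propose, toy_score, rounds=3)
print(final_skill)
```

In this toy run the unhelpful "use bright colors" suggestion is discarded because it does not move the held-out score, while the two useful rules are kept.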

How it works: The loop is tightly constrained by design. Each round, the agent may make only 4 to 8 small edits, not a full rewrite. A check against a held-out set blocks any edit that does not clearly improve performance. The skill file is also split into protected slow-changing sections and fast-changing ones, so deep structural knowledge cannot be accidentally overwritten by routine updates. Removing that separation cost 22 points.
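
The edit budget and the protected sections can be pictured with a small sketch like the one below. Everything in it is an assumption made for illustration: the `Edit` type, the `apply_round` function, the section names, and the choice to enforce only the upper end of the 4-to-8 range are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

MAX_EDITS_PER_ROUND = 8  # upper end of the 4-to-8 range; the cap itself is an assumption


@dataclass(frozen=True)
class Edit:
    section: str  # which section of the skill file to change
    old: str      # text to find
    new: str      # text to put in its place


def apply_round(
    sections: Dict[str, str],
    edits: List[Edit],
    protected: Set[str],
) -> Dict[str, str]:
    """Apply a bounded batch of small edits, skipping protected sections."""
    if len(edits) > MAX_EDITS_PER_ROUND:
        raise ValueError("too many edits for one round")
    updated = dict(sections)  # leave the caller's copy untouched
    for edit in edits:
        if edit.section in protected:
            continue  # slow-changing section: routine edits may not touch it
        if edit.section in updated:
            updated[edit.section] = updated[edit.section].replace(edit.old, edit.new, 1)
    return updated


# Hypothetical skill file with one protected and one freely editable section.
skill = {
    "principles": "Always verify formulas before saving.",
    "tips": "Use SUM for totals.",
}
proposed = [
    Edit("tips", "SUM", "SUMIFS"),
    Edit("principles", "Always", "Sometimes"),
]
result = apply_round(skill, proposed, protected={"principles"})
print(result)
```

Here the edit to the fast-changing "tips" section goes through, while the edit aimed at the protected "principles" section is silently dropped. In a fuller version, the surviving edits would still have to pass the held-out check before being kept.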

The numbers: Final skill files averaged around 920 in length, compact by any standard. A skill file trained with one model and then transferred to another gained 59.7 points on SpreadsheetBench with no additional training. A small, cheap model paired with a well-optimized skill file matched the performance of much larger frontier models on procedural tasks.

Why it matters: The practical upshot is that companies and developers who cannot afford to retrain a large model from scratch can instead invest in improving the instruction files that guide it. The skill file is portable (it moves between different underlying models), and the whole thing is readable plain text, not a black box. The remaining hard problem, as the post notes, is verification: this approach works cleanly for tasks with right-or-wrong answers, like spreadsheets or code, but breaks down for open-ended work like writing or strategy, where no reliable automatic grader yet exists.