Hacker News · yakkomajuri · Mon, May 18, 2026, 6:30 PM PDT
474 HN points, 341 HN comments

Six months of AI model releases and coding breakthroughs explained

Original: The last six months in LLMs in five minutes

Source: simonwillison.net

Who: Posted to Hacker News by yakkomajuri; the content is a set of annotated lightning-talk slides authored by Simon Willison (independent developer, creator of Datasette, prolific LLM commentator) from his five-minute talk at PyCon US 2026. The piece is a fast-paced personal narrative covering roughly November 2025 through May 2026.

What's new: Willison identifies two major shifts over the period. First, AI-assisted coding tools crossed a quality threshold in November 2025, becoming reliable enough for daily professional use rather than occasional assistance. OpenAI and Anthropic had spent most of 2025 working on their coding-focused products, and the results became visible that month. Second, models small enough to run on a personal laptop have exceeded earlier expectations for quality: a 20.9 GB open-weights model from the Qwen3 series beat frontier results on Willison's informal pelican-drawing benchmark.

How it works: The piece uses a deliberately whimsical test prompt (draw an SVG of a pelican riding a bicycle) as a longitudinal proxy for general model capability. The logic is that the task is compositionally hard, visually verifiable, and almost certainly absent from any targeted training. Willison tracks which provider held the informal "best model" crown across the period: it changed hands five times among Anthropic, OpenAI, and Google, across models including Claude Opus 4.5, GPT-5.1 Codex Max, and Gemini 3. The open-weights frontier was pushed by GLM-5.1, a 1.5-trillion-parameter model from the Chinese lab behind the GLM series, and by the Qwen3 series from Alibaba's research group.
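One reason an SVG prompt is easy to verify is that the output is plain XML, so a first sanity check can be automated before anyone looks at the picture. The following is a minimal sketch of such a check, not Willison's own tooling: the function name `summarize_svg`, the list of shape tags, and the hand-written `SAMPLE` drawing are all assumptions made for illustration, and judging whether the result actually resembles a pelican on a bicycle still requires a human eye.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"
# Basic SVG drawing elements we count (an illustrative choice, not a standard metric).
SHAPE_TAGS = {"circle", "ellipse", "line", "path", "polygon", "polyline", "rect"}


def summarize_svg(svg_text: str) -> dict:
    """Parse SVG text and report simple structural facts.

    Raises ET.ParseError if the text is not well-formed XML, and
    ValueError if the root element is not <svg>.
    """
    root = ET.fromstring(svg_text)

    def local(tag: str) -> str:
        # Strip the SVG namespace that ElementTree prepends to tag names.
        return tag.removeprefix(SVG_NS)

    if local(root.tag) != "svg":
        raise ValueError("root element is not <svg>")

    shapes = [local(el.tag) for el in root.iter() if local(el.tag) in SHAPE_TAGS]
    return {"shape_count": len(shapes), "shape_kinds": sorted(set(shapes))}


# Hand-written stand-in for a model response; this is NOT real model output.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 120">
  <circle cx="50" cy="90" r="25" fill="none" stroke="black"/>
  <circle cx="150" cy="90" r="25" fill="none" stroke="black"/>
  <path d="M50 90 L100 50 L150 90" stroke="black" fill="none"/>
  <ellipse cx="100" cy="35" rx="22" ry="14" fill="white" stroke="black"/>
</svg>"""

print(summarize_svg(SAMPLE))
```

A check like this only filters out responses that are not valid drawings at all; the comparison between models described above rests on looking at the rendered images.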

The numbers: Leadership in model quality changed hands five times in roughly six months. The Qwen3 open-weights model Willison ran locally is 20.9 GB. GLM-5.1 has 1.5 trillion parameters, which makes it runnable only on substantial server hardware. The open-weights model series from Google is described as the most capable open-weights release from a US lab to date.
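To see why 1.5 trillion parameters implies server hardware, it helps to convert the parameter count into an approximate weight size. The sketch below does that arithmetic under stated assumptions: the bytes-per-parameter figures are common weight precisions, not values given in the source, and the calculation ignores activation memory, caches, and file-format overhead.

```python
PARAMS = 1.5e12          # GLM-5.1 parameter count, as stated above
LOCAL_MODEL_GB = 20.9    # size of the Qwen3 model Willison ran locally

# Assumed storage cost per parameter; the source does not say which
# precision either model is distributed in.
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def weight_size_gb(params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


for name, nbytes in BYTES_PER_PARAM.items():
    size = weight_size_gb(PARAMS, nbytes)
    ratio = size / LOCAL_MODEL_GB
    print(f"{name}: {size:,.0f} GB (about {ratio:.0f}x the 20.9 GB local model)")
```

Even at the most aggressive assumed precision, the weights come to about 750 GB, which is far beyond laptop memory and roughly 36 times the size of the locally run Qwen3 model.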

Why it matters: The convergence of two trends, coding agents becoming genuinely reliable and locally runnable models becoming surprisingly strong, is practically significant for working developers. The shift in coding agents is not incremental; Willison describes it as crossing a threshold from "often works" to "mostly works," which determines whether a developer integrates the tool into a daily workflow or treats it as an occasional curiosity. The rise of strong local models matters because it reduces dependence on hosted APIs for many tasks, lowering cost and latency and improving privacy. Willison also briefly traces the emergence of a new product category of personal AI assistants, significant enough that Mac Minis were reportedly selling out in Silicon Valley as people bought them to run the software locally.