LLM agents rely on memorized patterns, not feedback, for code optimization

Who: Authored by Dmitry Redko, Albert Fazlyev, Konstantin Sozykin, Maria Ivanova, Evgeny Burnaev, and Egor Shvetsov, primarily from the Applied AI Institute and ITMO University in Russia, with one author from YSDA (Yandex School of Data Analysis). The paper is posted on arXiv.

What's new: Researchers ran three controlled experiments to test whether agents actually learn from feedback when trying to optimize code, or whether they are just replaying patterns baked in during training. The answer is mostly the latter: these models lean heavily on what they already know rather than genuinely adapting to new information given during a task.

How it works: The team put several models through a propose-evaluate-revise loop: the model suggests a piece of code, gets a score back, then tries again. They tested two coding languages for the task: CUDA and TVM IR. They also tested whether telling the model the size of the data it was working with changed its behavior.
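To make the loop concrete, here is a minimal Python sketch of a propose-evaluate-revise cycle. It is an illustration under stated assumptions, not the authors' harness: the names `Attempt`, `build_prompt`, `optimize`, and `best` are hypothetical, the score is assumed to be "higher is better", and the model call and the benchmark are replaced by toy stand-ins so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    code: str     # candidate program proposed by the model
    score: float  # feedback from the evaluator (assumed: higher is better)


def build_prompt(task: str, history: List[Attempt],
                 shape_hint: Optional[str] = None) -> str:
    """Assemble the prompt: task, optional data-size hint, prior attempts."""
    lines = [f"Task: {task}"]
    if shape_hint is not None:
        # Optional hint, mirroring the "tell the model the data size" condition.
        lines.append(f"Input shape: {shape_hint}")
    for i, attempt in enumerate(history, start=1):
        lines.append(f"Attempt {i} scored {attempt.score:.3f}:\n{attempt.code}")
    lines.append("Propose an improved version.")
    return "\n".join(lines)


def optimize(task: str,
             propose: Callable[[str], str],
             evaluate: Callable[[str], float],
             rounds: int = 3,
             shape_hint: Optional[str] = None) -> List[Attempt]:
    """Run the loop: propose code, score it, feed the score back, repeat."""
    history: List[Attempt] = []
    for _ in range(rounds):
        prompt = build_prompt(task, history, shape_hint)
        code = propose(prompt)    # hypothetical stand-in for an LLM call
        score = evaluate(code)    # hypothetical stand-in for compile + benchmark
        history.append(Attempt(code, score))
    return history


def best(history: List[Attempt]) -> Attempt:
    """Return the highest-scoring attempt seen so far."""
    return max(history, key=lambda a: a.score)


# Toy stand-ins so the sketch runs without a model or a GPU.
def toy_propose(prompt: str) -> str:
    # Names the candidate after the number of prior attempts in the prompt.
    return f"candidate_{prompt.count('Attempt ')}"


def toy_evaluate(code: str) -> float:
    # Scores a candidate by the number embedded in its name.
    return float(code.split("_")[1])


history = optimize("toy task", toy_propose, toy_evaluate, rounds=3)
print([a.score for a in history], best(history).code)
```

In a setup like this, whether the model actually uses the feedback shows up in the per-round scores in `history`: a model that adapts should trend upward, while one replaying memorized patterns may stay flat or regress.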

The numbers: In the feedback loop experiments, models optimizing in CUDA improved steadily with each round of feedback. Models working in TVM IR got worse over iterations, not better. When asked to handle uncommon data sizes, performance dropped sharply regardless of which coding language was used. Telling the model explicitly what data size to expect made no measurable difference — the model produced essentially the same output either way.

Why it matters: This research punctures a popular assumption: that giving an LLM agent structured feedback makes it smarter at the task over time. It turns out the model succeeds mainly when the coding language is one it has seen a lot of during training, and struggles or regresses when working in unfamiliar territory. The "agentic loop" wrapper adds less than advertised; the underlying training data is doing most of the real work.