Nearly 9 in 10 AI agent pilots never reach production, and, contrary to the popular story, not because companies picked a "weak" model. What they lack is something that earned its own name in 2026: the harness. A harness is the layer you build around the model (rules, tools, permission boundaries, checking mechanisms, and monitoring), and it, not the model itself, increasingly decides whether AI makes money. In this piece I explain what a harness is, where it came from, and why the concept is worth understanding regardless of company size.
Why the market only looked at the model
For the past two years the market lived by the question of which model is best. Understandable — models were impressive and it was easy to believe that picking the right one would make the rest fall into place. But the technology matured in a different direction than the headlines suggested.
One experiment shows it well. The LangChain team improved their coding agent's effectiveness by nearly 14 percentage points without changing the model even once. They changed only what surrounds it: self-checking loops, how information is fed in, and error-catching mechanisms. In other words — same engine, different harness, completely different result.
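To make "self-checking loop" concrete, here is a minimal sketch in Python. It is not LangChain's actual implementation; the names `run_with_checks`, `check_is_json`, and `fake_model` are hypothetical and introduced only for illustration. The harness calls the model, verifies the output, and feeds the error back into the next attempt.

```python
import json
from typing import Callable, Optional

# Hypothetical types for illustration: a "model" is any function from prompt
# to text, and a "check" returns None on success or an error message.
Model = Callable[[str], str]
Check = Callable[[str], Optional[str]]


def run_with_checks(model: Model, task: str, check: Check, max_attempts: int = 3) -> str:
    """Call the model, verify the output, and retry with the error fed back in."""
    prompt = task
    for _ in range(max_attempts):
        output = model(prompt)
        error = check(output)
        if error is None:
            return output
        # Feed the failure back so the next attempt can correct it.
        prompt = f"{task}\n\nPrevious attempt failed the check: {error}\nFix it."
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


def check_is_json(output: str) -> Optional[str]:
    """Example check: the output must parse as JSON."""
    try:
        json.loads(output)
        return None
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"


def fake_model(prompt: str) -> str:
    """Stand-in for a real model: answers correctly only after seeing feedback."""
    if "failed the check" in prompt:
        return '{"status": "ok"}'
    return "status: ok"


result = run_with_checks(fake_model, "Return the status as JSON.", check_is_json)
```

The model function is untouched throughout; only the loop around it decides whether a bad first answer reaches the user.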
Where the harness came from and why a new name
The term was popularized in early 2026 by Mitchell Hashimoto, co-creator of Terraform and a man rooted in infrastructure, not AI marketing. He described a simple principle: when an agent makes a mistake, don't just fix the single case; build a solution around the agent so it never makes that mistake again. The word "harness" literally means a frame or rigging, and I use it as is, because that is how practitioners talk about it. Hence the formula that caught on: Agent = Model + Harness.
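The principle of turning each mistake into a permanent safeguard can be sketched roughly as follows. This is an illustration under stated assumptions, not Hashimoto's own tooling: the `RuleBook` class and both incidents are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical structure for illustration: each past mistake becomes a named,
# permanent check that every future output must pass.
Rule = Callable[[str], bool]  # returns True when the output is acceptable


@dataclass
class RuleBook:
    rules: Dict[str, Rule] = field(default_factory=dict)

    def add_rule(self, name: str, rule: Rule) -> None:
        """Record a lesson learned so the same mistake is caught next time."""
        self.rules[name] = rule

    def violations(self, output: str) -> List[str]:
        """Return the names of all rules the output breaks."""
        return [name for name, rule in self.rules.items() if not rule(output)]


book = RuleBook()

# Incident 1 (made-up example): the agent quoted a price without a currency.
book.add_rule(
    "price_has_currency",
    lambda out: "price" not in out.lower() or "EUR" in out,
)

# Incident 2 (made-up example): the agent promised a delivery date it could not know.
book.add_rule(
    "no_delivery_promises",
    lambda out: "guaranteed delivery" not in out.lower(),
)

draft = "The price is 120 with guaranteed delivery on Monday."
broken = book.violations(draft)  # names of the rules this draft breaks
```

Each fix is a one-line addition, and the list only grows, which is how the layer accumulates the organization's experience over time.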
In fairness, some practitioners frown. Testing, monitoring and permission control existed in software engineering long before this term. They're right about the substance. But the name has value — it gives boards a shared word for work that until now got lost somewhere between "IT" and "innovation," and that can now be deliberately planned and funded.
In practice a harness consists of a few layers: the set of rules the agent reads at startup, access to tools, permission boundaries, verification loops, and monitoring. Claude Code, Cursor, GitHub Copilot, and Codex are today's ready-made harnesses around models: a whole working environment, not a "chat with a model."
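As a rough sketch of how tool access, permission boundaries, and monitoring fit together, consider a single gateway that every tool call must pass through. This does not describe how any of the named products work internally; `ToolGateway`, `read_invoice`, and `send_payment` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, List, Set


class PermissionDenied(Exception):
    """Raised when the agent requests a tool outside its boundary."""


class ToolGateway:
    """Single entry point for tool calls: checks the allowlist and logs every request."""

    def __init__(self, tools: Dict[str, Callable[..., Any]], allowed: Set[str]) -> None:
        self.tools = tools
        self.allowed = allowed
        self.audit_log: List[Dict[str, Any]] = []

    def call(self, name: str, **kwargs: Any) -> Any:
        permitted = name in self.allowed and name in self.tools
        # Log the request before acting, so denied attempts are recorded too.
        self.audit_log.append({"tool": name, "args": kwargs, "permitted": permitted})
        if not permitted:
            raise PermissionDenied(f"tool '{name}' is not permitted")
        return self.tools[name](**kwargs)


# Hypothetical tools for illustration.
def read_invoice(invoice_id: str) -> str:
    return f"invoice {invoice_id}: 120 EUR"


def send_payment(invoice_id: str) -> str:
    return f"paid {invoice_id}"


gateway = ToolGateway(
    tools={"read_invoice": read_invoice, "send_payment": send_payment},
    allowed={"read_invoice"},  # the agent may read, but not move money
)

text = gateway.call("read_invoice", invoice_id="A-17")
try:
    gateway.call("send_payment", invoice_id="A-17")
    blocked = False
except PermissionDenied:
    blocked = True
```

The design choice worth noting is that the boundary lives outside the model: even if the model asks for `send_payment`, the request is refused and the attempt stays in the audit log.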
Two harnesses — and only one is your advantage
Here comes the distinction that matters most for a decision-maker. Some harnesses are built by model providers: built-in safety, tool handling, context management. That layer quickly becomes standard and is included in the price of the tool. There is no point recreating it.
The second layer you build yourself — for your specific process. Your rules, your data, your safeguards, your definition of what "done well" means. And it's exactly this layer that is the real advantage, because it records your organization's knowledge and with each fix becomes harder to copy. You can swap the model like an engine. You can't swap this layer.
Hence a simple conclusion: hand the first harness to the provider. Don't hand the second to a one-off consulting project, because it's where the company's institutional memory lives.
What actually works
Research shows the same pattern from different angles. Code produced by an agent without supervision scored poorly on maintainability — regardless of the model used. The same code with human supervision and a control layer scored several times better. The difference was made not by the model but by what surrounded it: boundaries, tests, scope control. It's not technology magic, just the effect of a mature process.
My takeaway for boards and owners
Regardless of company size the principle is the same. Stop deciding which model to standardize on — bet on a portable layer of rules and treat the model as a swappable engine. The whole difference lies in what you build around it, and that's the layer worth investing in first, because it's the one that catches errors when the model is wrong.
Scale changes only the weight of this work, not its direction. In a large organization your own harness is a multi-year asset, not a one-off consulting project, and it needs a person who owns it; in firms that take it seriously, such a role has become the norm. In a small company the same harness fits in four things: one rules file describing your process, one list of what the agent must not do, one mechanism that checks the result, and a simple record of what happened. That's hours of work, not a big rollout. Hence the simple conclusion for a smaller company: don't buy "agentic transformation" from a provider who promises everything. Use the ready-made harness that tools like Claude Code, Cursor, or Copilot provide, and add your own thin layer for one specific process, preferably one where it's easy to check whether the result is good. That usually means marketing, sales, or customer service.
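The small-company version (rules, a forbidden list, one check, one record) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the customer-service scenario, the rule text, the forbidden words, and the two stand-in models. In a real setup the rules would live in a file and the model would be a call to whatever tool you already use.

```python
import json
from typing import Callable, List

# The "rules file", kept inline here so the example is self-contained.
RULES = (
    "You draft replies to customer complaints.\n"
    "Apologise once, state the next step, and keep it under 80 words."
)

FORBIDDEN = ["refund", "legal", "discount"]  # what the agent must not promise


def check_reply(reply: str) -> List[str]:
    """The one checking mechanism: forbidden-topic scan plus a length limit."""
    problems = [f"mentions '{word}'" for word in FORBIDDEN if word in reply.lower()]
    if len(reply.split()) > 80:
        problems.append("longer than 80 words")
    return problems


def run_agent(model: Callable[[str], str], request: str, log: List[str]) -> str:
    """Run any model under the same rules and record what happened."""
    reply = model(f"{RULES}\n\nCustomer message: {request}")
    problems = check_reply(reply)
    log.append(json.dumps({"request": request, "reply": reply, "problems": problems}))
    if problems:
        return "ESCALATE TO HUMAN: " + "; ".join(problems)
    return reply


# Two stand-in "models"; swapping them requires no change to the harness.
def model_a(prompt: str) -> str:
    return "We are sorry for the delay. A colleague will call you tomorrow."


def model_b(prompt: str) -> str:
    return "Sorry! We will issue a full refund today."


log: List[str] = []
ok = run_agent(model_a, "My order is late.", log)
flagged = run_agent(model_b, "My order is late.", log)
```

Here the first reply passes through unchanged, while the second is held back for a human because it promises a refund. The rules, the forbidden list, and the log stay the same whichever model sits behind `run_agent`.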
There's one more reason to start now, and it concerns both groups. From August 2026 the EU AI Act's requirements for high-risk AI systems start to apply, including in recruitment, scoring, and employee monitoring. Postponing that deadline is under discussion in regulatory debates, but planning for the earlier date is simply safer. A layer of verification, logging, and human oversight is therefore also compliance infrastructure, not just innovation. And regardless of company size, expect one basic question: whether the team has a short AI usage policy.
Don't ask: "Which AI model should we choose?" Ask instead: "What will we build around the model so it can be relied on?" Because before a model starts making money, someone has to build around it a layer that keeps it in check.
Dear Reader. If you believe the topic above applies to your company and you'd like to talk with me in the Board's presence about how to build such a layer around AI in your organization's reality, get in touch. Leszek Giza.
