A CEO is two weeks from signing the next AI budget.
Engineering wants the newest frontier model. Operations wants a year of investment in the layer around the model: context files, feedback logs, workflow definitions, examples, and ownership.
Both are right. Both are wrong if applied to the wrong workflow.
The missing instrument is the diagnostic that tells the room which is which.
The Distinction the Budget Conversation Skips
Most AI workflows inside an operating company fall into one of two classes, and they respond to different investments.
The first class is friction-bound. The model is already capable enough to do the work, but the work is repeatable, format-sensitive, context-heavy, and prone to drift across sessions. The bottleneck is not raw intelligence. The bottleneck is that the model arrives at every task without the team’s prior corrections, without the company’s house style, without the institutional knowledge that turns a competent draft into a usable one. A bigger model does not solve any of that, because the missing pieces are not capability gaps. They are memory gaps. The cure is the harness, the layer of files and logs and workflows the AI reads before it begins.
The second class is capability-bound. The work itself is beyond the model’s current ceiling, and no amount of harness will change that. Novel scientific reasoning under high uncertainty. Long-horizon autonomous discovery. Decisions that depend on synthesizing tradeoffs the model has not seen patterns for. Here a bigger model is the lever, and the harness around a weaker model cannot substitute for the capability it does not have.
The two classes look similar from the outside. Both produce AI output. Both consume budget. The distinction matters because the remedy is different, and the remedy a CEO funds without the diagnostic is usually the more expensive one.
What the Source Layer Already Says
Ethan Mollick described the boundary this February in his Substack “One Useful Thing,” writing that “the model differences are now small enough that the app and harness matter more than the model.” The framing was deliberate. Mollick was not arguing that capability stopped mattering. He was arguing that across most current production workflows, the differences between frontier models have compressed to the point where the harness layer, the way the model is wrapped and contextualized and corrected, accounts for more of the visible output quality than which frontier model is underneath.
The compression Mollick names is the empirical condition for the friction-bound class to exist. When the model differences were large, the harness was a secondary investment. The model itself moved the result enough to justify the upgrade cycle. When the model differences become small, the layer around the model takes over as the primary lever, because that layer is the only place where the missing pieces (context, memory, corrections, format discipline) can actually live.
Mollick’s hedge is the boundary itself. “Now small enough that the app and harness matter more than the model” is a claim about most workflows, not all of them.
What Friction-Bound Looks Like in Operation
For its first week, a B2B content workflow looked like most AI workflows. The model produced output. Roughly 40% of the pieces came back rejected, each one absorbing about 10 to 20 minutes of cleanup.
Four weeks later the same team, on the same model, doing the same work, was rejecting about 15% of pieces and spending closer to 2 minutes on each one. The model never moved. A single document changed. The team built a shared learning asset, a file the AI was told to read before starting any task, capturing what worked, what didn’t, and the corrections from every previous session. Each new run inherited everything the team had already taught it.
This is the friction-bound case at full resolution. The bottleneck was not model intelligence. The bottleneck was that each session started without the team’s accumulated corrections, so the model kept producing the same near-misses that someone had already fixed in a prior session. The harness, in the most minimal possible form, was one markdown file. On this narrow workflow, the team saw a roughly fivefold improvement in cleanup time on rejected pieces, without a model upgrade and without additional vendor spend.
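The read side of that loop is small enough to sketch. Here is a minimal version in Python, where the file name, the prompt wording, and the function name are illustrative assumptions, not the team's actual setup:

```python
from pathlib import Path

# Hypothetical name for the team's shared learning asset.
LEARNINGS = Path("LEARNINGS.md")

def build_prompt(task: str) -> str:
    """Prepend the team's accumulated corrections to every new task.

    The model never changes; what changes is what it reads first.
    """
    corrections = LEARNINGS.read_text() if LEARNINGS.exists() else ""
    return (
        "Read these prior corrections and conventions before starting:\n"
        f"{corrections}\n"
        "Task:\n"
        f"{task}"
    )
```

The whole harness, in this minimal form, is the discipline of routing every task through that one step.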
A CEO who funds a frontier model upgrade for a workflow shaped like this one is paying for capability the workflow already had. The unfunded variable is the harness, and the unfunded variable is where the multiplier lived.
What Capability-Bound Looks Like at the Other End
The closer-to-home business example sits inside any company that already runs both classes side by side.
A legal team asking an AI to summarize standard contracts is doing friction-bound work. The model can already read the contract. The team’s preferred risk-flagging conventions, redline format, and house phrasing are the missing context. A learning asset that captures those conventions closes the gap, and the work compounds across every contract that comes after.
That same legal team asking the model to reason through a novel cross-border regulatory structure is doing capability-bound work. The bottleneck is the model’s ceiling on multi-jurisdictional inference under ambiguity, not the team’s house style. Better context will not raise that ceiling. The remedy, if there is one, comes from a more capable model.
Same team. Same vendor. Two different bottlenecks. Two different remedies. The asymmetry that surfaces inside a single legal function is the asymmetry the diagnostic has to name across the company.
Dario Amodei’s 2024 essay “Machines of Loving Grace” names the capability-bound class at its outer limit. Amodei’s central prediction is that “AI-enabled biology and medicine will allow us to compress the progress that human biologists would have achieved over the next 50-100 years into 5-10 years.” The capability requirement to do that work is not subtle. Amodei writes “I’m not talking about AI as merely a tool to analyze data … I’m talking about using AI to perform, direct, and improve upon nearly everything biologists do.” The model has to operate autonomously over long horizons, design experiments, direct the work of human researchers, and make discoveries the training data did not contain in pre-digested form.
Amodei’s frame is the capability-bound class made explicit. A harness around a weaker model does not substitute for the underlying intelligence required to do that work, because the gap is in the model’s reasoning ceiling, not in the model’s context. Every harness improvement reduces friction. None of them raise the ceiling.
Most operating companies are not doing autonomous biological discovery. The capability-bound class exists, and the article is not denying it. The point is the asymmetry. The capability-bound class is where the next frontier model delivers work that was not previously possible. The friction-bound class is where the harness delivers work the model could already do but kept doing badly. Funding one as if it were the other is the failure mode the budget meeting cannot diagnose without an explicit category for each.
The Mechanism Behind the Different Remedies
The remedies diverge because the bottlenecks live at different layers.
In the friction-bound class, the bottleneck is between the model and the institutional context. Every session is a cold start. The corrections from session 47 are not visible in session 48 unless someone wrote them down in a place the model reads. The cure is to give the model that place. The model itself stays the same.
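The write side of the loop is equally small. Continuing the sketch above, with `record_correction` as an invented name for whatever ritual the team uses to write the fix down:

```python
from pathlib import Path

def record_correction(learnings: Path, note: str) -> None:
    """Append a reviewer's fix so session 48 inherits what session 47 learned."""
    with learnings.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")

# e.g. record_correction(Path("LEARNINGS.md"),
#                        "Never open a post with a rhetorical question.")
```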
In the capability-bound class, the bottleneck is inside the model. The reasoning required is past the model’s current depth. The cure is a more capable model, not a richer context. Adding more context to a model that cannot perform the underlying inference produces a longer transcript of the same inadequacy.
The cost asymmetry follows the mechanism. A harness build is a person and a process. A frontier model upgrade is a vendor commitment, often a multi-quarter contract, often with concentration risk on a single provider whose own infrastructure costs are climbing. Anthropic disclosed a $30 billion run-rate this spring with more than 1,000 business customers each spending over $1 million on an annualized basis. The bill for the capability-bound side is visible. The bill for the harness side, when there is one, is usually a single salary that no AI line item carries.
The harness also carries a maintenance cost the friction-bound exemplar has not had time to measure. A four-week window proves the loop works. A twelve-month window proves whether the corrections age cleanly or rot as models, formats, and team composition shift around them. No public source isolates that maintenance curve. The honest version of the diagnostic asks which class of work the budget is funding capability for, and then asks the team to measure the harness payback against its own decay rather than assume the four-week shape extends linearly.
The decision under the diagnostic is not “model or harness.” It is “which class of work am I funding capability for, and what is the actual constraint on that class.” If the answer is friction-bound, the harness is the cheaper and faster lever, and the model upgrade buys overhead. If the answer is capability-bound, the model upgrade is what raises the ceiling, and the harness alone will leave the work unbuilt. The error is funding the wrong remedy for the diagnosed class, not funding either remedy in isolation.
What Goes Into the Policy on Monday
The smallest version of the policy change is one page.
For every AI workflow above a defined stakes threshold, classify it as friction-bound or capability-bound before any model-side or harness-side investment is approved.
The CEO Diagnostic. Before approving the spend, ask:
1. Is the model failing because it lacks reasoning ability, or because it lacks our context?
2. Has the team already corrected this type of failure before?
3. Could a file, example bank, checklist, or workflow definition prevent the failure next time?
4. Would a better model solve the failure without additional company context?
5. Who owns the improvement loop after the workflow goes live?
A workflow that answers yes to questions 2 and 3 is friction-bound, and the investment belongs in the harness. A workflow that answers yes only to question 4 is capability-bound, and the investment belongs in the model. Question 5 is the one a CEO cannot delegate, because without an owner the diagnostic runs once and the budget gets signed on whichever recommendation was made loudest in the room.
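That interpretive rule is simple enough to state as code. This is not new tooling, just the decision logic made explicit; the parameter names below are invented for illustration, and questions 1 and 5 stay with the humans because they are judgment and governance calls, not booleans:

```python
def classify_workflow(corrected_before: bool,
                      preventable_by_file: bool,
                      better_model_alone: bool) -> str:
    """Map answers to diagnostic questions 2-4 onto a funding class."""
    if corrected_before and preventable_by_file:
        return "friction-bound: fund the harness"
    if better_model_alone:
        return "capability-bound: fund the model"
    return "unclear: rerun the diagnostic with the workflow owner in the room"
```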
The diagnostic does not need new tooling. It needs one named owner for the harness layer, the way the skills-beat-agents argument named codified workflows as the IP that survives vendor switches. It needs a budget line for that layer, the way the four allocation rules named training as the underweighted variable inside the AI budget. And it needs the classification step to run before, not after, the allocation decision.
The CEO in the scene at the top of this article was choosing between two recommendations because the room did not yet have language for the choice underneath them. Engineering was arguing for a capability-bound remedy. Operations was arguing for a friction-bound one. Both could be right for different workflows. The decision the room could not make without the diagnostic was which workflows belonged in which class.
The real question for the AI budget is not how much to spend on the model versus the harness. It is which workflows are friction-bound, which are capability-bound, and who in the company is responsible for telling the two apart before the budget is signed.
The harness compounds where the bottleneck is friction. The model compounds where the bottleneck is capability. The cost of confusing them is the budget you cannot get back.
Questions this article gets
How is friction-bound different from 'just needs better prompting'?
Prompt engineering shapes what the model produces inside a single session. The harness shapes what every session inherits from every prior session: corrections, conventions, examples, the team's accumulated quality bar. A workflow that 'just needs better prompting' still resets to zero in the next session, because the prompt lives inside the session and disappears when the session ends. A harness-fed workflow does not reset, because the file the model reads before working carries the team's prior fixes forward. The fivefold cleanup-time improvement in the friction-bound example did not come from a sharper prompt. It came from a file the model was told to read before any prompt. Better prompts compound inside a session; the harness compounds across them. The distinction matters because the budget decision is whether to pay for a better single session or for a layer that survives every session.
Doesn't a more capable model also solve the friction-bound class?
Sometimes, but at higher cost and with no guarantee of durability. A more capable model copes with the missing context less crudely than a weaker one, which lowers the visible friction. But the model upgrade does not change the underlying mechanic: the next session still arrives without the corrections the team made in the previous one. The next vendor switch resets the gain entirely. The harness is the only investment that lives outside the model and survives the model change. A CEO paying for the model upgrade alone is renting a temporary fix to a problem the harness would solve permanently, and paying a vendor concentration premium for the privilege.
Who in the company should own the harness layer?
Not engineering, and not the AI vendor's account team. The owner is typically someone with deep workflow knowledge and editorial discipline: an operations lead, a head of content, or a senior workflow owner inside the function the AI supports. The work is curating what the model reads before each task, capturing corrections and conventions, and pruning files that no longer reflect how the team works. It is closer to running a style guide than to engineering. The accountability point in the article is not the title but the existence of a named owner. Without one, the diagnostic runs once, the file ages, the corrections rot, and the budget conversation defaults back to whichever recommendation was made loudest in the room.