Most companies measure their AI investment by counting agents. It’s a natural metric. Agents have names, roles, and dashboards, which makes them look like a team you can put on an org chart.
The count is fine. The problem is that it describes only the visible surface.
Think of it like a company that hires ten managers before writing a single SOP. The roles are filled, the org chart looks organized, and the weekly standups are on the calendar. But nothing gets done consistently, because there are no procedures underneath. Each manager runs their own playbook. Each new situation is a one-off. The coordination cost grows with every hire.
That is what agent-heavy AI without skill depth looks like.
Agents Are Managers. Skills Are SOPs.
An agent is a role with its own identity, judgment, and accumulated context. It owns a responsibility end-to-end and makes decisions inside that responsibility. It coordinates, prioritizes, and calls tools when it needs them.
A skill is a procedure. It describes how to run one specific task well, independent of who is running it. “Draft a customer refund explanation that cites the relevant policy section.” “Summarize a sales call transcript into the three commitments made.” “Generate a performance review from these twelve data points.”
Agents coordinate. Skills execute. One skill can be called by many agents, which means the operational knowledge lives in the skill and not inside any one agent’s private context.
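A minimal sketch of that division of labor in plain Python (every name below is hypothetical, and the skill body is a stub standing in for a model call): one documented skill, registered once, called by two different agents.

```python
# A sketch of skills as shared procedures. One skill, registered once,
# callable by any agent. All names are hypothetical.

SKILLS = {}

def skill(fn):
    """Register a documented procedure under its function name."""
    SKILLS[fn.__name__] = fn
    return fn

@skill
def summarize_call(transcript: str) -> str:
    """Summarize a sales call transcript into the three commitments made."""
    # In production this would prompt a model; here the procedure is stubbed.
    return f"Three commitments extracted from: {transcript[:40]}..."

class Agent:
    """An agent coordinates; when it needs work done, it calls a skill."""
    def __init__(self, name: str):
        self.name = name

    def run(self, skill_name: str, *args) -> str:
        return SKILLS[skill_name](*args)

# Two different agents reuse the same procedure; the operational
# knowledge lives in the skill, not in either agent's private context.
sales = Agent("sales_followup")
briefing = Agent("exec_briefing")
print(sales.run("summarize_call", "Prospect agreed to a pilot by Friday..."))
print(briefing.run("summarize_call", "Renewal call covered pricing tiers..."))
```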
This is the layer most companies are missing. They are hiring managers and not writing down the procedures those managers are supposed to run.
Why the Count Misleads
Counting agents tells you the number of roles that exist. It does not tell you how much of the actual work is captured in procedures the company can reuse.
A team with ten agents and thirty skills is operating at a different level from a team with ten agents and three skills. The count is identical. The capability is not.
The compounding math makes this gap grow over time. Each new documented skill can be called by every existing agent, which means each skill investment pays returns across the whole team. Each new agent, by contrast, starts with its own private context that must be built from scratch. Skills compound. Agents just add overhead.
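A back-of-the-envelope way to see the asymmetry, using the ten-agent comparison above (a toy calculation, not a benchmark):

```python
# A documented skill is callable by every agent, so each skill written
# once is reusable across the whole team; knowledge kept in one agent's
# private context is usable in exactly one place.
agents = 10

for shared_skills in (3, 30):
    pairings = agents * shared_skills
    print(f"{agents} agents x {shared_skills} skills = {pairings} reusable pairings")
# 10 agents x 3 skills  = 30 reusable pairings
# 10 agents x 30 skills = 300 reusable pairings
```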
McKinsey’s 2025 State of AI report makes this gap visible. The roughly 6% of organizations the report classifies as AI high performers are 2.8 times more likely than their peers to have fundamentally redesigned workflows around AI (55% vs 20%). The differentiator was not budget size or tool sophistication. It was whether operational knowledge lived in procedures the company could apply across workflows, or in agent-specific contexts that had to be rebuilt each time. The workflow redesign analysis walks through the same finding from the budget side.
Count is what you measure. Depth is what you accumulate.
Renting vs. Owning Your AI IP
The second-order argument for skill depth is strategic, and it is the one that lands in the boardroom.
An off-the-shelf agent tied to one platform, one model, and one integration is something the company rents. The agent’s behavior is bound to the vendor’s stack. The moment any of the three changes, whether a platform update, a model retirement, or a vendor pricing shift, the work embedded in that agent has to be rebuilt. The company pays for the service but does not own the output of the relationship.
A well-defined skill is IP the company owns. It describes, in plain language, what should happen and how the handoffs work. It survives model upgrades, vendor switches, and team turnover because it is not bound to any one tool. A skill written to cover “generate a quarterly board summary from these inputs” can be executed by any competent model, on any platform, with roughly the same result.
The same structural issue surfaced in vendor consolidation: operational knowledge trapped inside tools the company does not control becomes a recurring coordination cost without producing any accumulated IP. Agent proliferation is the same pattern one architectural level down. Every agent tied to a specific vendor context is a piece of institutional knowledge the company is renting, not owning.
The companies that pull ahead over the next budget cycle are the ones whose operational knowledge lives in portable procedures, not in platform-specific agent contexts. Stanford’s April 2026 study of 51 enterprise deployments makes the same point from the documentation side: the difference was never the model; it was always whether the operation was documented well enough for AI to execute.
The CEO Playbook
Three moves shift the metric from count to depth.
Audit the ratio. Walk through the AI team and list every agent. Then list every documented, reusable skill. The ratio tells you how the system has actually been built. A team with many agents and few skills has been optimizing for roles. A team with fewer agents and many shared skills has been optimizing for procedures. The second team is the one that will keep working when the underlying technology shifts.
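One lightweight way to run the audit, assuming (hypothetically) that agents and skills are checked into a repo as one markdown file each:

```python
from pathlib import Path

# Hypothetical layout: agents/ and skills/ each hold one markdown file
# per agent or documented skill. Adjust the paths to your own repo.
agents = list(Path("agents").glob("*.md"))
skills = list(Path("skills").glob("*.md"))

ratio = len(skills) / max(len(agents), 1)
print(f"{len(agents)} agents, {len(skills)} skills "
      f"-> {ratio:.1f} skills per agent")
```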
Ask vendors structural questions, not behavioral ones. “How portable are the skills I build on your platform?” invites a reassuring sales answer. The better question is structural: are the skills we build on your platform exportable as vendor-agnostic code or documents (YAML, JSON, markdown, plain Python) that we can run elsewhere, or are they locked inside your proprietary interface and unreadable outside it? The first version is a question about intent. The second forces a factual answer about format. The right vendor is the one whose answer to the second version is “exportable.”
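To make the format question concrete, here is one hypothetical shape an exportable skill could take: plain Python, no vendor SDK, runnable anywhere a prompt template can be filled.

```python
# refund_explanation.py: a skill exported as plain, portable Python.
# No vendor SDK, no proprietary interface; any platform that can fill
# a prompt template can run it. All names here are hypothetical.

SKILL = {
    "name": "draft_refund_explanation",
    "inputs": ["customer_message", "policy_section"],
    "prompt": (
        "Draft a refund explanation replying to this customer message:\n"
        "{customer_message}\n\n"
        "Cite this policy section explicitly:\n"
        "{policy_section}\n"
    ),
    "output": "A customer-ready reply that cites the policy section.",
}

def render(**inputs: str) -> str:
    """Fill the template; the calling platform supplies the model."""
    return SKILL["prompt"].format(**inputs)
```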
Measure by reuse, not by role. The useful maturity metric is not the number of agents running. It is how often each skill gets called, by how many different agents, across how many workflows. A skill serving three agents across five workflows is compounding. A skill called once a quarter is overhead. This is the number that predicts whether the AI investment keeps producing value when the next model upgrade lands.
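A sketch of the reuse measurement, assuming a hypothetical log of (skill, agent, workflow) triples, one per invocation:

```python
from collections import defaultdict

# Hypothetical invocation log: one (skill, agent, workflow) triple per call.
calls = [
    ("summarize_call", "sales_agent", "pipeline_review"),
    ("summarize_call", "cs_agent", "renewal_prep"),
    ("summarize_call", "exec_agent", "board_prep"),
    ("board_summary", "exec_agent", "board_prep"),
]

agents_by_skill = defaultdict(set)
workflows_by_skill = defaultdict(set)
for skill, agent, workflow in calls:
    agents_by_skill[skill].add(agent)
    workflows_by_skill[skill].add(workflow)

# A skill serving many agents across many workflows is compounding;
# a skill with one caller and one workflow is overhead.
for skill in sorted(agents_by_skill):
    print(f"{skill}: {len(agents_by_skill[skill])} agents, "
          f"{len(workflows_by_skill[skill])} workflows")
```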
The Better Question
The shift the data keeps pointing toward is not fewer agents. It is deeper skills underneath the agents the company is already running.
Organizations that make this shift run lighter teams with higher output per role. They survive model upgrades without rewriting everything. They are not locked into any one vendor because their institutional knowledge lives in portable procedures. And when the next round of AI budget arrives, they convert it into capability instead of coordination cost.
The uncomfortable part is that skill depth is invisible from the outside. A CEO looking at a competitor can see the number of agents the competitor has announced. They cannot see the depth of the skill library that makes each agent effective. The temptation is to match the visible count. The better move is to out-engineer the invisible depth.
So before you ask how many agents your company needs, ask a better question: how deep is your skill library?