Most companies measure their AI investment by counting agents. It’s a natural metric. Agents have names, roles, and dashboards, which makes them look like a team you can put on an org chart.
The count is fine. The problem is that it describes only the visible surface.
Think of it like a company that hires ten managers before writing a single SOP. The roles are filled, the org chart looks organized, and the weekly standups are on the calendar. But nothing gets done consistently, because there are no procedures underneath. Each manager runs their own playbook. Each new situation is a one-off. The coordination cost grows with every hire.
That is what agent-heavy AI without skill depth looks like.
Agents Are Managers. Skills Are SOPs.
An agent is a role with its own identity, judgment, and accumulated context. It owns a responsibility end-to-end and makes decisions inside that responsibility. It coordinates, prioritizes, and calls other tools when it needs them.
A skill is a procedure. It describes how to run one specific task well, independent of who is running it. “Draft a customer refund explanation that cites the relevant policy section.” “Summarize a sales call transcript into the three commitments made.” “Generate a performance review from these twelve data points.”
Agents coordinate. Skills execute. One skill can be called by many agents, which means the operational knowledge lives in the skill and not inside any one agent’s private context.
This is the layer most companies are missing. They are hiring managers and not writing down the procedures those managers are supposed to run.
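The manager-and-SOP split can be sketched in a few lines of Python. This is a minimal illustration, not a reference architecture: the `Skill` and `Agent` classes, the `summarize_call` function, and the `COMMIT:` line convention are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class Skill:
    """A procedure for one task, independent of who runs it (hypothetical)."""
    name: str
    procedure: Callable[[str], str]


@dataclass
class Agent:
    """A role that coordinates work by calling shared skills (hypothetical)."""
    role: str
    skills: Dict[str, Skill] = field(default_factory=dict)

    def run(self, skill_name: str, payload: str) -> str:
        return self.skills[skill_name].procedure(payload)


def summarize_call(transcript: str) -> str:
    # Stand-in for a real procedure: keep only lines flagged as commitments.
    commitments = [line for line in transcript.splitlines()
                   if line.startswith("COMMIT:")]
    return "\n".join(commitments)


# One shared library; both agents reference the same skill object.
library = {"summarize_call": Skill("summarize_call", summarize_call)}
sales_agent = Agent("sales follow-up", library)
support_agent = Agent("account support", library)

transcript = "Hello\nCOMMIT: send pricing by Friday\nThanks"
print(sales_agent.run("summarize_call", transcript))
print(support_agent.run("summarize_call", transcript))
```

Both agents produce the same result because the procedure lives in the shared library rather than in either agent's own context.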
Why the Count Misleads
Counting agents tells you the number of roles that exist. It does not tell you how much of the actual work is captured in procedures the company can reuse.
A team with ten agents and thirty skills is operating at a different level from a team with ten agents and three skills. The count is identical. The capability is not.
The compounding math makes this gap grow over time. Each new documented skill can be called by every existing agent, which means each skill investment pays returns across the whole team. Each new agent, by contrast, starts with its own private context that must be built from scratch. Skills compound. Agents without shared skills underneath them just add overhead.
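A rough way to see the arithmetic, assuming every agent can call every shared skill and treating agent-skill pairings as a proxy for capability. The `reachable_pairings` function is a hypothetical helper, and the model deliberately ignores the cost of building each agent's private context.

```python
def reachable_pairings(agents: int, skills: int) -> int:
    """Agent-skill combinations available when every agent can call every shared skill."""
    return agents * skills


# The two teams from the comparison above: same agent count, different depth.
deep = reachable_pairings(agents=10, skills=30)
shallow = reachable_pairings(agents=10, skills=3)

# Adding one shared skill opens one new pairing per existing agent.
gain_from_one_skill = reachable_pairings(agents=10, skills=31) - deep

print(deep, shallow, gain_from_one_skill)  # 300 30 10
```

Under these assumptions the deeper team has ten times the usable combinations at the same agent count, and each additional skill adds capability for all ten agents at once.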
McKinsey’s 2025 State of AI report makes this gap visible. The roughly 6% of organizations the report classifies as AI high performers are 2.8 times more likely than their peers to have fundamentally redesigned workflows around AI (55% vs. 20%). The differentiator was not budget size or tool sophistication. It was whether operational knowledge lived in procedures the company could apply across workflows, or in agent-specific contexts that had to be rebuilt each time. The workflow redesign analysis walks through the same finding from the budget side.
Count is what you measure. Depth is what you accumulate.
Renting vs. Owning Your AI IP
The second-order argument for skill depth is strategic, and it is the one that lands in the boardroom.
An off-the-shelf agent tied to one platform, one model, and one integration is something the company rents. The agent’s behavior is bound to the vendor’s stack. The moment any of the three changes, whether a platform update, a model retirement, or a vendor pricing shift, the work embedded in that agent has to be rebuilt. The company paid for the service but does not own the output of the relationship.
A well-defined skill is IP the company owns. It describes what should happen in plain language, down to the handoff steps. It survives model upgrades, vendor switches, and team turnover because it is not bound to any one tool. A skill written to cover “generate a quarterly board summary from these inputs” can be executed by any competent model, on any platform, with roughly the same result.
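As a sketch of what "portable" can mean in practice, the skill below is stored as plain JSON and rendered into plain-text instructions. The field names, the inputs, the three steps, and the handoff are all hypothetical and invented for illustration. They are not a standard schema or any vendor's format.

```python
import json

# Hypothetical skill definition; every field name and value here is illustrative.
skill = {
    "name": "quarterly_board_summary",
    "purpose": "Generate a quarterly board summary from these inputs",
    "inputs": ["financials", "key_metrics", "open_risks"],
    "steps": [
        "Summarize revenue and cost against plan.",
        "List the three largest changes in key metrics.",
        "State open risks and the owner of each.",
    ],
    "handoff": "Send the draft to the CFO for review before distribution.",
}


def render_instructions(skill: dict) -> str:
    """Turn the stored skill into plain-text instructions that any model could be given."""
    lines = [
        f"Task: {skill['purpose']}",
        f"Inputs: {', '.join(skill['inputs'])}",
        "Steps:",
    ]
    lines += [f"{i}. {step}" for i, step in enumerate(skill["steps"], start=1)]
    lines.append(f"Handoff: {skill['handoff']}")
    return "\n".join(lines)


# Export and re-import: nothing about the skill depends on a particular platform.
exported = json.dumps(skill, indent=2)
restored = json.loads(exported)
print(render_instructions(restored))
```

The design choice is that the skill is data rather than platform configuration, so it can be read, versioned, and moved without the tool that first ran it.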
The same structural issue surfaced in vendor consolidation: operational knowledge trapped inside tools the company does not control becomes a recurring coordination cost without producing any accumulated IP. Agent proliferation is the same pattern one architectural level down. Every agent tied to a specific vendor context is a piece of institutional knowledge the company is renting, not owning.
The companies that pull ahead over the next budget cycle are the ones whose operational knowledge lives in portable procedures, not in platform-specific agent contexts. Stanford’s April 2026 study of 51 enterprise deployments reaches the same conclusion from the documentation side: the difference was never the model. It was always whether the operation was documented well enough for AI to enact.
The CEO Playbook
Three moves shift the metric from count to depth.
Audit the ratio. Walk through the AI team and list every agent. Then list every documented, reusable skill. The ratio tells you how the system has actually been built. A team with many agents and few skills has been optimizing for roles. A team with fewer agents and many shared skills has been optimizing for procedures. The second team is the one that will keep working when the underlying technology shifts.
Ask vendors structural questions, not behavioral ones. “How portable are the skills I build on your platform?” invites a reassuring sales answer. The better question is structural: are the skills we build on your platform exportable as vendor-agnostic code or documents (YAML, JSON, markdown, plain Python) that we can run elsewhere, or are they locked inside your proprietary interface and unreadable outside it? The first version is a question about intent. The second forces a factual answer about format. The right vendor is the one whose answer to the second version is “exportable.”
Measure by reuse, not by role. The useful maturity metric is not the number of agents running. It is how often each skill gets called, by how many different agents, across how many workflows. A skill serving three agents across five workflows is compounding. A skill called once a quarter is overhead. This is the number that predicts whether the AI investment keeps producing value when the next model upgrade lands.
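The reuse metric from the third move can be computed from a simple call log. The sketch below assumes a log of (skill, agent, workflow) records. The log entries, the skill and agent names, and the `reuse_report` function are hypothetical, and real data would come from your own platform's logs.

```python
from collections import defaultdict

# Hypothetical call log: one (skill, agent, workflow) tuple per call.
call_log = [
    ("refund_explanation", "support_agent", "refunds"),
    ("refund_explanation", "billing_agent", "chargebacks"),
    ("refund_explanation", "support_agent", "escalations"),
    ("call_summary", "sales_agent", "renewals"),
]


def reuse_report(log):
    """Count calls, distinct agents, and distinct workflows for each skill."""
    stats = defaultdict(lambda: {"calls": 0, "agents": set(), "workflows": set()})
    for skill, agent, workflow in log:
        entry = stats[skill]
        entry["calls"] += 1
        entry["agents"].add(agent)
        entry["workflows"].add(workflow)
    return {
        skill: {
            "calls": s["calls"],
            "agents": len(s["agents"]),
            "workflows": len(s["workflows"]),
        }
        for skill, s in stats.items()
    }


report = reuse_report(call_log)
for skill, row in sorted(report.items()):
    print(skill, row)
```

In this toy log, `refund_explanation` is shared by two agents across three workflows, while `call_summary` serves a single agent in a single workflow. That gap is the one the reuse metric is meant to surface.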
The Better Question
The shift the data keeps pointing toward is not fewer agents. It is deeper skills underneath the agents the company is already running.
Organizations that make this shift run lighter teams with higher output per role. They survive model upgrades without rewriting everything. They are not locked into any one vendor because their institutional knowledge lives in portable procedures. And when the next round of AI budget arrives, they convert it into capability instead of coordination cost.
The uncomfortable part is that skill depth is invisible from the outside. A CEO looking at a competitor can see the number of agents the competitor has announced. They cannot see the depth of the skill library that makes each agent effective. The temptation is to match the visible count. The better move is to out-engineer the invisible depth.
So before you ask how many agents your company needs, ask a better question: how deep is your skill library?
Questions this article gets
What's the practical difference between an AI agent and an AI skill?
An agent is a manager. It owns a role end-to-end (customer onboarding, ticket triage, invoice review) and carries its own context and decision authority. A skill is a Standard Operating Procedure. It's a known-good way of running one specific task, documented as a procedure that any agent can call. Agents coordinate. Skills execute. The same skill can be called by multiple agents across different workflows, which means the company's operational knowledge lives in the skill, not locked inside any one agent's private context. Counting agents tells you how many managers you have. Counting skills, and how deep each one goes, tells you how much of the real work is captured in reusable procedures.
Why do agent-heavy AI systems create vendor lock-in?
Agents accumulate context in ways that are specific to the model and platform they run on. After months of operation, an agent's behavior reflects the quirks of its underlying model, the integration patterns of its platform, and the prompts tuned for its role. Moving that agent to a different vendor means recreating the context, re-tuning the prompts, and re-validating the behavior. Skills, when documented as portable procedures, survive these transitions. They describe what should happen, not which tool happens to do it today. A skill-heavy system can swap underlying models with limited disruption; an agent-heavy system cannot. This is why skill depth is the strategic asset and agent count is the liability.
How should a CEO audit whether their AI team has enough skill depth?
Three audit questions surface the real architecture. First, when a new workflow need arrives, does the team default to adding a new agent or extending existing skills? Teams that default to new agents are scaling with headcount; teams that extend skills are scaling with reusable procedures. Second, if a vendor or model changes tomorrow, how much of the work has to be rebuilt? If most of the work lives in agent-specific prompts and integrations, the answer is most of it. If most of the work lives in documented, portable skills, the answer is little of it. Third, how often is each skill actually called, and by how many different agents? A skill serving three agents across five workflows is doing compounding work. A skill called once a quarter is overhead. The ratio of reuse to count is the real maturity metric.