
Meta Ranks Employees by AI Usage. History Says That Backfires.


In April 2026, The Information reported that Meta had built an internal leaderboard called “Claudeonomics” ranking roughly 85,000 employees by their consumption of AI tokens. The top users earned the title “Token Legends.” Managers began factoring token consumption into performance evaluations, rewarding heavy AI users and flagging those who fell behind. Meta was not alone. A New York Times investigation by Kevin Roose found the same pattern at OpenAI and Shopify, where AI usage volume had become an explicit performance signal.

The numbers involved are staggering. One OpenAI engineer reportedly consumed 210 billion tokens, a figure Roose compared to processing 33 complete copies of Wikipedia. Meta’s leaderboard logged over 60 trillion tokens in its first month. A Swedish software engineer told Roose that his company now spends more on his Claude Code tokens than on his salary.

A 51-Year-Old Warning

Ethan Mollick, a Wharton professor who studies how organizations adopt AI, read the Meta leaderboard story and pointed to a paper that most management students encounter in their first year but few executives remember when designing incentive systems.

In 1975, Steven Kerr published “On the Folly of Rewarding A, While Hoping for B” in the Academy of Management Journal. His central finding was deceptively simple: organizations consistently build reward systems that incentivize one behavior while expecting a completely different outcome. Kerr traced the pattern across military strategy, university tenure, healthcare, manufacturing, and government, showing that the problem was not unique to any sector but built into how institutions think about measurement.

Kerr identified four root causes. Organizations fixate on objective, easily quantifiable criteria even when the important outcomes are subjective and hard to measure. They overweight visible behaviors because volume is easier to observe than judgment. They engage in institutional hypocrisy, publicly hoping for creativity while privately rewarding compliance. And they emphasize equity over efficiency, distributing rewards uniformly rather than directing them toward the outcomes that matter most.

Token consumption is a textbook case. It is objective, easily quantifiable, and highly visible. It is also completely disconnected from the question every CEO actually cares about, which is whether AI is improving decisions, reducing costs, or creating value that was not there before. The deeper challenge of measuring AI success is that the metrics most organizations already track were designed for a pre-AI operating model. Measuring tokens consumed instead of actual value delivered is the same gap in a different costume.

The same logic operates at the vendor level. When Shopify activated AI storefronts for millions of merchants without asking, it optimized for a metric (AI-mediated traffic) that merchants never chose to optimize for themselves.

The Burnout Trap

The disconnect between consumption and outcomes is not just theoretical. A study published in Harvard Business Review examined what happens when employees are pushed to use multiple AI tools simultaneously. Once workers were juggling more than three concurrent tools, productivity actually declined, and intent to quit rose 39%, a phenomenon the researchers termed “AI brain fry.” BCG’s research found the same thing: tool overload collapses productivity at exactly the point where tokenmaxxing pushes teams to consume more.

But the same study found a clear positive signal in a different direction. When AI was used to replace genuinely repetitive, low-judgment tasks, burnout scores dropped 15%. The employees who benefited most were not the ones consuming the most tokens. They were the ones using AI in the most targeted way, offloading tedious work and redirecting their energy toward problems that required human judgment. There is growing evidence that using AI less, but more deliberately, produces better outcomes than maximizing adoption volume.

The pattern maps directly onto Kerr’s framework. The companies in the HBR study that saw gains were not rewarding AI usage. They were rewarding the removal of friction from specific workflows. The metric was not volume. It was what changed.

What the Right Metrics Look Like

If token consumption is the wrong proxy, the natural question is what should replace it. The AI case study research from INSEAD and Harvard Business School offers a useful frame. In their experiment, the startups that generated 1.9x more revenue were not the ones that used AI the most. They were the ones that discovered where AI fit into their production process and reorganized around it.

The shift is from measuring how much AI your team uses to measuring what your team can do now that it could not do before. Practical alternatives include tracking time saved on specific workflows before and after AI integration, measuring error rates or rework cycles in processes where AI is deployed, and asking whether employees are spending freed-up time on higher-value work or simply adding AI overhead to existing tasks.

The last question matters most, because it separates genuine capability expansion from the kind of performative adoption that Meta’s leaderboard incentivizes. A team that uses AI to cut a two-week approval cycle to three days has measurably changed what the organization can do. A team that consumes more tokens than any other department has proved nothing except that it is responsive to incentives.
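The before/after comparison described above can be made concrete. Here is a minimal sketch in Python of what tracking capability deltas rather than usage volume might look like; the structure and field names (`WorkflowSnapshot`, `capability_delta`) are illustrative assumptions, not any company's actual instrumentation:

```python
from dataclasses import dataclass


@dataclass
class WorkflowSnapshot:
    """Hypothetical measurements for one workflow at a point in time."""
    cycle_time_days: float  # end-to-end time for the workflow
    rework_cycles: float    # average revisions before sign-off


def capability_delta(before: WorkflowSnapshot, after: WorkflowSnapshot) -> dict:
    """Report what changed after AI integration, not how much AI was consumed."""
    return {
        "time_saved_days": before.cycle_time_days - after.cycle_time_days,
        "rework_reduction": before.rework_cycles - after.rework_cycles,
    }


# Example: the approval cycle from the paragraph above, cut from
# two weeks to three days, with rework falling from 3 passes to 1.
delta = capability_delta(
    WorkflowSnapshot(cycle_time_days=14, rework_cycles=3),
    WorkflowSnapshot(cycle_time_days=3, rework_cycles=1),
)
print(delta)  # {'time_saved_days': 11, 'rework_reduction': 2}
```

The point of the sketch is what it omits: there is no token count anywhere in the data model. The only inputs are observable properties of the workflow itself.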

Kerr’s Question for Every CEO

Steven Kerr’s paper has endured for 51 years because the pattern it describes is nearly universal, and because recognizing it in theory is much easier than avoiding it in practice. Organizations know they should reward outcomes. They keep rewarding activity instead, because activity is visible and outcomes take time to measure. McKinsey’s projection that AI could reshape half the workforce makes the stakes of getting this right considerably higher.

For any CEO watching the tokenmaxxing trend from the outside, the question is not whether Meta’s approach is wrong. It is whether your own organization’s AI metrics have drifted toward the same logic without anyone naming it. When your team reports AI adoption rates, usage dashboards, or tool rollout statistics, ask what changed because of that usage. If the answer is “we don’t know yet,” you may be rewarding A while hoping for B. A Harvard-led NBER study of seven countries found that the single strongest predictor of AI adoption is whether management actively encourages it and provides tools, not whether it tracks consumption.

Ron Gold, Founder, A-Eye Level