Earlier today I posted Deloitte’s finding: only one in five companies has a mature model for governance of autonomous AI agents (Deloitte AI Institute, 2026 AI Report). The binary “AI on / AI off” rule runs out of room fast. A per-decision threshold on three axes supplies the scaffolding for the 80% without one. Bezos and Anthropic do the load-bearing work below.
Dollars at stake: magnitude over category
The first axis is the simplest, which is why most CEOs already use a version of it. A $5K supplier choice tolerates more autonomy than a $5M one, because the cost of a wrong call is bounded. Magnitude dictates the financial override bar. Category dictates the compliance bar separately: a $5K SaaS tool that hooks into customer data, or a vendor with regulated-industry exposure, fires the override regardless of dollar amount. Within any given category, the override bar should then track the dollars. The practical question is whether the loss case is recoverable from working capital, and at what fraction. If the answer is “we’d feel it in the quarter,” the threshold has been crossed. The same logic that picked first AI delegations by decision quality runs in reverse here: the override matters most where the operator’s call carries the most weight.
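A minimal sketch of how this axis could be encoded. The working-capital figure, materiality fraction, and category flags below are placeholder assumptions, not numbers from the post:

```python
# Sketch of the dollars-at-stake axis. Every number and flag here is an
# assumed placeholder, not a figure from the post.

WORKING_CAPITAL = 2_000_000           # hypothetical quarterly working capital
MATERIAL_FRACTION = 0.05              # assumed "we'd feel it in the quarter" cutoff
COMPLIANCE_FLAGS = {"customer_data", "regulated_industry"}  # assumed category flags

def dollar_axis_fires(amount: float, categories: set[str]) -> bool:
    """True when either the compliance bar or the financial bar fires."""
    # Category fires the override regardless of dollar amount.
    if categories & COMPLIANCE_FLAGS:
        return True
    # Within a category, the bar tracks dollars: is the loss case
    # recoverable from working capital at a tolerable fraction?
    return amount / WORKING_CAPITAL >= MATERIAL_FRACTION

# A $5K SaaS tool that hooks into customer data trips the compliance bar:
assert dollar_axis_fires(5_000, {"customer_data"})
# The same $5K with no flags stays inside the agent's autonomy:
assert not dollar_axis_fires(5_000, set())
```

The point of separating the two checks is that no dollar discount buys past the compliance flag; only the financial bar scales with magnitude.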
Reversibility: the two-way door test
The second axis comes from Bezos’s shareholder letters: Type 1 decisions are one-way doors, Type 2 are two-way. Most AI agent decisions turn out to be Type 2. Anthropic studied millions of agent interactions in 2026 and found that only 0.8% of actions are irreversible, such as sending an email to a customer (Anthropic, Measuring AI Agent Autonomy in Practice, February 2026). Reversibility is the cheapest scaffolding the threshold offers, because most decisions clear it without ceremony. The override bar climbs only when the answer to “can this be unwound in a quarter?” is no. The misclassification trap is the expensive one: pilots treated like board commitments, vendor lock-ins treated like casual trials. The threshold flips that posture: heavy process where the door swings one way, light process where the experiment can be redone Monday.
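Encoded, the axis is a one-bit check. The action names below are invented for illustration; only the 0.8% figure comes from Anthropic’s study:

```python
# Sketch of the reversibility axis. The action whitelist is invented for
# illustration; only the 0.8% figure comes from Anthropic's study.

# Actions the team has explicitly classified as two-way doors (assumed names).
TWO_WAY_DOORS = {"draft_reply", "run_pilot", "reroute_shipment"}

def reversibility_axis_fires(action: str) -> bool:
    """Two-way doors stay light; anything unclassified defaults to one-way,
    so misclassification errs toward heavy process rather than away from it."""
    return action not in TWO_WAY_DOORS

# A pilot can be redone Monday: light process.
assert not reversibility_axis_fires("run_pilot")
# A sent customer email cannot be unwound: heavy process.
assert reversibility_axis_fires("send_customer_email")
```

Per Anthropic’s numbers, a two-way whitelist covers roughly 99% of agent actions, so it stays cheap to maintain while the conservative default catches the rest.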
Audit-trail depth: the regulator test
The third axis catches CEOs off guard, because the audit trail serves two masters. External defensibility comes from the regulator, board member, or class-action attorney who asks how the decision was made. Internal observability comes from the engineering and operations team fixing a failure: when an agent optimizes a supply-chain route that causes a cascading delay, the team needs the trail to understand why the model made that choice and how to course-correct. Some recurring decisions need only a checkbox: a customer-tier upgrade, a routine refund, a templated outreach. Others need a paper trail a third party can follow next quarter: pricing changes, hiring screens, lending judgments. NIST’s AI Risk Management Framework and ISO/IEC 42001 are the operating standards starting to embed this, and the board-level reporting gap most companies haven’t closed is downstream of the same question. The override bar climbs sharply where the audit-trail need is high, and the human signature is non-negotiable not because the agent is wrong but because the trail must be both defensible and debuggable.
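A sketch of how the depth mapping might look. The decision types and depth labels are invented; only the checkbox-versus-full-trail split mirrors the examples above:

```python
# Sketch of the audit-trail axis. Decision types and depth labels are
# invented; the checkbox/full-trail split mirrors the examples in the post.

from enum import Enum

class TrailDepth(Enum):
    CHECKBOX = 1     # recurring, low-stakes: tier upgrades, routine refunds
    FULL_TRAIL = 2   # third-party-followable: pricing, hiring, lending

# Hypothetical inventory; in practice this comes from your decision mapping.
TRAIL_DEPTH = {
    "customer_tier_upgrade": TrailDepth.CHECKBOX,
    "routine_refund": TrailDepth.CHECKBOX,
    "templated_outreach": TrailDepth.CHECKBOX,
    "pricing_change": TrailDepth.FULL_TRAIL,
    "hiring_screen": TrailDepth.FULL_TRAIL,
    "lending_judgment": TrailDepth.FULL_TRAIL,
}

def audit_axis_fires(decision_type: str) -> bool:
    """Human signature required wherever a third party must follow the trail.
    Unknown decision types default to the full trail, the conservative choice."""
    return TRAIL_DEPTH.get(decision_type, TrailDepth.FULL_TRAIL) is TrailDepth.FULL_TRAIL

assert audit_axis_fires("pricing_change")
assert not audit_axis_fires("routine_refund")
```

Defaulting unknown decision types to the full trail keeps the bar conservative until the inventory catches up, which serves both masters at once.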
Where to start this week
Three concrete moves for the next budget cycle. Map your top ten AI-touched decisions by the three axes; anything scoring high on two or more is an override candidate (a scoring sketch follows below). Pick the first category to formalize where misclassification has cost the most. Write the threshold as a one-page operating note, not a policy. The broader governance gap closes the same way: scaffolding first, formal policy when the scaffolding has earned its upgrade.
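A minimal sketch of that first move, with an invented decision inventory; the two-or-more rule is the one stated above:

```python
# Sketch of the mapping step: score each AI-touched decision on the three
# axes and flag override candidates at two or more highs. The inventory
# entries are invented examples.

from dataclasses import dataclass

@dataclass
class Decision:
    name: str
    dollars_high: bool      # axis 1: magnitude or compliance bar fires
    one_way_door: bool      # axis 2: can't be unwound in a quarter
    deep_trail: bool        # axis 3: a third party must be able to follow it

def axis_score(d: Decision) -> int:
    return sum([d.dollars_high, d.one_way_door, d.deep_trail])

def override_candidates(inventory: list[Decision]) -> list[Decision]:
    """Anything scoring high on two or more axes is an override candidate."""
    return [d for d in inventory if axis_score(d) >= 2]

inventory = [
    Decision("routine_refund", False, False, False),
    Decision("vendor_lockin", True, True, False),
    Decision("pricing_change", False, True, True),
]
for d in override_candidates(inventory):
    print(f"override candidate: {d.name} (score {axis_score(d)}/3)")
```

Ten rows of that table fit on the one-page operating note, which is the whole point of starting with scaffolding instead of policy.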