Agent Governance

Risk thresholds that decide when an agent acts alone

The single most important dial in an autonomous system: a per-action risk score and a cutoff that routes the safe work to the agent and the consequential work to a human. We design it, tune it, and prove it.

Per-action risk scoring
Tunable autonomy cutoffs
Human routing on breach
Versioned & fully audited

Book a Call Get Started

1 score

per proposed action, before it runs

<1s

to score and route a typical action

6

risk factors blended into each score

0

high-blast-radius actions without a gate

// the decision

Autonomy is not a switch — it's a curve

Most teams ask "can the agent run on its own?" The real engineering question is finer-grained.

An agent that needs a human for everything is a glorified form. An agent that needs a human for nothing is a liability waiting for its first irreversible mistake. The useful design sits between those poles, and the thing that places you on that spectrum is the risk threshold.

We treat every proposed action as a candidate for autonomy and score it before execution. Reading a record, drafting a reply, and refunding a customer are not the same risk — so they don't share a cutoff. Setting those cutoffs deliberately, per action type, is the core of the work on this page.
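A minimal sketch of per-action cutoffs, in Python. The action names, the 0-to-1 score scale, the cutoff values, and the choice to send a score exactly equal to the cutoff to a human are all illustrative assumptions, not figures from a real deployment.

```python
from enum import Enum


class Route(Enum):
    AUTONOMOUS = "autonomous"
    HUMAN = "human"


# Hypothetical cutoffs on an assumed 0.0-1.0 risk scale, one per action type.
CUTOFFS = {
    "read_record": 0.80,
    "draft_reply": 0.60,
    "refund_customer": 0.20,
}


def route(action_type: str, score: float) -> Route:
    """Run autonomously only when the score falls below the action's cutoff."""
    # Unknown action types get a cutoff of 0.0, so they always go to a human.
    cutoff = CUTOFFS.get(action_type, 0.0)
    return Route.AUTONOMOUS if score < cutoff else Route.HUMAN
```

With these numbers, the same score of 0.3 lets a record read run on its own but sends a refund to a human, which is the point of not sharing one cutoff.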

// what goes into a score

The factors behind every threshold

A risk score is a transparent function of measurable inputs — not a vibe the model emits.

Blast radius

How many records, customers, or downstream systems a single action can touch. A query to one row scores far below a batch update.

Reversibility

Can the action be undone cleanly? Sending an email and wiring funds are both writes — one is recoverable, one is not.

Data sensitivity

Reading or writing PII, financials, or regulated data raises the floor on an action's score, regardless of size.

Model confidence

The model's own calibrated certainty, weighted — never trusted alone. Low confidence on a high-stakes action forces escalation.

Environment

Sandbox, staging, or production. The same action carries a different score depending on what it can actually break.

Monetary value

Dollar amount attached to the action, with hard ceilings above which no autonomous path exists, period.
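One way the six factors could blend into a single score is sketched below. The weights, the 0-to-1 normalization of each input, the sensitivity floor, and the dollar ceiling are hypothetical values chosen for illustration; the actual formula is not published on this page.

```python
from dataclasses import dataclass

# Hypothetical weights (they sum to 1.0); a real formula would be versioned.
WEIGHTS = {
    "blast_radius": 0.25,
    "irreversibility": 0.25,
    "sensitivity": 0.15,
    "uncertainty": 0.15,
    "environment": 0.10,
    "value": 0.10,
}
SENSITIVE_FLOOR = 0.5     # assumed minimum score for PII or regulated data
VALUE_CEILING = 10_000.0  # assumed dollar amount above which autonomy is barred


@dataclass(frozen=True)
class Action:
    blast_radius: float     # 0..1, share of records or systems touched
    irreversibility: float  # 0..1, where 1.0 means it cannot be undone
    sensitive: bool         # touches PII, financials, or regulated data
    confidence: float       # 0..1, the model's calibrated confidence
    environment: float      # 0..1, e.g. sandbox 0.0, staging 0.5, production 1.0
    dollars: float          # monetary value attached to the action


def risk_score(a: Action) -> float:
    # Hard ceiling: return the maximum score so no cutoff can admit the action.
    if a.dollars > VALUE_CEILING:
        return 1.0
    factors = {
        "blast_radius": a.blast_radius,
        "irreversibility": a.irreversibility,
        "sensitivity": 1.0 if a.sensitive else 0.0,
        "uncertainty": 1.0 - a.confidence,  # low confidence raises risk
        "environment": a.environment,
        "value": min(a.dollars / VALUE_CEILING, 1.0),
    }
    score = sum(WEIGHTS[name] * value for name, value in factors.items())
    # Sensitive data raises the floor regardless of the other inputs.
    if a.sensitive:
        score = max(score, SENSITIVE_FLOOR)
    return score
```

Confidence enters only as one weighted term, so a confident model cannot talk a large, irreversible action below its cutoff by itself.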

// how we tune them

From conservative defaults to earned autonomy

Thresholds aren't guessed once — they're moved by evidence.

Start strict

Ship with low cutoffs so nearly everything consequential routes to a human. Day-one autonomy is the read-only safe set.

Instrument

Log every score, every approval, every override and the reason. The approve/deny pattern per action type is the dataset.

Loosen by evidence

Where humans approve an action type near-unanimously, raise its threshold. Where overrides cluster, tighten or split the rule.

Watch the two curves

Track approval volume falling against incident rate staying flat. If incidents start to rise as approvals fall, a threshold moved too far, and we walk it back.
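The four steps above can be sketched as a single update rule per action type. The sample-size minimum, the rates treated as "near-unanimous" and "clustered overrides", and the step size are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ActionStats:
    approvals: int  # human approvals logged for this action type
    denials: int    # human denials or overrides logged for this action type
    incidents: int  # incidents traced to autonomous runs of this action type


# All constants below are illustrative assumptions.
MIN_SAMPLES = 50   # do not move a cutoff on thin evidence
LOOSEN_AT = 0.98   # approval rate treated as near-unanimous
TIGHTEN_AT = 0.80  # approval rate below which overrides are clustering
STEP = 0.05        # how far a cutoff moves per review


def next_cutoff(cutoff: float, stats: ActionStats) -> float:
    # Any incident walks the cutoff back, whatever the approval rate says.
    if stats.incidents > 0:
        return max(0.0, cutoff - STEP)
    reviewed = stats.approvals + stats.denials
    if reviewed < MIN_SAMPLES:
        return cutoff
    rate = stats.approvals / reviewed
    if rate >= LOOSEN_AT:
        return min(1.0, cutoff + STEP)  # loosen: more actions run autonomously
    if rate < TIGHTEN_AT:
        return max(0.0, cutoff - STEP)  # tighten: more actions go to a human
    return cutoff
```

Checking incidents first means an approval rate can never justify loosening a cutoff that is already producing incidents.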

// when a threshold is crossed

A breach pauses a step, not the system

When a proposed action scores above its threshold, the agent doesn't fail and it doesn't barrel ahead. The workflow pauses precisely at that step. The proposed action, its computed score, and the factors that drove it are packaged and routed to the right approver with the context they need to decide in seconds.

Approve, and the run resumes from where it stopped. Deny, and the agent unwinds the step cleanly and either proposes an alternative or escalates the exception. Either way, the decision, the human who made it, and the full input are written to an immutable lineage record.

Step-level pause preserves safe upstream work
Approver sees the score and its drivers, not a raw prompt
Every breach and resolution is captured in lineage
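A simplified sketch of the step-level pause follows. The `Step` and `run` names are hypothetical, the blocking `approve` callback stands in for routing to a human approver, and a plain list stands in for the immutable lineage store. On denial this sketch simply stops the run; proposing an alternative or escalating is left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    score: float
    cutoff: float
    factors: dict[str, float] = field(default_factory=dict)


def run(
    steps: list[Step], approve: Callable[[Step], bool]
) -> tuple[list[str], list[dict]]:
    """Execute steps in order, pausing only at steps that breach their cutoff."""
    done: list[str] = []
    lineage: list[dict] = []  # stand-in for an append-only lineage record
    for step in steps:
        if step.score >= step.cutoff:
            # Breach: wait for the approver, who sees the score and its drivers.
            approved = approve(step)
            lineage.append({
                "step": step.name,
                "score": step.score,
                "factors": step.factors,
                "approved": approved,
            })
            if not approved:
                break  # completed upstream steps stay in `done`
        done.append(step.name)
    return done, lineage
```

Because the check sits inside the loop, a breach at the second step leaves the first step's completed work untouched.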

A blanket setting vs. a real threshold model

Why a single "autonomy level" slider falls apart in production.

	A global autonomy setting	An Automatic.co threshold model
Granularity	One dial for the whole agent	A cutoff per action type
Inputs	Model confidence, if anything	Blast radius, reversibility, value, data sensitivity, environment, plus weighted confidence
Tuning	Guessed and left alone	Moved by approve/override evidence
On breach	Hard stop or silent override	Step pause, routed to the right human
Explainability	"The model wasn't sure"	Itemized, versioned score on record

Frequently asked questions

What exactly is a risk threshold?

A numeric cutoff on a per-action risk score. When the computed score for a proposed action falls below the threshold, the agent executes the action autonomously; when it scores above, the action is routed to an approval gate or refused. Thresholds are set per action type, not globally.

How do you compute the risk score for an action?

We blend deterministic factors — blast radius, reversibility, monetary value, data sensitivity, and whether the target is production — with the model's own calibrated confidence. The formula is explicit and versioned, so a score is always explainable, never a black box.
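To illustrate what "explicit and versioned" could look like in practice, here is a small sketch that returns an itemized breakdown alongside the score. The version label, the three-factor weight table, and the record shape are hypothetical.

```python
FORMULA_VERSION = "v1"  # hypothetical version label stored with every score

# Hypothetical weights for a reduced, three-factor example.
WEIGHTS = {"blast_radius": 0.4, "irreversibility": 0.4, "uncertainty": 0.2}


def explain(factors: dict[str, float]) -> dict:
    """Return the score together with each factor's contribution to it."""
    contributions = {name: WEIGHTS[name] * factors[name] for name in WEIGHTS}
    return {
        "formula_version": FORMULA_VERSION,
        "contributions": contributions,
        "score": sum(contributions.values()),
    }
```

Storing the version next to the contributions lets a reviewer recompute an old score under the formula that was in force when the action was proposed.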

Won't conservative thresholds just bury humans in approvals?

That's the failure mode we tune against. We start strict, measure the approve/override rate per action type, then loosen thresholds only where humans approve near-unanimously. The goal is a falling approval volume with a flat-to-zero incident rate.

What happens when an agent crosses a threshold mid-workflow?

The workflow pauses at that step, not the whole run. The proposed action, its score, and the contributing factors are sent to the right approver; safe upstream work is preserved, and the run resumes on approval or unwinds cleanly on denial.

Related architecture decisions

Risk thresholds are one knob in a governed agent design. These connect to it directly.

Approval Gates
Action Execution
Exception Handling
Decision Lineage
Model Governance
Cost Controls

Set the dial deliberately, not by default

Bring one workflow you're nervous about automating. We'll map its actions, score them, and propose the thresholds that make autonomy safe.