Agent Governance

Risk thresholds that decide when an agent acts alone

The single most important dial in an autonomous system: a per-action risk score and a cutoff that routes the safe work to the agent and the consequential work to a human. We design it, tune it, and prove it.

  • Per-action risk scoring
  • Tunable autonomy cutoffs
  • Human routing on breach
  • Versioned & fully audited

1 score per proposed action, before it runs
<1s to score and route a typical action
5+ risk factors blended into each score
0 high-blast-radius actions without a gate

// the decision

Autonomy is not a switch — it's a curve

Most teams ask "can the agent run on its own?" The real engineering question is finer-grained.

An agent that needs a human for everything is a glorified form. An agent that needs a human for nothing is a liability waiting for its first irreversible mistake. The useful design sits between those poles, and the thing that places you on that spectrum is the risk threshold.

We treat every proposed action as a candidate for autonomy and score it before execution. Reading a record, drafting a reply, and refunding a customer are not the same risk — so they don't share a cutoff. Setting those cutoffs deliberately, per action type, is the core of the work on this page.
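
As a minimal sketch of what "a cutoff per action type" can look like, assume a risk score normalized to the range 0.0 to 1.0. The action names, the cutoff values, and the `route` helper below are all hypothetical and chosen only for illustration. The sketch also assumes that a score exactly at the cutoff goes to a human, and that unknown action types get the strictest cutoff.

```python
from enum import Enum


class Route(Enum):
    AUTONOMOUS = "autonomous"
    HUMAN = "human"


# Hypothetical cutoffs on a 0.0-1.0 risk scale, one per action type.
# A higher cutoff means the agent is trusted with more of that action type.
CUTOFFS = {
    "read_record": 0.80,
    "draft_reply": 0.60,
    "refund_customer": 0.20,
}

# Assumption: action types with no explicit cutoff never run autonomously.
DEFAULT_CUTOFF = 0.0


def route(action_type: str, risk_score: float) -> Route:
    """Let the agent act only when the score is strictly below the cutoff."""
    cutoff = CUTOFFS.get(action_type, DEFAULT_CUTOFF)
    return Route.AUTONOMOUS if risk_score < cutoff else Route.HUMAN
```

With these illustrative numbers, the same score of 0.3 lets the agent read a record on its own but sends a refund to a human, which is the point of not sharing one cutoff across action types.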

// what goes into a score

The factors behind every threshold

A risk score is a transparent function of measurable inputs — not a vibe the model emits.
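
One way to make a score "a transparent function of measurable inputs" is a weighted sum that keeps each factor's contribution on record. The sketch below is an assumption, not the formula used in any real deployment: the factor names follow the ones this page mentions (blast radius, reversibility, monetary value, data sensitivity, production target, model confidence), while the weights, the version label, and the `score_action` helper are hypothetical.

```python
from dataclasses import dataclass

FORMULA_VERSION = "example-v1"  # hypothetical version label

# Hypothetical weights. They sum to 1.0 so the score stays within [0, 1].
WEIGHTS = {
    "blast_radius": 0.30,
    "irreversibility": 0.25,
    "monetary_value": 0.15,
    "data_sensitivity": 0.15,
    "production_target": 0.10,
    "model_uncertainty": 0.05,  # 1 minus calibrated confidence
}


@dataclass(frozen=True)
class ScoredAction:
    score: float
    contributions: dict  # factor name -> weighted contribution
    version: str


def score_action(factors: dict) -> ScoredAction:
    """Blend factors (each normalized to [0, 1]) into one itemized score."""
    missing = set(WEIGHTS) - set(factors)
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    contributions = {}
    for name, weight in WEIGHTS.items():
        value = min(1.0, max(0.0, factors[name]))  # clamp out-of-range inputs
        contributions[name] = weight * value
    total = sum(contributions.values())
    return ScoredAction(total, contributions, FORMULA_VERSION)
```

Because the result carries its per-factor contributions and a formula version, an approver or auditor can see which inputs drove a score without re-running the model.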

// how we tune them

From conservative defaults to earned autonomy

Thresholds aren't guessed once — they're moved by evidence.

01

Start strict

Ship with low cutoffs so nearly everything consequential routes to a human. On day one, the agent acts alone only on the read-only safe set.

02

Instrument

Log every score, every approval, and every override along with its reason. The approve/deny pattern per action type is the dataset.

03

Loosen by evidence

Where humans approve an action type near-unanimously, raise its threshold. Where overrides cluster, tighten or split the rule.

04

Watch the two curves

Track two curves together: approval volume should fall while the incident rate stays flat. If incidents start to climb as approvals drop, a threshold moved too far, and we walk it back.
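
The "loosen by evidence" step can be sketched as a small function over the review log. Everything here is an illustrative assumption: the `propose_adjustments` name, the 98% and 80% approval-rate bounds standing in for "near-unanimous" and "overrides cluster", the minimum sample size, and the step size. The function only proposes new cutoffs; applying them is left to a human review.

```python
from collections import defaultdict


def propose_adjustments(decisions, cutoffs, min_samples=50,
                        loosen_at=0.98, tighten_at=0.80, step=0.05):
    """Suggest new cutoffs from human review outcomes.

    decisions: iterable of (action_type, approved) pairs from the review log.
    cutoffs:   current cutoff per action type, on a 0.0-1.0 scale.
    Returns only the action types whose cutoff should change.
    """
    counts = defaultdict(lambda: [0, 0])  # action_type -> [approved, total]
    for action_type, approved in decisions:
        counts[action_type][0] += int(approved)
        counts[action_type][1] += 1

    proposals = {}
    for action_type, (approved, total) in counts.items():
        # Too little evidence, or an unknown action type: leave it alone.
        if total < min_samples or action_type not in cutoffs:
            continue
        rate = approved / total
        current = cutoffs[action_type]
        if rate >= loosen_at:
            # Humans almost always approve: raise the cutoff one step.
            proposals[action_type] = min(1.0, round(current + step, 4))
        elif rate <= tighten_at:
            # Denials cluster here: lower the cutoff one step.
            proposals[action_type] = max(0.0, round(current - step, 4))
    return proposals
```

Moving one small step at a time keeps each change easy to attribute if the incident rate shifts afterward.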

// when a threshold is crossed

A breach pauses a step, not the system

When a proposed action scores above its threshold, the agent doesn't fail and it doesn't barrel ahead. The workflow pauses precisely at that step. The proposed action, its computed score, and the factors that drove it are packaged and routed to the right approver with the context they need to decide in seconds.

Approve, and the run resumes from where it stopped. Deny, and the agent unwinds the step cleanly and either proposes an alternative or escalates the exception. Either way, the decision, the human who made it, and the full input are written to an immutable lineage record.

  • Step-level pause preserves safe upstream work
  • Approver sees the score and its drivers, not a raw prompt
  • Every breach and resolution is captured in lineage
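
The pause-and-resume behavior described above can be sketched as a small runner. This is a simplified illustration under stated assumptions: the `Step` and `Run` classes are hypothetical, the lineage record is a plain in-memory list standing in for an immutable store, and a denial simply unwinds the step and stops rather than proposing an alternative.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    score: float
    cutoff: float
    run: Callable[[], None]
    unwind: Callable[[], None]  # releases anything the step staged


@dataclass
class Run:
    steps: list
    position: int = 0
    lineage: list = field(default_factory=list)

    def advance(self) -> Optional[Step]:
        """Execute steps until one breaches its cutoff; return it, or None when done."""
        while self.position < len(self.steps):
            step = self.steps[self.position]
            if step.score >= step.cutoff:
                self.lineage.append(("breach", step.name, step.score))
                return step  # paused at this step; upstream work is kept
            step.run()
            self.position += 1
        return None

    def resolve(self, approver: str, approved: bool) -> Optional[Step]:
        """Record the human decision, then resume or unwind the paused step."""
        step = self.steps[self.position]
        outcome = "approved" if approved else "denied"
        self.lineage.append((outcome, step.name, approver))
        if approved:
            step.run()
            self.position += 1
            return self.advance()  # may pause again at a later breach
        step.unwind()
        return None
```

A caller would invoke `advance()`, hand any returned step and its score to an approver, and then call `resolve()` with the decision. Steps that completed before the pause are never re-run.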

A blanket setting vs. a real threshold model

Why a single "autonomy level" slider falls apart in production.

                A global autonomy setting      An Automatic.co threshold model
Granularity     One dial for the whole agent   A cutoff per action type
Inputs          Model confidence, if anything  Blast radius, reversibility, value, data, env
Tuning          Guessed and left alone         Moved by approve/override evidence
On breach       Hard stop or silent override   Step pause, routed to the right human
Explainability  "The model wasn't sure"        Itemized, versioned score on record

Frequently asked questions

What exactly is a risk threshold?

A numeric cutoff on a per-action risk score. When the computed score for a proposed action falls below the threshold, the agent executes the action autonomously; when the score is above it, the action is routed to an approval gate or refused. Thresholds are set per action type, not globally.

How do you compute the risk score for an action?

We blend deterministic factors — blast radius, reversibility, monetary value, data sensitivity, and whether the target is production — with the model's own calibrated confidence. The formula is explicit and versioned, so a score is always explainable, never a black box.

Won't conservative thresholds just bury humans in approvals?

That's the failure mode we tune against. We start strict, measure the approve/override rate per action type, then loosen thresholds only where humans approve near-unanimously. The goal is a falling approval volume with a flat-to-zero incident rate.
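
A rough check for that goal compares two reporting periods: approvals per action should fall while incidents per action do not rise. The sketch below is illustrative only. The `threshold_moved_too_far` name, the period dictionaries with `actions`, `approvals`, and `incidents` counts, and the `tolerance` parameter are all assumptions.

```python
def threshold_moved_too_far(before, after, tolerance=0.0):
    """Flag a period where approvals fell but the incident rate rose.

    before, after: dicts with integer counts under the keys
    'actions', 'approvals', and 'incidents' for two reporting periods.
    tolerance: allowed increase in incident rate before flagging.
    """
    def rates(period):
        if period["actions"] == 0:
            raise ValueError("period has no actions")
        return (period["approvals"] / period["actions"],
                period["incidents"] / period["actions"])

    approvals_before, incidents_before = rates(before)
    approvals_after, incidents_after = rates(after)

    approvals_fell = approvals_after < approvals_before
    incidents_rose = incidents_after > incidents_before + tolerance
    return approvals_fell and incidents_rose
```

A `True` result is a signal to review and roll back the most recent cutoff changes, not proof that a specific threshold caused the incidents.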

What happens when an agent crosses a threshold mid-workflow?

The workflow pauses at that step, not the whole run. The proposed action, its score, and the contributing factors are sent to the right approver; safe upstream work is preserved, and the run resumes on approval or unwinds cleanly on denial.

Related architecture decisions

Risk thresholds are one knob in a governed agent design. These connect to it directly.

Set the dial deliberately, not by default

Bring one workflow you're nervous about automating. We'll map its actions, score them, and propose the thresholds that make autonomy safe.