Risk thresholds that decide when an agent acts alone
The single most important dial in an autonomous system: a per-action risk score and a cutoff that routes the safe work to the agent and the consequential work to a human. We design it, tune it, and prove it.
- Per-action risk scoring
- Tunable autonomy cutoffs
- Human routing on breach
- Versioned & fully audited
Autonomy is not a switch — it's a curve
Most teams ask "can the agent run on its own?" The real engineering question is finer-grained.
An agent that needs a human for everything is a glorified form. An agent that needs a human for nothing is a liability waiting for its first irreversible mistake. The useful design sits between those poles, and the thing that places you on that spectrum is the risk threshold.
We treat every proposed action as a candidate for autonomy and score it before execution. Reading a record, drafting a reply, and refunding a customer are not the same risk — so they don't share a cutoff. Setting those cutoffs deliberately, per action type, is the core of the work on this page.
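As a minimal sketch of what "a cutoff per action type" can look like, the snippet below routes a scored action either to the agent or to a human. The action names, the 0.0 to 1.0 scale, the cutoff values, and the `route` helper are all illustrative assumptions, not our production configuration. It also assumes a score exactly at the cutoff counts as a breach.

```python
from enum import Enum


class Route(Enum):
    AUTONOMOUS = "autonomous"
    HUMAN = "human"


# Hypothetical cutoffs on a 0.0-1.0 risk scale. A higher cutoff means more
# autonomy for that action type. The values are illustrative only.
CUTOFFS = {
    "read_record": 0.80,
    "draft_reply": 0.60,
    "refund_customer": 0.20,
}


def route(action_type: str, score: float) -> Route:
    """Let the agent act only when the score is below that action's own cutoff."""
    # Assumption: unknown action types get a cutoff of 0.0, so they always
    # go to a human until someone sets a cutoff for them on purpose.
    cutoff = CUTOFFS.get(action_type, 0.0)
    return Route.AUTONOMOUS if score < cutoff else Route.HUMAN
```

With these numbers the same score of 0.3 runs autonomously as a `read_record` but routes to a human as a `refund_customer`, which is the point of not sharing one cutoff.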
The factors behind every threshold
A risk score is a transparent function of measurable inputs — not a vibe the model emits.
Blast radius
How many records, customers, or downstream systems a single action can touch. A query to one row scores far below a batch update.
Reversibility
Can the action be undone cleanly? Sending an email and wiring funds are both writes, but a mistaken email can be corrected with a follow-up, while a wire usually cannot be recalled.
Data sensitivity
Whether the action reads or writes PII, financials, or regulated data raises the floor on its score regardless of size.
Model confidence
The model's own calibrated certainty, weighted — never trusted alone. Low confidence on a high-stakes action forces escalation.
Environment
Sandbox, staging, or production. The same action carries a different score depending on what it can actually break.
Monetary value
Dollar amount attached to the action, with hard ceilings above which no autonomous path exists, period.
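The six factors above can be combined in many ways. The sketch below shows one possible shape: a weighted sum with a sensitivity floor and a hard monetary ceiling, returned with its itemized factors and a version tag. Every weight, the floor, the ceiling, the 1,000-record normalization, and the names `Action` and `score_action` are assumptions made for illustration, not the formula we ship.

```python
from dataclasses import dataclass

SCORE_VERSION = "example-v1"  # hypothetical version tag for this formula

# Hypothetical weights that sum to 1.0.
WEIGHTS = {
    "blast_radius": 0.25,
    "irreversibility": 0.25,
    "environment": 0.20,
    "monetary_value": 0.15,
    "model_uncertainty": 0.15,
}
SENSITIVE_FLOOR = 0.5        # assumed minimum score when sensitive data is touched
MONETARY_CEILING = 10_000.0  # assumed dollar ceiling with no autonomous path above it
ENVIRONMENT_RISK = {"sandbox": 0.0, "staging": 0.3, "production": 1.0}


@dataclass(frozen=True)
class Action:
    records_touched: int
    reversible: bool
    touches_sensitive_data: bool
    model_confidence: float  # calibrated, 0.0-1.0
    environment: str
    amount_usd: float


def score_action(a: Action) -> dict:
    """Return a versioned, itemized risk score in the range 0.0-1.0."""
    if a.amount_usd > MONETARY_CEILING:
        # Hard ceiling: maximum score regardless of the other factors.
        return {"version": SCORE_VERSION, "score": 1.0,
                "factors": {"monetary_ceiling": 1.0}}
    factors = {
        "blast_radius": min(a.records_touched / 1000, 1.0),
        "irreversibility": 0.0 if a.reversible else 1.0,
        # Unknown environments are treated like production.
        "environment": ENVIRONMENT_RISK.get(a.environment, 1.0),
        "monetary_value": min(a.amount_usd / MONETARY_CEILING, 1.0),
        # Confidence is one weighted input, never the whole score.
        "model_uncertainty": 1.0 - a.model_confidence,
    }
    score = sum(WEIGHTS[name] * value for name, value in factors.items())
    if a.touches_sensitive_data:
        score = max(score, SENSITIVE_FLOOR)  # sensitivity raises the floor
    return {"version": SCORE_VERSION, "score": round(score, 4), "factors": factors}
```

Returning the factors alongside the score is what lets an approver see why an action scored the way it did, rather than a bare number.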
From conservative defaults to earned autonomy
Thresholds aren't guessed once — they're moved by evidence.
Start strict
Ship with low cutoffs so nearly everything consequential routes to a human. Day-one autonomy is the read-only safe set.
Instrument
Log every score, every approval, every override and the reason. The approve/deny pattern per action type is the dataset.
Loosen by evidence
Where humans approve an action type near-unanimously, raise its threshold. Where overrides cluster, tighten or split the rule.
Watch the two curves
Track two curves together: approval volume should fall while the incident rate stays flat. If incidents start to rise as approvals fall, a threshold moved too far, and we walk it back.
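The tuning loop above can be sketched as two small checks: one that proposes a new cutoff from the approve/deny log for an action type, and one that flags a change for rollback when incidents rise. The sample size, the 98% and 80% approval-rate bounds, the step size, and both function names are hypothetical. Real values would come from the data for each action type.

```python
def propose_cutoff(current: float, approvals: int, denials: int,
                   min_samples: int = 50, loosen_at: float = 0.98,
                   tighten_at: float = 0.80, step: float = 0.05) -> float:
    """Suggest a new cutoff for one action type from its human decisions."""
    total = approvals + denials
    if total < min_samples:
        return current  # not enough evidence to move the cutoff yet
    approval_rate = approvals / total
    if approval_rate >= loosen_at:
        # Humans approve almost every time: raise the cutoff (more autonomy).
        return round(min(current + step, 1.0), 4)
    if approval_rate <= tighten_at:
        # Denials cluster: lower the cutoff (less autonomy).
        return round(max(current - step, 0.0), 4)
    return current


def moved_too_far(incident_rate_before: float, incident_rate_after: float,
                  tolerance: float = 0.0) -> bool:
    """True when the incident rate rose after a cutoff was loosened."""
    return incident_rate_after > incident_rate_before + tolerance
```

A proposal from `propose_cutoff` is only a suggestion in this sketch. Whether a person reviews each change before it ships is a separate design choice.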
A breach pauses a step, not the system
When a proposed action scores above its threshold, the agent doesn't fail and it doesn't barrel ahead. The workflow pauses precisely at that step. The proposed action, its computed score, and the factors that drove it are packaged and routed to the right approver with the context they need to decide in seconds.
Approve, and the run resumes from where it stopped. Deny, and the agent unwinds the step cleanly and either proposes an alternative or escalates the exception. Either way the decision, the human who made it, and the full input are written to an immutable lineage record.
- Step-level pause preserves safe upstream work
- Approver sees the score and its drivers, not a raw prompt
- Every breach and resolution is captured in lineage
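To make the step-level pause concrete, here is a toy sketch of a run that stops at the first breaching step, hands back an approval request, and resumes or stops once a human decides. The `Step`, `Run`, `advance`, and `resolve` names are invented for illustration. The append-only list stands in for an immutable lineage store, and proposing an alternative after a denial is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Step:
    name: str
    score: float
    cutoff: float
    factors: dict


@dataclass
class Run:
    steps: list
    completed: list = field(default_factory=list)
    lineage: list = field(default_factory=list)  # stand-in for an immutable store
    paused_at: Optional[int] = None


def advance(run: Run) -> Optional[dict]:
    """Execute steps in order; pause and return a request at the first breach."""
    for i in range(len(run.completed), len(run.steps)):
        step = run.steps[i]
        if step.score >= step.cutoff:
            run.paused_at = i  # earlier steps stay completed
            return {"step": step.name, "score": step.score, "factors": step.factors}
        run.completed.append(step.name)
    return None  # every remaining step ran


def resolve(run: Run, approver: str, approved: bool) -> Optional[dict]:
    """Record the human decision, then resume on approval or stop on denial."""
    step = run.steps[run.paused_at]
    run.lineage.append({
        "step": step.name, "score": step.score, "factors": step.factors,
        "approver": approver, "approved": approved,
        "decided_at": datetime.now(timezone.utc).isoformat(),
    })
    run.paused_at = None
    if not approved:
        return None  # denied: the step never executes
    run.completed.append(step.name)
    return advance(run)  # may pause again at a later breach
```

The approval request carries the score and its factors, so the approver decides on the drivers rather than a raw prompt.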
A blanket setting vs. a real threshold model
Why a single "autonomy level" slider falls apart in production.
| | A global autonomy setting | An Automatic.co threshold model |
|---|---|---|
| Granularity | One dial for the whole agent | A cutoff per action type |
| Inputs | Model confidence, if anything | Blast radius, reversibility, monetary value, data sensitivity, environment, calibrated confidence |
| Tuning | Guessed and left alone | Moved by approve/override evidence |
| On breach | Hard stop or silent override | Step pause, routed to the right human |
| Explainability | "The model wasn't sure" | Itemized, versioned score on record |
Frequently asked questions
What exactly is a risk threshold?
A numeric cutoff on a per-action risk score. When the score computed for a proposed action falls below the threshold, the agent executes it autonomously; when the score is above the threshold, the action is routed to an approval gate or refused. Thresholds are set per action type, not globally.
How do you compute the risk score for an action?
We blend deterministic factors — blast radius, reversibility, monetary value, data sensitivity, and whether the target is production — with the model's own calibrated confidence. The formula is explicit and versioned, so a score is always explainable, never a black box.
Won't conservative thresholds just bury humans in approvals?
That's the failure mode we tune against. We start strict, measure the approve/override rate per action type, then loosen thresholds only where humans approve near-unanimously. The goal is a falling approval volume with a flat-to-zero incident rate.
What happens when an agent crosses a threshold mid-workflow?
The workflow pauses at that step, not the whole run. The proposed action, its score, and the contributing factors are sent to the right approver; safe upstream work is preserved, and the run resumes on approval or unwinds cleanly on denial.
Set the dial deliberately, not by default
Bring one workflow you're nervous about automating. We'll map its actions, score them, and propose the thresholds that make autonomy safe.