Agent architecture

Deploying agents to production, without surprises

An agent release is more than a container — it's code, model, prompts, tool permissions, and retrieval data moving together. We design the deployment so what you approved in staging is exactly what acts in production.

  • Immutable, versioned release artifacts
  • Canary rollouts with auto-abort
  • Single-pointer instant rollback
  • Runs inside your perimeter

5 things versioned per release: code, model, prompts, tools, data
<60s to roll back to the last known-good version
1–5% starting canary slice before full promotion
0 model or tool calls that leave your perimeter by default

// the core decision

What you're actually deploying

The hard engineering problem isn't the rollout mechanics; it's defining the unit of deployment.

With a stateless API, the deploy artifact is obvious: an image and its config. An agent is messier. Its behavior is decided by five moving parts — the orchestration code, the model and its version, the system and tool prompts, the set of tools it's permitted to call, and the retrieval corpus it reads. Change any one and the output changes, often invisibly.

So we treat all five as a single, content-addressed release artifact. A version pin isn't "deploy v2.3 of the service" — it's "v2.3 = this code SHA, claude-sonnet at this snapshot, these prompt files, this tool manifest, this index build." That artifact is what we test, what we canary, and what we roll back to. No drift between environments, because there's only one thing to promote.
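As a rough illustration of what "content-addressed" can mean here, the sketch below derives a release ID from the five pinned parts, so that changing any one of them yields a different ID. The class name, field names, and the `digest_files` helper are assumptions made for this example, not a description of the actual artifact format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReleaseManifest:
    """The five pinned parts of one agent release (field names are illustrative)."""
    code_sha: str        # commit of the orchestration code
    model_snapshot: str  # pinned model identifier
    prompts_digest: str  # hash of the system and tool prompt files
    tools_digest: str    # hash of the tool manifest
    index_digest: str    # hash of the retrieval index build

    def artifact_id(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so identical content
        # always produces the identical ID.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def digest_files(files: dict[str, bytes]) -> str:
    """Hash a set of named files in a stable, sorted-by-name order."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()


# Hypothetical values: editing a single prompt file changes the release ID.
before = ReleaseManifest(
    code_sha="abc123",
    model_snapshot="example-model-snapshot",
    prompts_digest=digest_files({"system.txt": b"You are a support agent."}),
    tools_digest=digest_files({"tools.json": b"{}"}),
    index_digest="index-build-001",
)
after = ReleaseManifest(
    code_sha="abc123",
    model_snapshot="example-model-snapshot",
    prompts_digest=digest_files({"system.txt": b"You are a billing agent."}),
    tools_digest=digest_files({"tools.json": b"{}"}),
    index_digest="index-build-001",
)
print(before.artifact_id() != after.artifact_id())
```

Because the ID is computed from content rather than assigned by hand, two environments holding the same ID are, under these assumptions, holding the same code, model pin, prompts, tool manifest, and index build.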

// deploy-time controls

What ships with every release

Deployment is where governance becomes concrete. These controls are baked into the artifact, not bolted on after.

// the rollout

How a release reaches production

A measured path that earns trust with real traffic before it carries the full load.

01 Build & pin

Assemble the artifact, snapshot the model and index, and run the eval suite. Anything below threshold never leaves the build.

02 Stage

Replay real historical tasks against the candidate in an isolated environment and diff its decisions against the incumbent.

03 Canary

Route 1–5% of live traffic to the new version. Watch success rate, cost-per-task, and intervention rate side by side.

04 Promote or abort

If the canary wins, ramp to 100%. If a metric regresses, auto-abort and route back to the pinned last-good version.
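The canary and promote-or-abort steps above can be sketched roughly as follows. This is a minimal illustration, not the production logic: the hash-based traffic split, the `Metrics` fields, the 2% tolerance, and the rule "promote only if nothing regresses beyond tolerance" are all assumptions chosen for the example.

```python
import hashlib
from dataclasses import dataclass


def in_canary(task_id: str, canary_percent: float) -> bool:
    """Deterministically assign a task to the canary slice.

    Hashing the task ID, rather than drawing a random number, keeps a
    retried task on the same version.
    """
    digest = hashlib.sha256(task_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < canary_percent * 100  # 5% -> buckets 0..499


@dataclass(frozen=True)
class Metrics:
    success_rate: float       # fraction of tasks completed successfully
    cost_per_task: float      # average spend per task
    intervention_rate: float  # fraction of tasks that needed a human


def canary_verdict(incumbent: Metrics, canary: Metrics, tolerance: float = 0.02) -> str:
    """Return "promote" only if no watched metric regresses beyond tolerance."""
    regressions = []
    if canary.success_rate < incumbent.success_rate - tolerance:
        regressions.append("success_rate")
    if canary.cost_per_task > incumbent.cost_per_task * (1 + tolerance):
        regressions.append("cost_per_task")
    if canary.intervention_rate > incumbent.intervention_rate + tolerance:
        regressions.append("intervention_rate")
    if regressions:
        return "abort: " + ", ".join(regressions)
    return "promote"


# Hypothetical numbers: the canary is cheaper but fails more tasks.
incumbent = Metrics(success_rate=0.90, cost_per_task=0.40, intervention_rate=0.05)
candidate = Metrics(success_rate=0.80, cost_per_task=0.35, intervention_rate=0.05)
print(canary_verdict(incumbent, candidate))  # abort: success_rate
```

Note that the verdict is computed on task outcomes, not on health checks: a canary that answers every request with HTTP 200 but completes fewer tasks still aborts.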

// rollback strategy

Rollback is a pointer flip, not a fire drill

Most agent incidents are recoverable in seconds if you've architected for it. We keep the previous known-good artifact warm and running; promotion only changes which version a router points at. Reverting means flipping that pointer back — no rebuild, no redeploy, no scramble to find the old prompt file.
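To make the pointer flip concrete, here is a small in-memory sketch. The `ReleaseRouter` class and its method names are hypothetical, and a real router would persist its pointer somewhere durable; the point is only that promotion and rollback change a reference and never rebuild anything.

```python
import threading


class ReleaseRouter:
    """Tracks which warm artifact serves traffic (illustrative, in-memory only)."""

    def __init__(self, initial_artifact_id: str) -> None:
        self._lock = threading.Lock()
        self._warm = {initial_artifact_id}
        self._active = initial_artifact_id
        self._last_good = initial_artifact_id

    def warm_up(self, artifact_id: str) -> None:
        """Register an artifact that is already running and able to take traffic."""
        with self._lock:
            self._warm.add(artifact_id)

    def promote(self, artifact_id: str) -> None:
        """Point traffic at a warm artifact and remember the one it replaced."""
        with self._lock:
            if artifact_id not in self._warm:
                raise ValueError(f"{artifact_id} is not warm; refusing to route to it")
            # Assumption: the version being replaced was serving well.
            self._last_good = self._active
            self._active = artifact_id

    def rollback(self) -> str:
        """Flip the pointer back to the last known-good artifact."""
        with self._lock:
            self._active = self._last_good
            return self._active

    @property
    def active(self) -> str:
        with self._lock:
            return self._active


router = ReleaseRouter("v2.2")
router.warm_up("v2.3")
router.promote("v2.3")
print(router.active)      # v2.3
print(router.rollback())  # v2.2
```

Refusing to promote an artifact that is not warm is a deliberate choice in this sketch: it keeps every pointer target something that is already running, which is what makes the flip fast in both directions.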

Pair that with circuit breakers — on cost, error rate, and human-intervention rate — and a bad release contains itself. The agent trips a breaker, traffic drains to the last good version, and your team gets a page with the exact artifact diff that caused it, not a vague "the AI is acting weird."

  • Last-good version stays warm and pinned
  • Breakers trip on cost, errors, and interventions
  • Alerts carry the exact artifact diff
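One way to picture the breakers and the diff-carrying alert is the sketch below. It assumes a rolling-mean breaker per metric and a field-by-field comparison of two release manifests; the `Breaker` class, the window sizes, the limits, and the manifest keys are illustrative choices, not the real thresholds or schema.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Breaker:
    """Trips when the rolling mean of one metric exceeds its limit."""
    name: str
    limit: float
    window: int = 50       # number of recent tasks considered
    min_samples: int = 10  # do not trip on too little evidence
    samples: deque = field(default_factory=deque)

    def record(self, value: float) -> bool:
        """Add one observation; return True if the breaker is now tripped."""
        self.samples.append(value)
        if len(self.samples) > self.window:
            self.samples.popleft()
        if len(self.samples) < self.min_samples:
            return False
        return sum(self.samples) / len(self.samples) > self.limit


def artifact_diff(last_good: dict[str, str], current: dict[str, str]) -> dict[str, tuple]:
    """List every manifest field that differs, as (last_good, current) pairs."""
    keys = sorted(set(last_good) | set(current))
    return {
        k: (last_good.get(k), current.get(k))
        for k in keys
        if last_good.get(k) != current.get(k)
    }


# Hypothetical run: errors start after a prompt change, and the breaker trips.
errors = Breaker(name="error_rate", limit=0.2, window=5, min_samples=3)
tripped = [errors.record(v) for v in (0.0, 0.0, 1.0)]
print(tripped)  # [False, False, True]

last_good = {"code_sha": "abc123", "prompts_digest": "p1", "tools_digest": "t1"}
current = {"code_sha": "abc123", "prompts_digest": "p2", "tools_digest": "t1"}
print(artifact_diff(last_good, current))  # {'prompts_digest': ('p1', 'p2')}
```

In a fuller version, a tripped breaker would trigger the rollback and attach the output of `artifact_diff` to the page, so the on-call engineer sees which of the five parts changed.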

Web-service deploy vs. agent deploy

Why the familiar CI/CD playbook needs a few new moves for autonomous work.

| | Standard service deploy | Automatic.co agent deploy |
|---|---|---|
| Unit of deploy | Image + config | Code, model, prompts, tools, and data as one artifact |
| Success signal | Health check / HTTP 200 | Task success, cost-per-task, intervention rate |
| Rollout | Blue-green by request | Canary by task outcome, auto-aborting |
| Rollback | Redeploy previous image | Pointer flip to a warm, pinned version |
| Blast radius control | Rate limits | Cost + error + intervention circuit breakers |

Frequently asked questions

Where do the agents actually run?

Inside your perimeter — your VPC, on-prem hardware, or an air-gapped network. We package the agent runtime as containers you schedule on your own Kubernetes or VM fleet, so model calls, tool calls, and logs never leave your boundary unless you decide otherwise.

How do you roll out a new agent version without breaking production?

Every change ships behind a canary. We route a small slice of real traffic to the new version, watch task success rate, cost-per-task, and intervention rate against the incumbent, and promote only when the new version wins. A failed canary auto-aborts and routes back to the last good version.

What happens when a deployment goes wrong at 2 a.m.?

Rollback is a single pointer flip, not a redeploy: the previous version's image, prompts, and tool config are already running and pinned as an immutable artifact. Combined with circuit breakers on cost, error rate, and intervention rate, a bad release is contained in seconds, not after a long incident.

How is this different from deploying a normal web service?

An agent's behavior is a function of code, model, prompts, tool permissions, and retrieval data, and a change to any one of them can change the output. We version and promote all five together as one release artifact, so what you tested in staging is bit-for-bit what runs in production.

Related architecture decisions

Deployment is one node in a larger agent-design graph.

Design a deployment you can sleep through.

Bring your environment and risk constraints; leave with a release pipeline, canary plan, and rollback strategy for your agents.