Agent architecture

Deploying agents to production, without surprises

Q: Where do the agents actually run?

Inside your perimeter — your VPC, on-prem hardware, or an air-gapped network. We package the agent runtime as containers you schedule on your own Kubernetes or VM fleet, so model calls, tool calls, and logs never leave your boundary unless you decide otherwise.

Q: How do you roll out a new agent version without breaking production?

Every change ships behind a canary. We route a small slice of real traffic to the new version, watch task success rate, cost-per-task, and intervention rate against the incumbent, and promote only when the new version wins. A failed canary auto-aborts and routes back to the last good version.

Q: What happens when a deployment goes wrong at 2 a.m.?

Rollback is a single pointer flip, not a redeploy: the previous version's image, prompts, and tool config are immutable, pinned, and already running. Combined with circuit breakers on cost, error rate, and intervention rate, a bad release is contained in seconds rather than growing into a long incident.

Q: How is this different from deploying a normal web service?

An agent's behavior is a function of code, model, prompts, tool permissions, and retrieval data — and any one of those can change the output. We version and promote all five together as one release artifact, so what you tested in staging is bit-for-bit what runs in prod.

An agent release is more than a container — it's code, model, prompts, tool permissions, and retrieval data moving together. We design the deployment so what you approved in staging is exactly what acts in production.

Immutable, versioned release artifacts
Canary rollouts with auto-abort
Single-pointer instant rollback
Runs inside your perimeter


5

things versioned per release: code, model, prompts, tools, data

<60s

to roll back to the last known-good version

1–5%

starting canary slice before full promotion

0

model or tool calls that leave your perimeter by default

// the core decision

What you're actually deploying

The hard engineering problem isn't the rollout mechanics; it's defining the unit of deployment.

With a stateless API, the deploy artifact is obvious: an image and its config. An agent is messier. Its behavior is decided by five moving parts — the orchestration code, the model and its version, the system and tool prompts, the set of tools it's permitted to call, and the retrieval corpus it reads. Change any one and the output changes, often invisibly.

So we treat all five as a single, content-addressed release artifact. A version pin isn't "deploy v2.3 of the service" — it's "v2.3 = this code SHA, claude-sonnet at this snapshot, these prompt files, this tool manifest, this index build." That artifact is what we test, what we canary, and what we roll back to. No drift between environments, because there's only one thing to promote.
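As a rough illustration of what "content-addressed" could mean here, the sketch below hashes a manifest of the five parts into one release address. The class name, field names, and sample values are all hypothetical, and the hashing scheme (SHA-256 over canonical JSON) is an assumption rather than a description of the actual pipeline.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReleaseArtifact:
    """Hypothetical manifest: one field per moving part of the agent."""
    code_sha: str              # commit of the orchestration code
    model_snapshot: str        # model id pinned to a specific snapshot
    prompts_digest: str        # hash of the system and tool prompt files
    tool_manifest_digest: str  # hash of the permitted-tools manifest
    index_build: str           # identifier of the retrieval index build

    def digest(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so identical
        # content always produces the identical address.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Sample values are placeholders, not real identifiers.
v2_3 = ReleaseArtifact(
    code_sha="3f9c2ab",
    model_snapshot="example-model@snapshot-1",
    prompts_digest="sha256:aaa",
    tool_manifest_digest="sha256:bbb",
    index_build="index-0042",
)

# Changing any single part produces a different release address.
prompt_tweak = ReleaseArtifact(**{**asdict(v2_3), "prompts_digest": "sha256:ccc"})
```

Under this assumption, a prompt edit is a new release with a new address, so it cannot slip into production under an old version number.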

// deploy-time controls

What ships with every release

Deployment is where governance becomes concrete. These controls are baked into the artifact, not bolted on after.

Environment promotion

Dev → staging → prod with the same artifact at each gate. Promotion is approval-gated, never a hand edit in production.
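One way to picture the gate, as a minimal sketch: promotion to a later stage is refused unless the same artifact digest is already in the previous stage and an approver is named. `PromotionLog`, the stage names, and the approver field are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

STAGES = ["dev", "staging", "prod"]


@dataclass
class PromotionLog:
    """Hypothetical record of which artifact digest is deployed in each stage."""
    deployed: Dict[str, str] = field(default_factory=dict)  # stage -> digest

    def promote(self, digest: str, target: str, approved_by: Optional[str]) -> None:
        if target not in STAGES:
            raise ValueError(f"unknown stage: {target}")
        position = STAGES.index(target)
        if position > 0:
            # Every gate after dev needs a named approver.
            if not approved_by:
                raise PermissionError(f"promotion to {target} requires an approver")
            # The exact same artifact must already sit in the previous stage.
            previous = STAGES[position - 1]
            if self.deployed.get(previous) != digest:
                raise ValueError(f"artifact {digest} has not passed {previous}")
        self.deployed[target] = digest
```

Because the only input is a digest, there is no path in this sketch for a hand-edited variant to reach prod.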

Versioned agents

Every release is an immutable, content-addressed bundle of code, model, prompts, tools, and data — pinned and reproducible.

Lineage on by default

Each prod run records which version produced it and which inputs it saw, so any output traces back to an exact artifact.
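A lineage entry might look something like the sketch below. The record shape and the `record_run` helper are hypothetical; the point is only that each run stores the artifact digest alongside a hash of the inputs it saw.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical lineage entry written once per production run."""
    run_id: str
    artifact_digest: str  # which release produced the output
    inputs_digest: str    # hash of the inputs the run saw
    recorded_at: str      # UTC timestamp, ISO 8601


def record_run(run_id: str, artifact_digest: str, inputs: dict) -> LineageRecord:
    # Canonical JSON so the same inputs hash identically regardless of key order.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return LineageRecord(
        run_id=run_id,
        artifact_digest=artifact_digest,
        inputs_digest=hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
```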

Cost circuit breakers

Per-deploy budgets and rate ceilings that trip before a runaway loop turns into a five-figure invoice.
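A minimal sketch of such a breaker, assuming a dollar budget per deploy and a per-task call ceiling as a stand-in for the rate limit. The class, its fields, and the exception name are illustrative. The check happens before the spend is recorded, which is what lets it trip ahead of the overrun.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a charge would break the deploy's budget or call ceiling."""


@dataclass
class CostBreaker:
    budget_usd: float        # total spend allowed for this deploy
    max_calls_per_task: int  # ceiling that catches runaway loops
    spent_usd: float = 0.0

    def charge(self, cost_usd: float, calls_in_task: int) -> None:
        # Refuse the call before any money is spent on it.
        if calls_in_task > self.max_calls_per_task:
            raise BudgetExceeded(f"task exceeded {self.max_calls_per_task} calls")
        if self.spent_usd + cost_usd > self.budget_usd:
            raise BudgetExceeded(f"charge would exceed budget of ${self.budget_usd:.2f}")
        self.spent_usd += cost_usd
```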

Model governance gates

A model swap can't reach prod without passing the same eval suite and approval the rest of the artifact does.

Failover topology

Deploy across providers and regions so a model outage degrades gracefully instead of stopping the workflow.
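The simplest form of this is an ordered fallback, sketched below. The function name and the provider callables are placeholders; a real setup would wrap actual provider clients and would likely add timeouts and health checks.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(RuntimeError):
    """Raised only when every configured provider has failed."""


def call_with_failover(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (provider name, response)."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any provider failure triggers the next one
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

In keeping with the single-artifact idea, the fallback models would presumably be pinned in the release as well, so a failover does not quietly change what was tested.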

// the rollout

How a release reaches production

A measured path that earns trust with real traffic before it carries the full load.

Build & pin

Assemble the artifact, snapshot the model and index, and run the eval suite. Anything below threshold never leaves the build.
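The threshold check at the end of the build could be as small as the sketch below. The metric names and minimums are invented for illustration; a missing score is treated as a failure.

```python
from typing import Dict, List, Tuple


def passes_eval_gate(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
) -> Tuple[bool, List[str]]:
    """Return (passed, names of evals that were missing or below their minimum)."""
    failures = [
        name
        for name, minimum in thresholds.items()
        if scores.get(name, float("-inf")) < minimum
    ]
    return (not failures, failures)


# Hypothetical eval names and thresholds.
thresholds = {"task_success": 0.90, "tool_call_validity": 0.99}
ok, failed = passes_eval_gate({"task_success": 0.93, "tool_call_validity": 0.97}, thresholds)
```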

Stage

Replay real historical tasks against the candidate in an isolated environment and diff its decisions against the incumbent.
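A replay diff can be sketched as running both versions over the same recorded tasks and keeping only the disagreements. Both agents are modelled as plain callables here, and the task shape (a dict with an `id`) is an assumption.

```python
from typing import Callable, Dict, Iterable, List

Agent = Callable[[dict], str]  # hypothetical: takes a task, returns a decision


def diff_decisions(tasks: Iterable[dict], incumbent: Agent, candidate: Agent) -> List[Dict[str, str]]:
    """Replay each task through both versions and collect the disagreements."""
    diffs = []
    for task in tasks:
        old, new = incumbent(task), candidate(task)
        if old != new:
            diffs.append({"task_id": task["id"], "incumbent": old, "candidate": new})
    return diffs
```

The output is a review list rather than a verdict: a disagreement may be the improvement the release was meant to deliver.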

Canary

Route 1–5% of live traffic to the new version. Watch success rate, cost-per-task, and intervention rate side by side.
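One common way to carve out a stable slice is to hash a task identifier into buckets, sketched below. The `route` function and the use of a task id as the routing key are assumptions; hashing keeps a given task on the same version if it is retried.

```python
import hashlib


def route(task_id: str, canary_percent: float) -> str:
    """Deterministically assign a task to 'canary' or 'incumbent'."""
    digest = hashlib.sha256(task_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to one of 10,000 buckets (0.01% each).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "canary" if bucket < canary_percent * 100 else "incumbent"


# Share of 10,000 synthetic task ids that land on a 5% canary.
canary_share = sum(route(f"task-{i}", 5) == "canary" for i in range(10_000)) / 10_000
```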

Promote or abort

If the canary wins, ramp to 100%. If a metric regresses, auto-abort and route back to the pinned last-good version.
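The promote-or-abort decision can be sketched as a comparison over the three watched metrics. The `Metrics` type and the single `tolerance` knob are illustrative, and a production check would also need enough samples to tell a real regression from noise, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    success_rate: float       # fraction of tasks completed successfully
    cost_per_task: float      # average spend per task
    intervention_rate: float  # fraction of tasks needing a human


def canary_verdict(incumbent: Metrics, canary: Metrics, tolerance: float = 0.0) -> str:
    """Return 'abort' if any metric regresses beyond tolerance, else 'promote'."""
    regressed = (
        canary.success_rate < incumbent.success_rate - tolerance
        or canary.cost_per_task > incumbent.cost_per_task * (1 + tolerance)
        or canary.intervention_rate > incumbent.intervention_rate + tolerance
    )
    return "abort" if regressed else "promote"
```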

// rollback strategy

Rollback is a pointer flip, not a fire drill

Most agent incidents are recoverable in seconds if you've architected for it. We keep the previous known-good artifact warm and running; promotion only changes which version a router points at. Reverting means flipping that pointer back — no rebuild, no redeploy, no scramble to find the old prompt file.
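Reduced to its core, the pointer flip looks something like this sketch. `ReleaseRouter` is a hypothetical name, and it assumes both versions are already running so that promotion and rollback only change which digest receives traffic.

```python
from typing import Optional


class ReleaseRouter:
    """Hypothetical router: holds a pointer to the active release digest."""

    def __init__(self, initial_digest: str) -> None:
        self.active: str = initial_digest
        self.last_good: Optional[str] = None

    def promote(self, digest: str) -> None:
        # The outgoing version is kept as the warm, pinned rollback target.
        self.last_good = self.active
        self.active = digest

    def rollback(self) -> str:
        if self.last_good is None:
            raise RuntimeError("no previous known-good version to roll back to")
        # Nothing is rebuilt or redeployed; only the pointer moves.
        self.active, self.last_good = self.last_good, None
        return self.active
```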

Pair that with circuit breakers — on cost, error rate, and human-intervention rate — and a bad release contains itself. The agent trips a breaker, traffic drains to the last good version, and your team gets a page with the exact artifact diff that caused it, not a vague "the AI is acting weird."

Last-good version stays warm and pinned
Breakers trip on cost, errors, and interventions
Alerts carry the exact artifact diff
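A minimal sketch of that loop, assuming metrics arrive as a per-window dict and releases are described by flat manifests: when any limit is exceeded, the result names the breakers that tripped and the fields that differ from the last good artifact. All names, limits, and the alert shape are illustrative.

```python
from typing import Dict, List, Optional, Tuple


def artifact_diff(last_good: dict, current: dict) -> Dict[str, Tuple]:
    """Fields whose values differ, as field -> (last_good value, current value)."""
    keys = sorted(set(last_good) | set(current))
    return {k: (last_good.get(k), current.get(k)) for k in keys if last_good.get(k) != current.get(k)}


def tripped_breakers(window: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Names of the metrics in this window that exceed their limit."""
    return [name for name, limit in limits.items() if window.get(name, 0.0) > limit]


def evaluate_window(window: dict, limits: dict, last_good: dict, current: dict) -> Optional[dict]:
    """Return an alert payload if any breaker trips, otherwise None."""
    tripped = tripped_breakers(window, limits)
    if not tripped:
        return None
    return {
        "action": "drain_to_last_good",
        "tripped": tripped,
        "diff": artifact_diff(last_good, current),
    }
```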

See exception handling

Web-service deploy vs. agent deploy

Why the familiar CI/CD playbook needs a few new moves for autonomous work.

	Standard service deploy	Automatic.co agent deploy
Unit of deploy	Image + config	Code, model, prompts, tools, and data as one artifact
Success signal	Health check / HTTP 200	Task success, cost-per-task, intervention rate
Rollout	Blue-green by request	Canary by task outcome, auto-aborting
Rollback	Redeploy previous image	Pointer flip to a warm, pinned version
Blast radius control	Rate limits	Cost + error + intervention circuit breakers

Related architecture decisions

Deployment is one node in a larger agent-design graph.

Single vs. multi-agent
Stateful vs. stateless
Action execution
Risk thresholds
Agent versioning
Cost controls

Design a deployment you can sleep through.

Bring your environment and risk constraints; leave with a release pipeline, canary plan, and rollback strategy for your agents.
