Deploying agents to production, without surprises
An agent release is more than a container — it's code, model, prompts, tool permissions, and retrieval data moving together. We design the deployment so what you approved in staging is exactly what acts in production.
- Immutable, versioned release artifacts
- Canary rollouts with auto-abort
- Single-pointer instant rollback
- Runs inside your perimeter
What you're actually deploying
The engineering hard part isn't the rollout mechanics — it's defining the unit of deployment.
With a stateless API, the deploy artifact is obvious: an image and its config. An agent is messier. Its behavior is decided by five moving parts — the orchestration code, the model and its version, the system and tool prompts, the set of tools it's permitted to call, and the retrieval corpus it reads. Change any one and the output changes, often invisibly.
So we treat all five as a single, content-addressed release artifact. A version pin isn't "deploy v2.3 of the service" — it's "v2.3 = this code SHA, claude-sonnet at this snapshot, these prompt files, this tool manifest, this index build." That artifact is what we test, what we canary, and what we roll back to. No drift between environments, because there's only one thing to promote.
What ships with every release
Deployment is where governance becomes concrete. These controls are baked into the artifact, not bolted on after.
Environment promotion
Dev → staging → prod with the same artifact at each gate. Promotion is approval-gated, never a hand edit in production.
Versioned agents
Every release is an immutable, content-addressed bundle of code, model, prompts, tools, and data — pinned and reproducible.
Lineage on by default
Each prod run records which version produced it and which inputs it saw, so any output traces back to an exact artifact.
Cost circuit breakers
Per-deploy budgets and rate ceilings that trip before a runaway loop turns into a five-figure invoice.
Model governance gates
A model swap can't reach prod without passing the same eval suite and approval the rest of the artifact does.
Failover topology
Deploy across providers and regions so a model outage degrades gracefully instead of stopping the workflow.
How a release reaches production
A measured path that earns trust with real traffic before it carries the full load.
Build & pin
Assemble the artifact, snapshot the model and index, and run the eval suite. Anything below threshold never leaves the build.
Stage
Replay real historical tasks against the candidate in an isolated environment and diff its decisions against the incumbent.
Canary
Route 1–5% of live traffic to the new version. Watch success rate, cost-per-task, and intervention rate side by side.
Promote or abort
If the canary wins, ramp to 100%. If a metric regresses, auto-abort and route back to the pinned last-good version.
Rollback is a pointer flip, not a fire drill
Most agent incidents are recoverable in seconds if you've architected for it. We keep the previous known-good artifact warm and running; promotion only changes which version a router points at. Reverting means flipping that pointer back — no rebuild, no redeploy, no scramble to find the old prompt file.
Pair that with circuit breakers — on cost, error rate, and human-intervention rate — and a bad release contains itself. The agent trips a breaker, traffic drains to the last good version, and your team gets a page with the exact artifact diff that caused it, not a vague "the AI is acting weird."
- Last-good version stays warm and pinned
- Breakers trip on cost, errors, and interventions
- Alerts carry the exact artifact diff
Web-service deploy vs. agent deploy
Why the familiar CI/CD playbook needs a few new moves for autonomous work.
| Standard service deploy | Automatic.co agent deploy | |
|---|---|---|
| Unit of deploy | Image + config | Code, model, prompts, tools, and data as one artifact |
| Success signal | Health check / HTTP 200 | Task success, cost-per-task, intervention rate |
| Rollout | Blue-green by request | Canary by task outcome, auto-aborting |
| Rollback | Redeploy previous image | Pointer flip to a warm, pinned version |
| Blast radius control | Rate limits | Cost + error + intervention circuit breakers |
Frequently asked questions
Where do the agents actually run?
Inside your perimeter — your VPC, on-prem hardware, or an air-gapped network. We package the agent runtime as containers you schedule on your own Kubernetes or VM fleet, so model calls, tool calls, and logs never leave your boundary unless you decide otherwise.
How do you roll out a new agent version without breaking production?
Every change ships behind a canary. We route a small slice of real traffic to the new version, watch task success rate, cost-per-task, and intervention rate against the incumbent, and promote only when the new version wins. A failed canary auto-aborts and routes back to the last good version.
What happens when a deployment goes wrong at 2 a.m.?
Rollback is a single immutable pointer flip, not a redeploy — the previous version's image, prompts, and tool config are already running and pinned. Combined with circuit breakers on cost and error rate, a bad release is contained in seconds, not after a long incident.
How is this different from deploying a normal web service?
An agent's behavior is a function of code, model, prompts, tool permissions, and retrieval data — and any one of those can change the output. We version and promote all five together as one release artifact, so what you tested in staging is bit-for-bit what runs in prod.
Related architecture decisions
Deployment is one node in a larger agent-design graph.
Design a deployment you can sleep through.
Bring your environment and risk constraints; leave with a release pipeline, canary plan, and rollback strategy for your agents.