Agent Architecture

Version every agent like the production system it is

Q: What exactly gets versioned in an agent version?

Everything that can change behavior: system and tool prompts, the tool/function schemas, the model ID and its decoding params, retrieval indexes, guardrail policy, and the orchestration graph. We hash them together into one immutable artifact with a semantic version, so a version is a single thing you can pin, diff, and roll back.

Q: Why not just rely on the model provider's version pin?

Pinning the model (for example a dated Claude or GPT snapshot) only fixes one input. Your prompt, tools, RAG corpus, and policy drift independently and change behavior just as much. Agent versioning captures all of them together, so a run is reproducible even when one underlying piece is updated.

Q: How do you roll back a bad agent version in production?

Because each version is immutable and addressable, rollback is a routing change, not a redeploy. We flip the traffic alias back to the last-good version, in-flight runs finish on the version they started, and the decision lineage records exactly which version produced which outcome.

Q: Can different versions run at the same time?

Yes. We run canary and shadow traffic so a new version takes a small percentage of live load (or runs read-only alongside the current one) before promotion. Evaluations and cost run against the candidate version before it ever owns the full alias.

Prompts, tools, models, and policy all change behavior — and they drift independently. We package them into one immutable, semantically versioned artifact so every run is reproducible and any change is one promotion or rollback away.

Immutable, content-hashed versions
Semantic versioning for behavior changes
Canary, shadow, and instant rollback
Lineage that ties outcomes to versions

Book a Call Get Started

1 artifact

prompt + tools + model + policy, hashed together

< 1 min

to roll a version back via traffic alias

100%

of runs traceable to an exact version

redeploys needed to flip live traffic

// the problem

An agent has no single source of truth — unless you build one

The hard part of versioning agents is that behavior lives in four places at once.

A model deploy versions a container. An agent's behavior is the joint product of its system prompt, its tool schemas, the model and decoding parameters, the retrieval corpus it reads, and the guardrail policy that constrains it. Change any one and the agent acts differently — but each of those is usually owned by a different system, edited by a different person, and shipped on a different cadence.

That is why "it worked yesterday" is so common with agents. Someone tweaked a prompt, the RAG index reindexed overnight, or the provider rotated a model snapshot under a floating alias. Without a version that binds all of it, you cannot reproduce a run, you cannot attribute a regression, and you cannot safely roll back.

Agent versioning fixes the seam: capture every behavior-determining input as one immutable artifact, give it a real version number, and make that version the unit you deploy, evaluate, and revert.

// what's in a version

What an agent version pins

A version is only useful if it captures everything that can change the answer. These are the inputs we freeze together.

Prompts & instructions

System prompt, tool descriptions, and few-shot examples — the text most likely to be edited casually and the most likely to shift behavior.

Tool & action schemas

The function signatures and the action layer the agent can call. Adding, renaming, or re-scoping a tool is a behavior change.

Model & decoding params

The exact model ID (a dated snapshot, never a floating alias), plus temperature, max tokens, and stop conditions.

Retrieval corpus & index

The embedding model and a snapshot of the indexed knowledge, so a reindex doesn't silently rewrite what the agent knows.

Guardrail policy

Risk thresholds, approval gates, and redaction rules bound into the version — policy is behavior, so it ships with the agent.

Orchestration graph

The topology of sub-agents, routing, and state handling that decides which steps run and in what order.

// the lifecycle

From change to production, safely

Every promotion follows the same gated path, so shipping an agent change feels like shipping software.

Build

Assemble the artifact, content-hash it, and assign a semantic version. Identical inputs always produce the same version.

Evaluate

Run the candidate against a regression suite and golden transcripts. Behavior diffs and cost deltas surface before any traffic.

Canary

Route a small slice of live traffic (or shadow it read-only) to the new version and watch quality, latency, and spend in parallel.

Promote

Move the production alias to the new version. If anything regresses, flip the alias back — no redeploy, no in-flight breakage.

// semantic versioning for agents

Major, minor, patch — for behavior

We treat agent changes the way good libraries treat their APIs. A patch is a non-behavioral fix — a typo in a prompt, a logging tweak. A minor bump adds a capability without breaking existing flows — a new tool, an expanded corpus. A major bump changes behavior in ways downstream consumers must review — a model family swap, a reworked policy, a restructured orchestration graph.

Encoding intent in the version number means consumers know whether to re-run their evals before adopting it, and on-call engineers know at a glance whether last night's bump could plausibly explain a regression. The version stops being an opaque build hash and becomes a contract.

Major = behavior changes that need re-review
Minor = additive, backward-compatible capability
Patch = no behavior change

Model governance

Unversioned agents vs. versioned agents

The difference between hoping a change is safe and knowing it.

	Unversioned agent	Versioned agent
Reproduce a run	Best-effort guesswork	Replay the exact pinned artifact
Roll back	Re-edit prompts, redeploy, hope	Flip the traffic alias in seconds
Attribute a regression	"Something changed somewhere"	Lineage names the exact version
Model snapshot	Floating alias, drifts silently	Dated snapshot frozen in the version
Releasing a change	All-at-once, fingers crossed	Eval → canary → promote, gated

Frequently asked questions

What exactly gets versioned in an agent version?

Everything that can change behavior: system and tool prompts, the tool/function schemas, the model ID and its decoding params, retrieval indexes, guardrail policy, and the orchestration graph. We hash them together into one immutable artifact with a semantic version, so a version is a single thing you can pin, diff, and roll back.

Why not just rely on the model provider's version pin?

Pinning the model (for example a dated Claude or GPT snapshot) only fixes one input. Your prompt, tools, RAG corpus, and policy drift independently and change behavior just as much. Agent versioning captures all of them together, so a run is reproducible even when one underlying piece is updated.

How do you roll back a bad agent version in production?

Because each version is immutable and addressable, rollback is a routing change, not a redeploy. We flip the traffic alias back to the last-good version, in-flight runs finish on the version they started, and the decision lineage records exactly which version produced which outcome.

Can different versions run at the same time?

Yes. We run canary and shadow traffic so a new version takes a small percentage of live load (or runs read-only alongside the current one) before promotion. Evaluations and cost run against the candidate version before it ever owns the full alias.

Related architecture decisions

Versioning is one node in the governance graph. These pair with it.

Single vs. multi-agent Stateful vs. stateless Decision lineage Deployment Model governance Cost controls AI redundancy

Make every agent change reversible

Bring the agent you're afraid to touch. We'll leave you with a versioning scheme, a promotion path, and a rollback you can trust.

Book a Call Get Started