Agent Architecture

Model governance for agents that run in production

Which model runs which step, which version is pinned, what passes before rollout — these are policy decisions, not strings buried in agent code. We make model choice a governed, observable, reversible layer.

  • Per-step model routing
  • Version pinning & promotion
  • Pre-rollout evaluation
  • Provider fallback & redundancy

  • 40–70%: typical spend cut from routing steps to the right-sized model
  • 1 layer: model choice lives in policy, not agent code
  • 100% of steps log the exact model + version they ran
  • 0 silent auto-upgrades to a new model version

// the engineering decision

Model choice is an architectural layer, not a constant

The cheapest way to ship an agent is to hard-code one model. It is also the fastest way to a brittle, expensive system.

When a model name is a literal inside agent logic, every concern collapses onto one string: cost, quality, latency, provider availability, and deprecation risk all become invisible until they break. You cannot A/B a swap, you cannot roll back cleanly, and you cannot answer the audit question — which model produced this decision, and was it the version we validated?

Model governance separates the choice of model from the work the agent does. A routing policy maps each step to a model class, version pins lock what actually ships, and an evaluation gate decides what gets promoted. The agent calls a step; the policy decides the model. That indirection is what makes a fleet safe to operate and cheap to evolve.
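A minimal sketch of that indirection, assuming a policy held as two lookup tables. The step names, model names, providers, and version strings are hypothetical placeholders, not real identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPin:
    provider: str
    model: str
    version: str


# Hypothetical policy data: step -> model class, model class -> pinned version.
ROUTES = {"classify_ticket": "small", "plan_resolution": "frontier"}
PINS = {
    "small": ModelPin("provider-a", "small-model", "2024-01-01"),
    "frontier": ModelPin("provider-b", "frontier-model", "2024-02-01"),
}


def resolve(step: str) -> ModelPin:
    """Return the pinned model for a step; unrouted steps fail loudly."""
    try:
        return PINS[ROUTES[step]]
    except KeyError as exc:
        raise LookupError(f"no routing policy for step {step!r}") from exc
```

Agent code would call `resolve("classify_ticket")` and never name a model itself, so changing a pin or a route is a data change rather than a code change.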

// what the layer covers

The pieces of a governance layer

Each is a deliberate engineering choice with its own tradeoffs: per-step model routing, version pinning and promotion, pre-rollout evaluation, and provider fallback and redundancy.

// how a model change ships

Promoting a model, safely

The path a new or swapped model takes before it touches a single live decision.

01. Pin

Register the candidate model + version in the policy as a shadow, with no production traffic routed to it yet.

02. Evaluate

Replay it against the held-out evaluation set and recent real traffic, scoring quality, cost, and latency vs. the incumbent.

03. Shadow

Run it in parallel on a slice of live steps, comparing outputs without acting on them, to catch tail-case regressions.

04. Promote

Flip the routing policy to the new pin. If anything regresses, roll back by reverting one config value — no code deploy.
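The four steps above can be sketched as a small state machine. This is an illustration under assumed names (`Candidate`, `Route`, `advance`, `promote`, and `rollback` are all hypothetical), not a description of a specific implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Pre-promotion stages a candidate passes through, in order.
STAGES = ("pinned", "evaluated", "shadowed")


@dataclass(frozen=True)
class Candidate:
    version: str
    stage: str = "pinned"


@dataclass(frozen=True)
class Route:
    active: str                     # version serving production traffic
    previous: Optional[str] = None  # kept so rollback is a single revert


def advance(candidate: Candidate, gate_passed: bool) -> Candidate:
    """Move a candidate one stage forward, only if its gate passed."""
    if not gate_passed:
        raise ValueError(f"{candidate.version} failed the gate at {candidate.stage!r}")
    i = STAGES.index(candidate.stage)
    if i + 1 >= len(STAGES):
        raise ValueError("candidate has already cleared every pre-promotion stage")
    return Candidate(candidate.version, STAGES[i + 1])


def promote(route: Route, candidate: Candidate) -> Route:
    """Flip the route to the candidate; refuse anything that skipped shadow."""
    if candidate.stage != "shadowed":
        raise ValueError("candidate must complete shadow before promotion")
    return Route(active=candidate.version, previous=route.active)


def rollback(route: Route) -> Route:
    """Revert to the previously active version."""
    if route.previous is None:
        raise ValueError("nothing to roll back to")
    return Route(active=route.previous)
```

Keeping `previous` on the route is the design choice that makes rollback one value change: no evaluation has to be re-run to return to a version that was already validated.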

// the tradeoff

Right-sized, not biggest-available

The instinct is to point every step at the strongest frontier model and trust it. For the handful of genuinely hard steps — planning, adjudication, ambiguous extraction — that's correct. For the high-volume majority — classification, routing, structured extraction, formatting — it's wasteful and often slower than it needs to be.

We benchmark each step against your own cases, then route the bulk of traffic to small fast models and reserve frontier capacity for the steps that earn it. The result is a fleet that's cheaper, lower-latency, and — because the policy is explicit — auditable. Governance is part of the architecture, not bolted on after the bill arrives.

  • Step-level benchmarks on your real cases
  • Frontier models reserved for hard steps
  • Explicit, reviewable routing policy
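One way to read "right-sized" is: for each step, pick the cheapest model that still clears that step's quality bar on its benchmark. The sketch below assumes benchmark results are already collected; the field names and the single accuracy threshold are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    model: str
    accuracy: float       # fraction of the step's benchmark cases passed
    cost_per_call: float  # illustrative unit cost, not a real price


def right_size(results: list[BenchResult], min_accuracy: float) -> BenchResult:
    """Pick the cheapest model that clears the step's quality bar."""
    passing = [r for r in results if r.accuracy >= min_accuracy]
    if not passing:
        raise ValueError("no candidate meets the quality bar")
    return min(passing, key=lambda r: r.cost_per_call)
```

A hard step with a high bar will naturally resolve to a frontier model, while a formatting step with the same candidates resolves to a small one.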

Hard-coded model vs. governed model layer

The difference between an agent that demos well and a fleet you can operate.

Aspect            Hard-coded model string     Governed model layer
Model choice      One literal in agent code   Routing policy, per-step
New version       Silent or manual swap       Pinned, evaluated, then promoted
Rollback          Code change + redeploy      Revert one config value
Provider outage   Workflow stalls             Fails over to equivalent model
Audit             Unknown which model ran     Exact model + version logged
Cost              One model for every step    Right-sized per step
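The provider-outage row in the comparison above can be sketched as an ordered list of equivalent backends tried in turn. `ProviderUnavailable` and the callable-per-backend shape are assumptions made for illustration; a real client would raise its own error types.

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Hypothetical error a backend raises when its provider is down."""


def call_with_fallback(
    prompt: str,
    backends: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each backend in order; return (backend_name, output) from the first that works."""
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderUnavailable("all backends failed: " + "; ".join(errors))
```

Returning the backend name alongside the output keeps the audit trail honest: the log records the model that actually answered, not the one that was first in line.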

Frequently asked questions

Why not just hard-code one model and move on?

Because a single string in your agent code becomes a single point of failure and a silent cost center. The hard-coded model gets deprecated, a cheaper one ships, or a step starts failing, and with no governance layer you find out in production. A policy layer lets you swap, route, and roll back without touching agent logic.

How do you decide which model handles which step?

We profile each step by what it actually needs — reasoning depth, context length, tool-calling reliability, latency budget, and cost ceiling — then route accordingly. A cheap fast model classifies and extracts; a frontier model plans and adjudicates. Routing is a policy, evaluated against held-out cases, not a guess.
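As a rough sketch, a step profile can be matched against model capabilities to produce the shortlist that then goes to evaluation. The two record types and their fields mirror the five criteria named above, but the exact shapes and units are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepProfile:
    needs_deep_reasoning: bool
    context_tokens: int
    needs_tools: bool
    latency_budget_ms: int
    cost_ceiling: float  # illustrative unit cost per call


@dataclass(frozen=True)
class ModelSpec:
    name: str
    deep_reasoning: bool
    max_context_tokens: int
    reliable_tools: bool
    typical_latency_ms: int
    cost_per_call: float


def eligible(step: StepProfile, models: list[ModelSpec]) -> list[str]:
    """Names of models that satisfy every constraint in the step's profile."""
    return [
        m.name
        for m in models
        if (m.deep_reasoning or not step.needs_deep_reasoning)
        and m.max_context_tokens >= step.context_tokens
        and (m.reliable_tools or not step.needs_tools)
        and m.typical_latency_ms <= step.latency_budget_ms
        and m.cost_per_call <= step.cost_ceiling
    ]
```

The profile only narrows the field; which eligible model actually gets the route is still decided by scoring on held-out cases.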

What happens when a provider ships a new model version?

Nothing automatic. A new version is first registered as a pinned candidate, replayed against your evaluation set, and shadow-tested on live traffic. We compare quality, cost, and latency against the incumbent before promotion, and every step records the exact model and version it ran so a regression is traceable to a single change.
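A per-step audit record can be as simple as one JSON line. The field names below (`run_id`, `step`, and so on) are assumed for illustration rather than taken from a specific logging schema.

```python
import json
from datetime import datetime, timezone


def audit_record(run_id: str, step: str, provider: str, model: str, version: str) -> str:
    """Serialize one step execution as a JSON line naming the exact model and version."""
    return json.dumps(
        {
            "ts": datetime.now(timezone.utc).isoformat(),
            "run_id": run_id,
            "step": step,
            "provider": provider,
            "model": model,
            "version": version,
        },
        sort_keys=True,
    )
```

With records like this, tracing a regression is a filter on `step` and `version` around the time the pin changed.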

Does governance slow the agents down or add cost?

Net, it usually lowers cost. Routing the bulk of cheap, high-volume steps to small models and reserving frontier models for the few hard ones often cuts spend 40 to 70 percent. The governance layer itself is config and logging, so it adds negligible latency, and it pays for itself the first time a rollback is a config revert instead of a redeploy.
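The arithmetic behind a saving like that can be sketched as a blended cost. This assumes every call costs the same within a model tier and compares against sending all traffic to the frontier model; the inputs in the note below are illustrative, not measured.

```python
def blended_saving(small_share: float, small_cost: float, frontier_cost: float) -> float:
    """Fractional saving versus sending every call to the frontier model.

    small_share   -- fraction of calls routed to the small model (0..1)
    small_cost    -- unit cost per call on the small model
    frontier_cost -- unit cost per call on the frontier model
    """
    if not 0.0 <= small_share <= 1.0:
        raise ValueError("small_share must be between 0 and 1")
    if frontier_cost <= 0:
        raise ValueError("frontier_cost must be positive")
    routed = small_share * small_cost + (1.0 - small_share) * frontier_cost
    return 1.0 - routed / frontier_cost
```

For example, with 80% of calls on a model at a quarter of the frontier unit cost, `blended_saving(0.8, 0.25, 1.0)` comes out at roughly 0.6, which sits inside the 40 to 70 percent range quoted above.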

Make model choice a decision you can defend

Bring one agent workflow. We'll map its steps, size the right model for each, and show you the governance layer that keeps it cheap and reversible.