Agent Architecture

Model governance for agents that run in production

Which model runs which step, which version is pinned, what passes before rollout — these are policy decisions, not strings buried in agent code. We make model choice a governed, observable, reversible layer.

Per-step model routing
Version pinning & promotion
Pre-rollout evaluation
Provider fallback & redundancy

Book a Call Get Started

40–70%

typical spend cut from routing steps to the right-sized model

1 layer

model choice lives in policy, not agent code

100%

of steps log the exact model + version they ran

0

silent auto-upgrades to a new model version

// the engineering decision

Model choice is an architectural layer, not a constant

The cheapest way to ship an agent is to hard-code one model. It is also the fastest way to a brittle, expensive system.

When a model name is a literal inside agent logic, every concern collapses onto one string: cost, quality, latency, provider availability, and deprecation risk all become invisible until they break. You cannot A/B a swap, you cannot roll back cleanly, and you cannot answer the audit question — which model produced this decision, and was it the version we validated?

Model governance separates the choice of model from the work the agent does. A routing policy maps each step to a model class, version pins lock what actually ships, and an evaluation gate decides what gets promoted. The agent calls a step; the policy decides the model. That indirection is what makes a fleet safe to operate and cheap to evolve.
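As a minimal sketch of that indirection, assume the policy is a plain in-memory table. The step names, model identifiers, versions, and the `resolve_model` helper are all hypothetical, chosen for illustration rather than taken from any real provider or library:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPin:
    model_class: str  # hypothetical class label, e.g. "small" or "frontier"
    model: str        # hypothetical model identifier
    version: str      # the exact version that passed the evaluation gate


# Hypothetical routing policy: step name -> pinned model.
ROUTING_POLICY = {
    "classify_ticket": ModelPin("small", "small-fast", "2024-06-01"),
    "plan_resolution": ModelPin("frontier", "frontier-large", "2024-05-13"),
}


def resolve_model(step: str) -> ModelPin:
    """Agent code names a step; the policy decides which model runs it."""
    if step not in ROUTING_POLICY:
        raise KeyError(f"no routing policy for step {step!r}")
    return ROUTING_POLICY[step]


pin = resolve_model("classify_ticket")
print(pin.model, pin.version)  # small-fast 2024-06-01
```

Agent logic only ever mentions the step name, so swapping a model means editing the table, not the agent.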

// what the layer covers

The pieces of a governance layer

Each is a deliberate engineering choice with its own tradeoffs — and each links to the part of the architecture it touches.

Per-step routing

Profile every step by reasoning depth, context length, latency budget, and cost ceiling, then route it to the right model class instead of sending everything to one model.

Version pinning

Pin the exact model + version each step runs. No provider auto-upgrade reaches production without passing your gate first.
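One way to enforce this is a check that rejects any floating alias before a policy ships. The sketch below assumes pinned versions are dated snapshots such as `2024-06-01`; that format, the step names, and `validate_pins` are illustrative assumptions, since real providers name versions differently:

```python
import re

# Assumption: an exact pin is a dated snapshot like "2024-06-01".
# Floating aliases such as "latest" do not match and are rejected.
PINNED_VERSION = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_pinned(version: str) -> bool:
    return PINNED_VERSION.fullmatch(version) is not None


def validate_pins(policy: dict[str, str]) -> list[str]:
    """Return the steps whose version is not an exact pin."""
    return [step for step, version in policy.items() if not is_pinned(version)]


policy = {"classify_ticket": "2024-06-01", "plan_resolution": "latest"}
print(validate_pins(policy))  # ['plan_resolution']
```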

Promotion policy

A new model is shadow-tested on real traffic and promoted only when it beats the incumbent on quality, cost, and latency.

Fallback & redundancy

When a provider degrades or rate-limits, the policy fails over to an equivalent model so the workflow doesn't stall.
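A failover of this kind can be sketched as an ordered list of providers tried in turn. `ProviderUnavailable`, the provider names, and the stub callables below are hypothetical stand-ins for real client calls:

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Hypothetical error for a degraded or rate-limited provider."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each provider in order; return (provider name, output)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderUnavailable("; ".join(errors))


def primary(prompt: str) -> str:
    raise ProviderUnavailable("rate limited")  # simulated outage


def secondary(prompt: str) -> str:
    return f"echo: {prompt}"


used, output = call_with_fallback("hi", [("primary", primary), ("secondary", secondary)])
print(used, output)  # secondary echo: hi
```

Returning the provider name alongside the output keeps the failover visible in the step's log instead of hiding it.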

Cost controls

Per-step budgets, model-class caps, and token ceilings stop a runaway loop from quietly burning a frontier-model bill.
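A per-step ceiling can be as small as a counter that refuses the next call once a limit would be crossed. The class names and the limits below are illustrative assumptions, not recommended values:

```python
class BudgetExceeded(Exception):
    """Raised when a step would exceed its token or cost ceiling."""


class StepBudget:
    def __init__(self, max_tokens: int, max_cost_usd: float) -> None:
        self.max_tokens = max_tokens
        self.max_cost_usd = max_cost_usd
        self.tokens = 0
        self.cost_usd = 0.0

    def charge(self, tokens: int, cost_usd: float) -> None:
        """Record usage, refusing any call that would cross a ceiling."""
        if self.tokens + tokens > self.max_tokens:
            raise BudgetExceeded("token ceiling reached")
        if self.cost_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("cost ceiling reached")
        self.tokens += tokens
        self.cost_usd += cost_usd


budget = StepBudget(max_tokens=1000, max_cost_usd=0.50)
budget.charge(tokens=600, cost_usd=0.20)
try:
    budget.charge(tokens=600, cost_usd=0.20)  # a looping step asks again
except BudgetExceeded as exc:
    print("stopped:", exc)  # stopped: token ceiling reached
```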

Decision lineage

Every step records its model, version, prompt hash, and cost, so any output traces back to the exact configuration that produced it.
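A lineage record with those four fields might look like the sketch below. The field names, the sample values, and the choice of SHA-256 for the prompt hash are assumptions made for illustration:

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LineageRecord:
    step: str
    model: str        # hypothetical model identifier
    version: str      # exact pinned version that ran
    prompt_hash: str  # SHA-256 of the prompt text (assumed hash choice)
    cost_usd: float


def record_step(step: str, model: str, version: str, prompt: str, cost_usd: float) -> LineageRecord:
    prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return LineageRecord(step, model, version, prompt_hash, cost_usd)


rec = record_step("classify_ticket", "small-fast", "2024-06-01", "Classify: refund please", 0.0004)
print(json.dumps(asdict(rec)))  # one JSON line per step, ready for a log sink
```

Hashing the prompt rather than storing it keeps the record small while still showing whether two runs used the same prompt text.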

// how a model change ships

Promoting a model, safely

The path a new or swapped model takes before it touches a single live decision.

Evaluate

Replay it against the held-out evaluation set and recent real traffic, scoring quality, cost, and latency vs. the incumbent.
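The comparison against the incumbent can be written as an explicit gate. This sketch assumes one reading of "beats the incumbent": no worse on any axis and strictly better on at least one. The metric names and the sample scores are hypothetical, not measured results:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalScore:
    quality: float         # higher is better
    cost_usd: float        # lower is better
    p95_latency_ms: float  # lower is better


def should_promote(candidate: EvalScore, incumbent: EvalScore) -> bool:
    """Assumed gate: no regression on any axis, a gain on at least one."""
    no_worse = (
        candidate.quality >= incumbent.quality
        and candidate.cost_usd <= incumbent.cost_usd
        and candidate.p95_latency_ms <= incumbent.p95_latency_ms
    )
    strictly_better = (
        candidate.quality > incumbent.quality
        or candidate.cost_usd < incumbent.cost_usd
        or candidate.p95_latency_ms < incumbent.p95_latency_ms
    )
    return no_worse and strictly_better


incumbent = EvalScore(quality=0.91, cost_usd=4.00, p95_latency_ms=900.0)
candidate = EvalScore(quality=0.92, cost_usd=2.50, p95_latency_ms=700.0)
print(should_promote(candidate, incumbent))  # True
```

A team that accepts a small quality drop for a large cost cut would replace this rule with weighted thresholds; the point is that the rule is written down and reviewable.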

Shadow

Run it in parallel on a slice of live steps, comparing outputs without acting on them, to catch tail-case regressions.
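The key property of shadowing is that only the incumbent's output is ever acted on. A minimal sketch, with sampling and concurrency left out for brevity; the two toy classifiers and the `diffs` log are hypothetical:

```python
from typing import Callable


def run_with_shadow(
    step_input: str,
    incumbent: Callable[[str], str],
    candidate: Callable[[str], str],
    diffs: list[dict],
) -> str:
    """Return the incumbent's output; log where the candidate disagrees."""
    live = incumbent(step_input)
    try:
        shadow = candidate(step_input)
    except Exception as exc:  # a shadow failure must never affect the live path
        diffs.append({"input": step_input, "live": live, "error": repr(exc)})
        return live
    if shadow != live:
        diffs.append({"input": step_input, "live": live, "shadow": shadow})
    return live


def old_model(text: str) -> str:
    return "refund" if "refund" in text else "other"


def new_model(text: str) -> str:
    return "refund" if "refund" in text or "money back" in text else "other"


diffs: list[dict] = []
for ticket in ["refund please", "I want my money back"]:
    print(run_with_shadow(ticket, old_model, new_model, diffs))  # refund, then other
print(len(diffs))  # 1
```

The logged disagreements are the tail cases worth reviewing before promotion.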

Promote

Flip the routing policy to the new pin. If anything regresses, roll back by reverting one config value — no code deploy.
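To show why rollback needs no deploy, the sketch below keeps each step's pin history in a hypothetical config store. `PinStore`, the step name, and the versions are illustrative; a real system would back this with a config service or versioned file:

```python
class PinStore:
    """Hypothetical config store: one pinned version per step, with history."""

    def __init__(self, pins: dict[str, str]) -> None:
        self._history = {step: [version] for step, version in pins.items()}

    def current(self, step: str) -> str:
        return self._history[step][-1]

    def promote(self, step: str, version: str) -> None:
        """Flip the step to a new pin, keeping the old one for rollback."""
        self._history[step].append(version)

    def rollback(self, step: str) -> str:
        """Revert to the previous pin and return it."""
        if len(self._history[step]) < 2:
            raise ValueError(f"no earlier pin to roll back to for {step!r}")
        self._history[step].pop()
        return self.current(step)


store = PinStore({"classify_ticket": "2024-06-01"})
store.promote("classify_ticket", "2024-09-15")
print(store.current("classify_ticket"))   # 2024-09-15
print(store.rollback("classify_ticket"))  # 2024-06-01
```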

// the tradeoff

Right-sized, not biggest-available

The instinct is to point every step at the strongest frontier model and trust it. For the handful of genuinely hard steps — planning, adjudication, ambiguous extraction — that's correct. For the high-volume majority — classification, routing, structured extraction, formatting — it's wasteful and often slower than it needs to be.

We benchmark each step against your own cases, then route the bulk of traffic to small fast models and reserve frontier capacity for the steps that earn it. The result is a fleet that's cheaper, lower-latency, and — because the policy is explicit — auditable. Governance is part of the architecture, not bolted on after the bill arrives.

Step-level benchmarks on your real cases
Frontier models reserved for hard steps
Explicit, reviewable routing policy


Hard-coded model vs. governed model layer

The difference between an agent that demos well and a fleet you can operate.

Concern	Hard-coded model string	Governed model layer
Model choice	One literal in agent code	Routing policy, per-step
New version	Silent or manual swap	Pinned, evaluated, then promoted
Rollback	Code change + redeploy	Revert one config value
Provider outage	Workflow stalls	Fails over to equivalent model
Audit	Unknown which model ran	Exact model + version logged
Cost	One model for every step	Right-sized per step

Frequently asked questions

Why not just hard-code one model and move on?

Because a single string in your agent code becomes a single point of failure and a silent cost center. The hard-coded model gets deprecated, a cheaper one ships, or a step starts failing, and with no governance layer you find out in production. A policy layer lets you swap, route, and roll back without touching agent logic.

How do you decide which model handles which step?

We profile each step by what it actually needs — reasoning depth, context length, tool-calling reliability, latency budget, and cost ceiling — then route accordingly. A cheap fast model classifies and extracts; a frontier model plans and adjudicates. Routing is a policy, evaluated against held-out cases, not a guess.
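As a rough illustration of profile-driven routing, the sketch below maps two of those dimensions to a model class. The 1-to-5 depth scale, the 32,000-token threshold, and the class labels are invented for the example; in practice the cut-offs come from benchmarking each step on held-out cases, and tool-calling reliability, latency, and cost would be weighed too:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepProfile:
    reasoning_depth: int  # 1 = shallow ... 5 = deep (hypothetical scale)
    context_tokens: int   # typical prompt size for the step


def choose_model_class(profile: StepProfile) -> str:
    """Return "frontier" only for steps that need it; default to "small"."""
    if profile.reasoning_depth >= 4:      # illustrative threshold
        return "frontier"
    if profile.context_tokens > 32_000:   # illustrative threshold
        return "frontier"
    return "small"


steps = {
    "classify_ticket": StepProfile(reasoning_depth=1, context_tokens=800),
    "plan_resolution": StepProfile(reasoning_depth=5, context_tokens=6_000),
}
for name, profile in steps.items():
    print(name, "->", choose_model_class(profile))
# classify_ticket -> small
# plan_resolution -> frontier
```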

What happens when a provider ships a new model version?

Nothing automatic. Production keeps running the pinned version while the new one is replayed against your evaluation set and shadow-tested on your real traffic. We compare quality, cost, and latency before promotion, and every step records the exact model and version it ran, so a regression is traceable to a single change.

Does governance slow the agents down or add cost?

Net, it usually lowers cost. Routing the bulk of cheap, high-volume steps to small models and reserving frontier models for the few hard ones often cuts spend 40 to 70 percent. The governance layer itself is config and logging, so it adds negligible latency, and it pays for itself the first time a bad swap is undone by reverting a config value instead of redeploying code.

Related architecture decisions

Model governance sits alongside the rest of the agent-architecture stack.

Single vs. multi-agent
Stateful vs. stateless
Action execution
Approval gates
Exception handling
Agent versioning
Decision lineage
AI redundancy

Make model choice a decision you can defend

Bring one agent workflow. We'll map its steps, size the right model for each, and show you the governance layer that keeps it cheap and reversible.
