Agent Architecture

Single agent or a crew? Choose on evidence, not vibes.

The single-vs-multi-agent decision sets the floor on cost and latency, and the ceiling on reliability, for the life of the system. We help you make it based on the actual shape of the work, not on whatever the last demo used.

When one agent with good tools wins
What truly forces a split
Handoff contracts & shared state
Cost, latency & failure tradeoffs

Book a Call Get Started

1

agent handles most production workflows fine

3

conditions that actually justify a split

2–5x

token cost from naive sub-agent fan-out

0

extra agents we add without a measured reason

// the core tradeoff

Every agent boundary is a cost, not a feature

Multi-agent diagrams look impressive. The whiteboard doesn't pay the token bill or debug the handoff.

A single agent keeps the entire task in one context window: it sees its own prior reasoning, the tool results, and the goal all at once. That coherence is the single biggest reliability lever you have. The moment you split work across agents, you trade it away — each agent now sees a summary of what another agent decided, and summaries lose the detail that turns out to matter.

Multi-agent systems earn their keep in exactly three situations: when sub-tasks run in genuine parallel with no data dependency, when different parts of the work sit behind different trust or permission boundaries, or when one agent's context would grow past what the model handles reliably. Outside those, a second agent is overhead — more latency, more tokens, more failure surface, and a harder system to reason about.

Our default is one capable agent with a well-designed tool surface. We split only when the work itself demands it, and we make the split where the seams already are.
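The three conditions above can be written down as a small check. This is a minimal sketch, not a definitive rule: `WorkflowProfile`, its fields, and the thresholds are hypothetical names chosen for illustration, and the "reliable" context limit is something you would have to measure for your own model and task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowProfile:
    """Hypothetical summary of a workflow; every field is illustrative."""
    independent_subtasks: int     # sub-tasks with no data dependency on each other
    trust_domains: int            # distinct trust or permission boundaries
    peak_context_tokens: int      # largest context one agent would have to hold
    reliable_context_tokens: int  # assumed limit the model handles reliably


def reasons_to_split(profile: WorkflowProfile) -> list[str]:
    """Return the conditions that justify a second agent; empty means stay single."""
    reasons = []
    if profile.independent_subtasks >= 2:
        reasons.append("parallelism")
    if profile.trust_domains >= 2:
        reasons.append("trust boundary")
    if profile.peak_context_tokens > profile.reliable_context_tokens:
        reasons.append("context pressure")
    return reasons
```

An empty list is the common case, and it means the default holds: one agent with good tools.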

Single agent vs. multi-agent

The same workflow, two architectures — and what each one actually costs you.

Dimension	Single agent + tools	Multi-agent crew
Context	One coherent window; full history	Fragmented; passed via summaries
Latency	One reasoning loop	Serial hops add up; parallel can win
Token cost	Lowest; no re-derivation	Often 2–5x from repeated context
Debugging	One trace to read	Cross-agent handoffs to reconcile
Best when	Sequential, single trust domain	Parallel work or hard trust splits
Main risk	Context bloat on huge tasks	Lost handoffs, ping-pong, cost blowup
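To make the token-cost row concrete, here is a back-of-the-envelope model of where the multiplier comes from. It is a sketch under simple assumptions: every sub-agent re-reads the full task context, and the orchestrator reads one summary per sub-agent. The figures in the usage note are illustrative, not measurements.

```python
def single_agent_tokens(context: int, work: int) -> int:
    """One agent reads the context once and does all the work."""
    return context + work


def fan_out_tokens(context: int, work: int, sub_agents: int, summary: int) -> int:
    """Naive fan-out: the context is re-sent to every sub-agent.

    Assumes the total work is unchanged and simply divided among sub-agents,
    and that each sub-agent returns a summary the orchestrator must read.
    """
    orchestrator = context + sub_agents * summary
    workers = sub_agents * context + work
    return orchestrator + workers


def cost_ratio(context: int, work: int, sub_agents: int, summary: int) -> float:
    """How many times more tokens the fan-out uses than the single agent."""
    return fan_out_tokens(context, work, sub_agents, summary) / single_agent_tokens(context, work)
```

With an assumed 8,000-token context, 4,000 tokens of work, three sub-agents, and 500-token summaries, the single agent uses 12,000 tokens and the fan-out uses 37,500, a ratio of about 3.1x. The repeated context dominates, which is why passing less context (or a reference into shared state) matters more than trimming the work itself.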

// the decision

How we choose the topology

A short, concrete checklist we run against your real workflow before any code.

Start single

One agent, rich tools, shared memory. This is the baseline we measure everything else against.

Test for parallelism

If sub-tasks have no data dependency and run concurrently, fan-out genuinely cuts wall-clock time.
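A quick way to see the wall-clock effect is to run the same independent sub-tasks serially and concurrently. The sketch below uses `asyncio.sleep` as a stand-in for a sub-agent call; `sub_task`, `run_serial`, and `run_fan_out` are hypothetical names, and real model calls will have more variable latency.

```python
import asyncio
import time


async def sub_task(name: str, seconds: float) -> str:
    # Stand-in for an independent sub-agent call; the sleep simulates latency.
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def run_serial(tasks):
    # One after another: total time is roughly the sum of the durations.
    return [await sub_task(name, seconds) for name, seconds in tasks]


async def run_fan_out(tasks):
    # Concurrently: total time is roughly the longest single duration.
    # gather() returns results in the order the tasks were passed in.
    return list(await asyncio.gather(*(sub_task(n, s) for n, s in tasks)))


def timed(runner, tasks):
    """Run one of the coroutines above and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = asyncio.run(runner(tasks))
    return results, time.perf_counter() - start
```

The speedup only holds when no sub-task needs another's output. If task B waits on task A, the fan-out collapses back into a serial chain with extra handoffs.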

Map trust boundaries

An agent that can spend money or write to prod shouldn't share a loop with one that reads untrusted input.
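One way to check this is to classify tools by trust level and flag any agent whose tool set mixes the two. This is a minimal sketch: the tool names and the two categories are hypothetical, and a real system would enforce the boundary in its permission layer rather than by convention.

```python
# Hypothetical tool names, grouped by the kind of risk they carry.
UNTRUSTED_INPUT = frozenset({"read_web", "read_inbound_email"})
PRIVILEGED_ACTION = frozenset({"issue_refund", "write_prod_db"})


def crosses_trust_boundary(tools) -> bool:
    """True if one agent would both read untrusted input and take privileged actions."""
    tools = set(tools)
    return bool(tools & UNTRUSTED_INPUT) and bool(tools & PRIVILEGED_ACTION)


def split_by_trust(tools) -> dict:
    """Propose a two-agent split: privileged actions move to their own agent."""
    tools = set(tools)
    privileged = tools & PRIVILEGED_ACTION
    return {"reader": tools - privileged, "actor": privileged}
```

For example, an agent holding `read_web`, `search_kb`, and `issue_refund` crosses the boundary; after `split_by_trust`, neither the reader nor the actor does.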

Watch context pressure

When one agent's prompt would outgrow reliable handling, split to keep each context tight.
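Context pressure can be tracked as a simple ratio of estimated tokens to the limit you have found reliable. The sketch below assumes a rough four-characters-per-token heuristic for English text; that figure and the function names are assumptions, and a real tokenizer count should replace the estimate where one is available.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the 4-chars-per-token default is a heuristic, not a rule."""
    return math.ceil(len(text) / chars_per_token)


def context_pressure(parts: list[str], reliable_limit: int) -> float:
    """Fraction of the assumed reliable limit that the current context would use."""
    used = sum(estimate_tokens(part) for part in parts)
    return used / reliable_limit
```

A value approaching or above 1.0 is the signal to consider splitting; a value well below it means the single agent still has room.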

Define handoff contracts

Each boundary gets a typed contract: exactly what passes, what's required, and what's discarded.
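A typed contract can be as small as a frozen dataclass plus a builder that fails loudly. The example below is a sketch for a hypothetical orchestrator-to-refund-agent boundary; `RefundHandoff`, its fields, and `build_handoff` are illustrative names, not part of any particular framework.

```python
from dataclasses import dataclass

# Fields the receiving agent must get; anything else in shared state is discarded.
REQUIRED = ("order_id", "amount_cents", "constraints")


@dataclass(frozen=True)
class RefundHandoff:
    """Hypothetical contract: exactly what crosses the boundary, and nothing more."""
    order_id: str       # source of truth: the order system, never a summary
    amount_cents: int   # source of truth: the order system
    constraints: tuple  # customer or policy constraints, passed through verbatim

    def __post_init__(self):
        if not self.order_id:
            raise ValueError("order_id is required")
        if self.amount_cents <= 0:
            raise ValueError("amount_cents must be positive")


def build_handoff(shared_state: dict) -> RefundHandoff:
    """Build the handoff from shared state; raise if a required field is absent."""
    missing = [key for key in REQUIRED if key not in shared_state]
    if missing:
        raise KeyError(f"handoff missing required fields: {missing}")
    return RefundHandoff(
        order_id=shared_state["order_id"],
        amount_cents=shared_state["amount_cents"],
        constraints=tuple(shared_state["constraints"]),
    )
```

The point of the design is that constraints travel as data rather than as prose in a summary, so the one that matters cannot be paraphrased away.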

Instrument before splitting

We extract a sub-agent only when traces show a real bottleneck — never on speculation.
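As an illustration of what "traces show a real bottleneck" can mean, the sketch below aggregates trace spans by step and flags any step that accounts for a large share of total latency. The span format, the function names, and the 50% threshold are all assumptions made for the example.

```python
from collections import defaultdict


def latency_share_by_step(spans) -> dict:
    """spans: iterable of (step_name, seconds). Returns each step's share of total time."""
    totals = defaultdict(float)
    for step, seconds in spans:
        totals[step] += seconds
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {step: total / grand_total for step, total in totals.items()}


def extraction_candidates(spans, min_share: float = 0.5) -> list:
    """Steps whose share of latency meets the (assumed) threshold, sorted by name."""
    shares = latency_share_by_step(spans)
    return sorted(step for step, share in shares.items() if share >= min_share)
```

A step showing up here is a candidate for investigation, not an automatic split: it still has to pass the parallelism, trust, or context test.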

// where multi-agent breaks

Handoffs are where the work goes to die

Most multi-agent failures aren't reasoning failures; they're plumbing failures. A sub-agent re-summarizes context the orchestrator already had, drops the one constraint that mattered, and confidently completes the wrong task. Or two agents pass work back and forth, each deferring to the other, until your step budget runs out.

We design the seams first. Every handoff is an explicit, typed contract: required fields, the source of truth for each value, and what each agent is allowed to discard. State lives in one shared store the orchestrator owns — agents read from it rather than re-deriving it — and every loop carries a hard step and spend ceiling so a confused crew fails loudly instead of expensively.

Typed handoff contracts at every boundary
One shared state store, not re-summarization
Hard step + spend ceilings on every loop
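The shared store and the ceilings can be sketched together as a guarded loop. This is a minimal illustration, assuming a plain dict as the shared store and a `step_fn` that reports its own cost; `Budget`, `BudgetExceeded`, and `run_loop` are hypothetical names.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a loop hits its step or spend ceiling."""


class Budget:
    def __init__(self, max_steps: int, max_spend_usd: float):
        self.max_steps = max_steps
        self.max_spend_usd = max_spend_usd
        self.steps = 0
        self.spend_usd = 0.0


def run_loop(step_fn, state: dict, budget: Budget) -> dict:
    """Run step_fn(state) -> (cost_usd, done) until done or a ceiling is hit.

    `state` is the single shared store: steps read from and write to it
    instead of passing summaries to each other.
    """
    while True:
        if budget.steps >= budget.max_steps:
            raise BudgetExceeded("step ceiling reached")
        cost_usd, done = step_fn(state)
        budget.steps += 1
        budget.spend_usd += cost_usd
        if budget.spend_usd > budget.max_spend_usd:
            raise BudgetExceeded("spend ceiling reached")
        if done:
            return state


def ping_pong(state: dict):
    # Simulates two agents handing the same task back and forth forever.
    state["owner"] = "b" if state.get("owner") == "a" else "a"
    state["hops"] = state.get("hops", 0) + 1
    return 0.01, False
```

Calling `run_loop(ping_pong, {}, Budget(max_steps=6, max_spend_usd=1.0))` raises `BudgetExceeded` after six hops rather than looping indefinitely, which is the loud failure described above.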


// the engagement

How we land the architecture

From workflow map to a topology you can defend to your CTO.

Map the work

We trace the actual task end to end — dependencies, trust domains, and where context accumulates.

Baseline single

We stand up one agent with the right tools and measure cost, latency, and quality.
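A baseline harness does not need to be elaborate. The sketch below assumes the agent is a callable returning an answer and a token count, and it scores quality by exact match, which is a simplification; `baseline` and the case format are hypothetical, and most real workflows need a richer quality metric.

```python
import time
from statistics import mean


def baseline(agent, cases) -> dict:
    """Measure an agent over test cases.

    agent(prompt) -> (answer, tokens_used)
    cases: non-empty list of (prompt, expected_answer) pairs
    """
    latencies, tokens, correct = [], [], 0
    for prompt, expected in cases:
        start = time.perf_counter()
        answer, tokens_used = agent(prompt)
        latencies.append(time.perf_counter() - start)
        tokens.append(tokens_used)
        correct += int(answer == expected)  # exact match as a stand-in for quality
    return {
        "mean_latency_s": mean(latencies),
        "mean_tokens": mean(tokens),
        "accuracy": correct / len(cases),
    }
```

Running the same cases against any later multi-agent variant gives a like-for-like comparison on all three numbers.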

Split with cause

Where telemetry shows parallelism or a trust seam, we extract a sub-agent and a contract.

Govern & operate

Ceilings, lineage, and approval gates go in so the system stays safe as it scales.

Frequently asked questions

Aren't more agents always better?

No. Every agent boundary you add is a new place to lose context, drop a handoff, and burn tokens. A single agent with good tools beats a five-agent crew for most workflows. Split only when a single context window can no longer hold the job, when sub-tasks genuinely run in parallel, or when parts of the work sit behind different trust boundaries.

What actually forces a multi-agent design?

Three things: distinct trust or permission boundaries (an agent that can spend money should not share a loop with one that reads the open web), true parallelism where sub-tasks have no data dependency, and context pressure where one agent's prompt would exceed what the model handles reliably. Absent those, stay single.

How do multi-agent systems usually fail?

Handoff loss and cost blowup. Sub-agents re-derive context the orchestrator already had, summaries drop the one detail that mattered, and agents ping-pong work back and forth. We bound this with explicit contracts on each handoff, shared state instead of re-summarization, and hard step and spend ceilings.

Can you start single and split later?

Yes, and we recommend it. We build the action layer and state model so they're agent-count agnostic, ship a single agent first, then extract a sub-agent only when telemetry shows a specific bottleneck. Premature splitting is the more expensive mistake.
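One way to keep the action layer agent-count agnostic is to have actions operate only on shared state, with each agent holding nothing but an allow-list. The sketch below is illustrative: the actions, the `Agent` class, and the fixed plan (standing in for a model choosing actions) are all hypothetical.

```python
# Actions know about shared state, never about which agent calls them.
def fetch_order(state: dict) -> None:
    state["order"] = {"id": state["order_id"], "total_cents": 1200}  # stubbed lookup


def draft_reply(state: dict) -> None:
    order = state["order"]
    state["reply"] = f"Order {order['id']} total: {order['total_cents']} cents"


ACTIONS = {"fetch_order": fetch_order, "draft_reply": draft_reply}


class Agent:
    def __init__(self, name: str, allowed: set):
        self.name = name
        self.allowed = allowed

    def run(self, plan: list, state: dict) -> dict:
        # `plan` stands in for the model's chosen sequence of actions.
        for action in plan:
            if action not in self.allowed:
                raise PermissionError(f"{self.name} may not call {action}")
            ACTIONS[action](state)
        return state
```

A single agent allowed both actions and a two-agent split (one allowed `fetch_order`, the other `draft_reply`) produce the same final state, so extracting a sub-agent later changes allow-lists rather than the actions themselves.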

Related architecture decisions

The other choices that shape an agent system's cost and reliability.

Stateful vs. stateless
Action execution
Approval gates
Exception handling
Risk thresholds
Model governance
Cost controls
AI redundancy

Bring your workflow. Leave knowing how many agents it needs.

One working session to map your task, baseline a single agent, and decide — on evidence — whether it should ever become a crew.
