Agent Architecture

Single agent or a crew? Choose on evidence, not vibes.

The single-vs-multi-agent decision sets your ceiling on cost, latency, and reliability for the life of the system. We help you make it on the actual shape of the work — not on whatever the last demo used.

  • When one agent with good tools wins
  • What truly forces a split
  • Handoff contracts & shared state
  • Cost, latency & failure tradeoffs

1 agent handles most production workflows fine
3 conditions that actually justify a split
2–5x token cost from naive sub-agent fan-out
0 extra agents we add without a measured reason

// the core tradeoff

Every agent boundary is a cost, not a feature

Multi-agent diagrams look impressive. The whiteboard doesn't pay the token bill or debug the handoff.

A single agent keeps the entire task in one context window: it sees its own prior reasoning, the tool results, and the goal all at once. That coherence is the single biggest reliability lever you have. The moment you split work across agents, you trade it away — each agent now sees a summary of what another agent decided, and summaries lose the detail that turns out to matter.

Multi-agent systems earn their keep in exactly three situations: when sub-tasks can genuinely run in parallel with no data dependency between them, when different parts of the work sit behind different trust or permission boundaries, or when one agent's context would grow past what the model handles reliably. Outside those, a second agent is overhead: more latency, more tokens, more failure surface, and a harder system to reason about.

Our default is one capable agent with a well-designed tool surface. We split only when the work itself demands it, and we make the split where the seams already are.
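
As a rough illustration, the three conditions can be written down as a check that returns the reasons a workflow would justify a split. This is a minimal sketch, not our actual tooling: `WorkflowProfile`, its field names, and the thresholds are hypothetical, and the "reliable" context budget is something you would have to measure for your own model and task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowProfile:
    """Hypothetical summary of a workflow; every field name is illustrative."""
    independent_subtasks: int     # sub-tasks with no data dependency on each other
    trust_domains: int            # distinct trust or permission boundaries crossed
    peak_context_tokens: int      # largest context one agent would need to hold
    reliable_context_tokens: int  # assumed budget the model handles reliably


def reasons_to_split(profile: WorkflowProfile) -> list[str]:
    """Return which of the three conditions hold; an empty list means stay single."""
    reasons = []
    if profile.independent_subtasks >= 2:
        reasons.append("parallelism")
    if profile.trust_domains >= 2:
        reasons.append("trust boundary")
    if profile.peak_context_tokens > profile.reliable_context_tokens:
        reasons.append("context pressure")
    return reasons


def choose_topology(profile: WorkflowProfile) -> str:
    """Default to one agent; split only when at least one condition holds."""
    return "multi-agent" if reasons_to_split(profile) else "single-agent"


# Example: a sequential workflow in one trust domain with a modest context.
sequential = WorkflowProfile(
    independent_subtasks=1,
    trust_domains=1,
    peak_context_tokens=20_000,
    reliable_context_tokens=100_000,
)
print(choose_topology(sequential))  # single-agent
```

The function returns the reasons rather than a bare yes or no, so that every split in the final design can be traced to a named condition.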

Single agent vs. multi-agent

The same workflow, two architectures — and what each one actually costs you.

Dimension   Single agent + tools               Multi-agent crew
Context     One coherent window; full history  Fragmented; passed via summaries
Latency     One reasoning loop                 Serial hops add up; parallel can win
Token cost  Lowest; no re-derivation           Often 2–5x from repeated context
Debugging   One trace to read                  Cross-agent handoffs to reconcile
Best when   Sequential, single trust domain    Parallel work or hard trust splits
Main risk   Context bloat on huge tasks        Lost handoffs, ping-pong, cost blowup
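
To see where a multiple like 2–5x can come from, here is a back-of-the-envelope token model. It is a sketch under loud assumptions: the token counts are invented for illustration, every sub-agent is assumed to receive the full shared context, and effects such as per-step re-reading or prompt caching are ignored. The function and parameter names are hypothetical.

```python
def single_agent_tokens(shared_context: int, work_per_subtask: int, n_subtasks: int) -> int:
    """One agent reads the shared context once and does all sub-tasks in the same window."""
    return shared_context + work_per_subtask * n_subtasks


def naive_fanout_tokens(
    shared_context: int, work_per_subtask: int, n_subtasks: int, summary_tokens: int
) -> int:
    """Orchestrator reads the context and each summary; every sub-agent re-reads the context."""
    orchestrator = shared_context + summary_tokens * n_subtasks
    sub_agents = n_subtasks * (shared_context + work_per_subtask)
    return orchestrator + sub_agents


# Illustrative numbers only: 8,000 tokens of shared context, 4 sub-tasks of
# 1,000 tokens each, and a 500-token summary returned by each sub-agent.
single = single_agent_tokens(8_000, 1_000, 4)       # 12,000
fanout = naive_fanout_tokens(8_000, 1_000, 4, 500)  # 46,000
ratio = fanout / single
print(f"{single} vs {fanout} tokens, about {ratio:.1f}x")
```

In this toy model almost all of the extra cost is the shared context being re-sent to each sub-agent, which is why narrowing what each sub-agent receives matters more than trimming the work itself.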
// the decision

How we choose the topology

A short, concrete checklist we run against your real workflow before any code.

// where multi-agent breaks

Handoffs are where the work goes to die

Most multi-agent failures aren't reasoning failures — they're plumbing failures. A sub-agent re-summarizes context the orchestrator already had, drops the one constraint that mattered, and confidently completes the wrong task. Or two agents pass work back and forth, each waiting on the other, until your step budget runs out.

We design the seams first. Every handoff is an explicit, typed contract: required fields, the source of truth for each value, and what each agent is allowed to discard. State lives in one shared store the orchestrator owns — agents read from it rather than re-deriving it — and every loop carries a hard step and spend ceiling so a confused crew fails loudly instead of expensively.

  • Typed handoff contracts at every boundary
  • One shared state store, not re-summarization
  • Hard step + spend ceilings on every loop
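
A minimal sketch of those three pieces follows, assuming a plain in-process design. None of these names come from a specific framework: `Handoff`, `SharedState`, `Ceiling`, and `BudgetExceeded` are hypothetical, and a production version would add persistence, access control, and real cost accounting.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a loop goes past its step or spend ceiling."""


@dataclass(frozen=True)
class Handoff:
    """Hypothetical typed contract for one agent boundary."""
    task_id: str
    goal: str
    constraints: tuple[str, ...]        # must survive the handoff verbatim
    state_keys: tuple[str, ...]         # values the receiver reads from the shared store
    discardable: tuple[str, ...] = ()   # what the receiver is allowed to drop

    def __post_init__(self):
        if not self.goal.strip():
            raise ValueError("handoff needs a non-empty goal")
        if not self.constraints:
            raise ValueError("handoff must carry its constraints explicitly")


@dataclass
class SharedState:
    """Single store owned by the orchestrator; the source of truth for every value."""
    _values: dict = field(default_factory=dict)

    def write(self, key: str, value) -> None:
        # By convention only the orchestrator calls write().
        self._values[key] = value

    def read_for(self, handoff: Handoff) -> dict:
        """Give a sub-agent exactly the keys its contract names, or fail loudly."""
        missing = [k for k in handoff.state_keys if k not in self._values]
        if missing:
            raise KeyError(f"shared state is missing {missing} for task {handoff.task_id}")
        return {k: self._values[k] for k in handoff.state_keys}


@dataclass
class Ceiling:
    """Hard step and spend limit for one loop."""
    max_steps: int
    max_spend_usd: float
    steps: int = 0
    spend_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one step and its cost; raise as soon as either limit is passed."""
        self.steps += 1
        self.spend_usd += cost_usd
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step ceiling of {self.max_steps} exceeded")
        if self.spend_usd > self.max_spend_usd:
            raise BudgetExceeded(f"spend ceiling of ${self.max_spend_usd:.2f} exceeded")


# Usage: the orchestrator writes state once, then hands off by reference.
store = SharedState()
store.write("invoice_total", 1250)
handoff = Handoff(
    task_id="t-1",
    goal="Draft the refund email",
    constraints=("Refund must not exceed invoice_total",),
    state_keys=("invoice_total",),
)
view = store.read_for(handoff)  # {"invoice_total": 1250}
```

The sub-agent receives the constraint and a reference to the stored value rather than a prose summary of them, and a ping-pong loop that calls `charge()` on every hop stops with `BudgetExceeded` instead of running until the bill arrives.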
// the engagement

How we land the architecture

From workflow map to a topology you can defend to your CTO.

01

Map the work

We trace the actual task end to end — dependencies, trust domains, and where context accumulates.

02

Baseline single

We stand up one agent with the right tools and measure cost, latency, and quality.

03

Split with cause

Where telemetry shows parallelism, context pressure, or a trust seam, we extract a sub-agent and a contract.

04

Govern & operate

Ceilings, lineage, and approval gates go in so the system stays safe as it scales.

Frequently asked questions

Aren't more agents always better?

No. Every agent boundary you add is a new place to lose context, drop a handoff, and burn tokens. A single agent with good tools beats a five-agent crew for most workflows. Split only when sub-tasks genuinely run in parallel, when the work crosses a trust or permission boundary, or when a single context window can no longer hold the job reliably.

What actually forces a multi-agent design?

Three things: distinct trust or permission boundaries (an agent that can spend money should not share a loop with one that reads the open web), true parallelism where sub-tasks have no data dependency, and context pressure where one agent's prompt would exceed what the model handles reliably. Absent those, stay single.

How do multi-agent systems usually fail?

Handoff loss and cost blowup. Sub-agents re-derive context the orchestrator already had, summaries drop the one detail that mattered, and agents ping-pong work back and forth. We bound this with explicit contracts on each handoff, shared state instead of re-summarization, and hard step and spend ceilings.

Can you start single and split later?

Yes, and we recommend it. We build the action layer and state model so they're agent-count agnostic, ship a single agent first, then extract a sub-agent only when telemetry shows a specific bottleneck. Premature splitting is the more expensive mistake.

Bring your workflow. Leave knowing how many agents it needs.

One working session to map your task, baseline a single agent, and decide — on evidence — whether it should ever become a crew.