Single agent or a crew? Choose on evidence, not vibes.
The single-vs-multi-agent decision sets your ceiling on cost, latency, and reliability for the life of the system. We help you make it on the actual shape of the work — not on whatever the last demo used.
- When one agent with good tools wins
- What truly forces a split
- Handoff contracts & shared state
- Cost, latency & failure tradeoffs
Every agent boundary is a cost, not a feature
Multi-agent diagrams look impressive. The whiteboard doesn't pay the token bill or debug the handoff.
A single agent keeps the entire task in one context window: it sees its own prior reasoning, the tool results, and the goal all at once. That coherence is the single biggest reliability lever you have. The moment you split work across agents, you trade it away — each agent now sees a summary of what another agent decided, and summaries lose the detail that turns out to matter.
Multi-agent systems earn their keep in exactly three situations: when sub-tasks run in genuine parallel with no data dependency, when different parts of the work sit behind different trust or permission boundaries, or when one agent's context would grow past what the model handles reliably. Outside those, a second agent is overhead — more latency, more tokens, more failure surface, and a harder system to reason about.
Our default is one capable agent with a well-designed tool surface. We split only when the work itself demands it, and we make the split where the seams already are.
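A minimal sketch of the three split criteria above as a checklist function. The `WorkflowProfile` fields and the token figures are hypothetical, chosen only to illustrate the rule; the point is that an empty list of reasons means staying single.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowProfile:
    """Hypothetical summary of a workflow; field names are illustrative."""
    independent_subtasks: int     # sub-tasks with no data dependency on each other
    trust_domains: int            # distinct trust or permission boundaries
    peak_context_tokens: int      # largest context a single agent would need
    reliable_context_tokens: int  # assumed size the model handles reliably


def reasons_to_split(profile: WorkflowProfile) -> list[str]:
    """Return the split criteria this workflow meets; empty means stay single."""
    reasons = []
    if profile.independent_subtasks >= 2:
        reasons.append("parallelism")
    if profile.trust_domains >= 2:
        reasons.append("trust boundary")
    if profile.peak_context_tokens > profile.reliable_context_tokens:
        reasons.append("context pressure")
    return reasons


sequential_job = WorkflowProfile(1, 1, 20_000, 100_000)
mixed_trust_job = WorkflowProfile(1, 2, 20_000, 100_000)

print(reasons_to_split(sequential_job))   # []
print(reasons_to_split(mixed_trust_job))  # ['trust boundary']
```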
Single agent vs. multi-agent
The same workflow, two architectures — and what each one actually costs you.
| | Single agent + tools | Multi-agent crew |
|---|---|---|
| Context | One coherent window; full history | Fragmented; passed via summaries |
| Latency | One reasoning loop | Serial hops add up; parallel can win |
| Token cost | Lowest; no re-derivation | Often 2–5x from repeated context |
| Debugging | One trace to read | Cross-agent handoffs to reconcile |
| Best when | Sequential, single trust domain | Parallel work or hard trust splits |
| Main risk | Context bloat on huge tasks | Lost handoffs, ping-pong, cost blowup |
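To see where a multiple like the one in the token-cost row can come from, here is a back-of-the-envelope model. It is a deliberate simplification with made-up numbers, not a measurement: it assumes every agent in a crew re-reads the same shared context once, and that each handoff summary is written by one agent and read by the next.

```python
def single_agent_tokens(shared_context: int, work_tokens: int) -> int:
    """One agent reads the shared context once and does all the work."""
    return shared_context + work_tokens


def crew_tokens(shared_context: int, work_tokens: int,
                agents: int, handoff_summary: int) -> int:
    """Each agent re-reads the shared context; each handoff is written and read."""
    repeated_context = agents * shared_context
    handoffs = (agents - 1) * 2 * handoff_summary
    return repeated_context + work_tokens + handoffs


# Illustrative figures only.
single = single_agent_tokens(shared_context=8_000, work_tokens=4_000)
crew = crew_tokens(shared_context=8_000, work_tokens=4_000,
                   agents=4, handoff_summary=500)

print(single, crew, crew / single)  # 12000 39000 3.25
```

In this toy model the repeated shared context dominates the crew's bill, which is why a shared state store matters more than shaving the handoff summaries.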
How we choose the topology
A short, concrete checklist we run against your real workflow before any code.
Start single
One agent, rich tools, shared memory. This is the baseline we measure everything else against.
Test for parallelism
If sub-tasks have no data dependency and can run concurrently, fan-out cuts wall-clock time.
Map trust boundaries
An agent that can spend money or write to prod shouldn't share a loop with one that reads untrusted input.
Watch context pressure
When one agent's prompt would outgrow reliable handling, split to keep each context tight.
Define handoff contracts
Each boundary gets a typed contract: exactly what passes, what's required, and what's discarded.
Instrument before splitting
We extract a sub-agent only when traces show a real bottleneck — never on speculation.
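The parallelism test in the checklist can be sketched with Python's `asyncio`. Here `fake_subtask` is a hypothetical stand-in that sleeps instead of calling a model, and the task names and durations are invented; the sketch only shows that independent sub-tasks finish in roughly the time of the slowest one when fanned out, rather than the sum of all of them.

```python
import asyncio
import time


async def fake_subtask(name: str, seconds: float) -> str:
    """Stand-in for a sub-agent call; sleeps instead of calling a model."""
    await asyncio.sleep(seconds)
    return f"{name} done"


async def run_serial(tasks):
    """Run sub-tasks one after another, as serial hops would."""
    return [await fake_subtask(name, secs) for name, secs in tasks]


async def run_fan_out(tasks):
    """Run independent sub-tasks concurrently and collect results in order."""
    return list(await asyncio.gather(
        *(fake_subtask(name, secs) for name, secs in tasks)
    ))


async def timed(runner, tasks):
    """Return the runner's results and its elapsed wall-clock seconds."""
    start = time.perf_counter()
    results = await runner(tasks)
    return results, time.perf_counter() - start


TASKS = [("research", 0.05), ("pricing", 0.05), ("compliance", 0.05)]


async def main():
    serial_results, serial_s = await timed(run_serial, TASKS)
    fan_results, fan_s = await timed(run_fan_out, TASKS)
    return serial_s, fan_s, serial_results == fan_results


serial_s, fan_s, same_results = asyncio.run(main())
print(f"serial: {serial_s:.2f}s, fan-out: {fan_s:.2f}s, same results: {same_results}")
```

If any sub-task needed another's output, the fan-out version would not be valid, which is exactly what the data-dependency check is for.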
Handoffs are where the work goes to die
Most multi-agent failures aren't reasoning failures — they're plumbing failures. A sub-agent re-summarizes context the orchestrator already had, drops the one constraint that mattered, and confidently completes the wrong task. Or two agents pass work back and forth, each waiting on the other, until your step budget runs out.
We design the seams first. Every handoff is an explicit, typed contract: required fields, the source of truth for each value, and what each agent is allowed to discard. State lives in one shared store the orchestrator owns — agents read from it rather than re-deriving it — and every loop carries a hard step and spend ceiling so a confused crew fails loudly instead of expensively.
- Typed handoff contracts at every boundary
- One shared state store, not re-summarization
- Hard step + spend ceilings on every loop
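A minimal sketch of those three safeguards, assuming a refund workflow. Every name here (`RefundHandoff`, `Budget`, `BudgetExceeded`, the field names, the limits) is hypothetical and chosen for illustration: the contract rejects a handoff that dropped a required field, the orchestrator keeps the contract in one shared store, and the budget makes a ping-ponging loop fail loudly.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a loop goes past its step or spend ceiling."""


@dataclass(frozen=True)
class RefundHandoff:
    """Hypothetical contract for one boundary; every field is required."""
    order_id: str
    amount_cents: int
    policy_constraint: str  # the kind of detail a summary tends to drop

    def __post_init__(self):
        if not self.order_id or not self.policy_constraint:
            raise ValueError("handoff is missing a required field")
        if self.amount_cents <= 0:
            raise ValueError("amount_cents must be positive")


@dataclass
class Budget:
    """Hard step and spend ceilings for one agent loop."""
    max_steps: int
    max_spend_usd: float
    steps: int = 0
    spend_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one step and its cost; raise once a ceiling is passed."""
        self.steps += 1
        self.spend_usd += cost_usd
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step ceiling of {self.max_steps} exceeded")
        if self.spend_usd > self.max_spend_usd:
            raise BudgetExceeded(f"spend ceiling of ${self.max_spend_usd:.2f} exceeded")


# The orchestrator owns one shared store; sub-agents read from it
# instead of working from a re-summarized copy.
shared_state = {
    "refund": RefundHandoff("A-1001", 2500, "refund to original payment method only")
}

budget = Budget(max_steps=5, max_spend_usd=1.00)
try:
    while True:  # simulated ping-pong between two confused agents
        budget.charge(0.01)
except BudgetExceeded as exc:
    print(f"stopped: {exc}")  # stopped: step ceiling of 5 exceeded
```

Making the contract a frozen dataclass means a receiving agent cannot quietly overwrite a value whose source of truth lives elsewhere.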
How we land the architecture
From workflow map to a topology you can defend to your CTO.
Map the work
We trace the actual task end to end — dependencies, trust domains, and where context accumulates.
Baseline single
We stand up one agent with the right tools and measure cost, latency, and quality.
Split with cause
Where telemetry shows parallelism or a trust seam, we extract a sub-agent and a contract.
Govern & operate
Ceilings, lineage, and approval gates go in so the system stays safe as it scales.
Frequently asked questions
Aren't more agents always better?
No. Every agent boundary you add is a new place to lose context, drop a handoff, and burn tokens. A single agent with good tools beats a five-agent crew for most workflows. Split only when sub-tasks genuinely run in parallel, when the work crosses a trust or permission boundary, or when a single context window can no longer hold the job.
What actually forces a multi-agent design?
Three things: distinct trust or permission boundaries (an agent that can spend money should not share a loop with one that reads the open web), true parallelism where sub-tasks have no data dependency, and context pressure where one agent's prompt would exceed what the model handles reliably. Absent those, stay single.
How do multi-agent systems usually fail?
Handoff loss and cost blowup. Sub-agents re-derive context the orchestrator already had, summaries drop the one detail that mattered, and agents ping-pong work back and forth. We bound this with explicit contracts on each handoff, shared state instead of re-summarization, and hard step and spend ceilings.
Can you start single and split later?
Yes, and we recommend it. We build the action layer and state model so they're agent-count agnostic, ship a single agent first, then extract a sub-agent only when telemetry shows a specific bottleneck. Premature splitting is the more expensive mistake.
Bring your workflow. Leave knowing how many agents it needs.
One working session to map your task, baseline a single agent, and decide — on evidence — whether it should ever become a crew.