Framework

AutoGen, used where conversation actually helps

AutoGen is Microsoft's framework for multi-agent conversations and sandboxed code execution. We use it where iterative, agent-to-agent dialogue earns its keep — and we're honest about where it doesn't.

  • Multi-agent conversation loops
  • Sandboxed code execution
  • Model-agnostic clients
  • Wrapped in a durable runtime

MIT: open-source license, no vendor lock-in
N agents: specialized roles in one conversation
Async: event-driven core in the current release
Any LLM: Anthropic, OpenAI, or open-weight

// what it is

A framework for agents that talk to each other

Most LLM apps run one model against one prompt. AutoGen models the work as a conversation between several agents instead.

In AutoGen you define a small cast of agents, each with its own system prompt and tools — say a planner that breaks down a task, a coder that writes a solution, a critic that pokes holes in it, and an executor that actually runs the code in a sandbox. They exchange messages under a defined chat pattern until a termination condition is met.
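
To make that concrete, here is a minimal sketch of the pattern in plain Python. It deliberately does not use AutoGen's API: `Agent`, `Message`, `run_chat`, and the "APPROVE" stop word are names we made up for illustration, and the lambdas stand in for model calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    sender: str
    text: str


@dataclass
class Agent:
    name: str
    reply: Callable[[List[Message]], str]  # stands in for an LLM call


def run_chat(agents: List[Agent], task: str,
             stop_word: str = "APPROVE", max_turns: int = 10) -> List[Message]:
    """Take turns in a fixed order until someone says stop_word or turns run out."""
    history = [Message("user", task)]
    for turn in range(max_turns):
        agent = agents[turn % len(agents)]
        text = agent.reply(history)
        history.append(Message(agent.name, text))
        if stop_word in text:  # termination condition
            break
    return history


# Scripted stand-ins for model-backed agents.
planner = Agent("planner", lambda h: "Plan: parse the input, then sum it.")
coder = Agent("coder", lambda h: "def total(xs): return sum(xs)")
critic = Agent("critic",
               lambda h: "APPROVE" if "def total" in h[-1].text else "Needs code.")

transcript = run_chat([planner, coder, critic],
                      "Write a function that sums a list.")
```

The chat pattern here is a fixed round-robin and the termination condition is a stop word plus a turn cap. Real frameworks offer richer versions of both, but the shape is the same.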

The interesting part is the loop. Because agents react to each other's output, AutoGen is well suited to problems where the first answer is rarely the final answer: generate, run, read the error, fix, run again. That tight feedback cycle — especially with real code execution — is the thing AutoGen does better than most alternatives.
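
The generate, run, read-the-error, fix cycle can be sketched in a few lines. This is an illustration under assumptions, not AutoGen code: `fake_coder` is a hypothetical stand-in for a model that returns a buggy draft first and a corrected one once it sees an error, and a plain child process is used in place of a real sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def run_snippet(code: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Run code in a child Python process; return (exit code, combined output).

    A child process is NOT a sandbox. A real setup would use a container.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "snippet.py"
        script.write_text(code)
        proc = subprocess.run([sys.executable, str(script)],
                              capture_output=True, text=True,
                              timeout=timeout, cwd=tmp)
    return proc.returncode, proc.stdout + proc.stderr


def fake_coder(feedback: Optional[str]) -> str:
    """Hypothetical model: the first draft has a bug, the revision fixes it."""
    if feedback is None:
        return "print(sum(['1', '2', '3']))"  # fails: int + str
    return "print(sum(int(x) for x in ['1', '2', '3']))"


def solve(max_attempts: int = 3) -> Tuple[int, str]:
    """Loop until a draft runs cleanly; return (attempts used, program output)."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = fake_coder(feedback)
        exit_code, output = run_snippet(code)
        if exit_code == 0:
            return attempt, output.strip()
        feedback = output  # the error text becomes the next prompt
    raise RuntimeError("no working solution within the attempt limit")


attempts, result = solve()
```

The point of the sketch is the feedback edge: the executor's output, including the traceback, is what the next draft is conditioned on.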

// the pieces

What AutoGen brings to an architecture

The capabilities we actually use on engagements — not the whole API surface.

Conversation patterns

Two-agent chats, group chats with a manager, and nested conversations — the topology of who talks to whom is first-class and configurable.

Sandboxed code execution

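AutoGen's code-executor agent can run generated code inside a Docker container and feed the result back into the loop. This is its signature strength. The isolation is something you configure: AutoGen also ships a local executor that runs code directly on the host, so we always select the Docker-backed one explicitly.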

Critic / reflection loops

A dedicated critic agent reviews drafts before they ship, which in our experience lifts quality on writing, analysis, and code tasks.

Human-in-the-loop

A UserProxyAgent can pause the conversation for human input or approval at any turn, which gives us clean hooks for our governance gates.
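
A rough sketch of what such a pause looks like, written in plain Python rather than against AutoGen's classes. `chat_with_approval` and the sample drafts are hypothetical; the `ask_human` callable defaults to `input` and is swapped for a scripted reviewer so the example runs unattended.

```python
from typing import Callable, List


def chat_with_approval(drafts: List[str],
                       ask_human: Callable[[str], str] = input) -> str:
    """Show each draft to a human; return the first one they approve."""
    for draft in drafts:
        answer = ask_human(f"Approve this draft? [y/n]\n{draft}\n> ")
        if answer.strip().lower() in {"y", "yes"}:
            return draft
    raise RuntimeError("no draft was approved")


# Scripted reviewer for the example: rejects the first draft, approves the second.
answers = iter(["n", "y"])
approved = chat_with_approval(
    ["Refund all orders.", "Refund order 1042 only."],
    ask_human=lambda prompt: next(answers),
)
```

Injecting the approval function keeps the gate testable, and it lets the same loop take its answer from a terminal, a ticket, or a durable workflow signal.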

Pluggable model clients

The model client is an interface, so we run AutoGen against Anthropic, OpenAI, or open-weight models without touching the agent definitions.
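
The idea is ordinary dependency inversion. The sketch below shows it with a `typing.Protocol`; the names `ModelClient`, `ProviderAClient`, `ProviderBClient`, and `make_critic` are ours, not AutoGen's, and the two clients return canned strings instead of calling a real provider.

```python
from dataclasses import dataclass
from typing import Protocol


class ModelClient(Protocol):
    """Anything with a matching complete() method satisfies this interface."""

    def complete(self, system: str, prompt: str) -> str: ...


class ProviderAClient:
    """Stand-in for one provider's SDK wrapper."""

    def complete(self, system: str, prompt: str) -> str:
        return f"[provider-a] {prompt}"


class ProviderBClient:
    """Stand-in for a second provider's SDK wrapper."""

    def complete(self, system: str, prompt: str) -> str:
        return f"[provider-b] {prompt}"


@dataclass
class Agent:
    name: str
    system_prompt: str
    client: ModelClient

    def reply(self, prompt: str) -> str:
        return self.client.complete(self.system_prompt, prompt)


def make_critic(client: ModelClient) -> Agent:
    """The agent definition is identical whichever client is passed in."""
    return Agent("critic", "You review drafts for errors.", client)


reply_a = make_critic(ProviderAClient()).reply("Check this.")
reply_b = make_critic(ProviderBClient()).reply("Check this.")
```

Only the object passed to `make_critic` changes between providers; the role, prompt, and calling code stay put.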

Tool & function calling

Agents register typed tools and call your APIs, databases, or internal services as part of the conversation.
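
As an illustration of what "typed tools" buys you, here is a small framework-free registry that checks a model's arguments against the function signature before calling it. `tool`, `call_tool`, and `get_order_total` are hypothetical names, and the type check only handles plain classes such as `int` and `str`.

```python
import inspect
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_order_total(order_id: int, currency: str = "USD") -> str:
    """Hypothetical lookup; a real tool would query a database or API."""
    return f"order {order_id}: 42.00 {currency}"


def call_tool(name: str, arguments: Dict[str, Any]) -> Any:
    """Validate model-supplied arguments against the signature, then call."""
    fn = TOOLS[name]  # KeyError for tools that were never registered
    sig = inspect.signature(fn)
    bound = sig.bind(**arguments)  # TypeError on missing or unknown arguments
    for arg_name, value in bound.arguments.items():
        expected = sig.parameters[arg_name].annotation
        if expected is not inspect.Parameter.empty and not isinstance(value, expected):
            raise TypeError(
                f"{arg_name} must be {expected.__name__}, got {type(value).__name__}"
            )
    return fn(**bound.arguments)


result = call_tool("get_order_total", {"order_id": 7})
```

Rejecting a malformed call before it reaches the service turns a model mistake into an error message the agent can read and correct on the next turn.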

// how we deploy it

From AutoGen demo to production loop

The path we take to make a conversation framework operationally trustworthy.

01

Frame the loop

We decide which roles earn an agent and what termination looks like — over-splitting agents wastes tokens and adds latency.

02

Sandbox & scope

We lock code execution into an isolated container with no standing credentials and a tight allow-list of tools.
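
One way to express that lockdown, sketched with standard `docker run` flags and a plain allow-list check. The image tag, resource limits, mount path, and tool names are assumptions chosen for illustration, and the function only builds the command; it does not start a container.

```python
from pathlib import Path
from typing import List

# Hypothetical tool names; anything not listed here is refused.
ALLOWED_TOOLS = frozenset({"read_file", "run_tests"})


def check_tool(name: str) -> None:
    """Raise unless the requested tool is explicitly allowed."""
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {name!r} is not on the allow-list")


def sandbox_command(workdir: Path, script: str = "main.py",
                    image: str = "python:3.12-slim") -> List[str]:
    """Build a `docker run` argv for an isolated, throwaway container."""
    return [
        "docker", "run", "--rm",
        "--network", "none",       # no network access from inside the container
        "--read-only",             # immutable root filesystem
        "--cap-drop", "ALL",       # drop all Linux capabilities
        "--memory", "256m",
        "--cpus", "1",
        "--pids-limit", "64",
        "--user", "65534:65534",   # unprivileged uid:gid
        "--volume", f"{workdir}:/workspace:ro",
        "--workdir", "/workspace",
        # No --env flags: host secrets are never forwarded into the container.
        image, "python", script,
    ]


command = sandbox_command(Path("/tmp/job"))
```

Building the argv as a list, rather than a shell string, also avoids quoting bugs when the working directory or script name comes from generated output.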

03

Wrap for durability

We run the conversation inside Temporal or a queue so retries, timeouts, and approvals survive restarts.
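
The property we want from that wrapper is that completed turns are not repeated after a crash. A toy version of the idea, using a JSON checkpoint file instead of Temporal, might look like this. `run_durably` and the step functions are hypothetical, and a real durable runtime provides much stronger guarantees (atomic writes, timers, signals) than this sketch.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List

Step = Callable[[List[str]], str]


def run_durably(checkpoint: Path, steps: List[Step]) -> List[str]:
    """Run steps in order, saving the transcript after each one."""
    history: List[str] = (
        json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    )
    for step in steps[len(history):]:  # skip steps that already completed
        history.append(step(history))
        checkpoint.write_text(json.dumps(history))
    return history


calls: List[str] = []
crashed = {"done": False}


def plan(history: List[str]) -> str:
    calls.append("plan")
    return "plan: sum the list"


def code(history: List[str]) -> str:
    calls.append("code")
    if not crashed["done"]:  # fail on the first attempt only
        crashed["done"] = True
        raise ConnectionError("simulated process death")
    return "def total(xs): return sum(xs)"


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "chat.json"
    try:
        run_durably(path, [plan, code])       # dies during the second step
    except ConnectionError:
        pass
    final = run_durably(path, [plan, code])   # "restart": resumes at step two
```

After the simulated restart the planner is not called again, which is the behavior that keeps token spend and side effects from doubling on every retry.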

04

Observe & cap

We add tracing, token budgets, and turn limits so a runaway chat can't loop or spend without bound.
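
A cap can be as simple as a counter that every turn has to pass through. The sketch below assumes a hypothetical `ChatBudget` class and counts whitespace-separated words as a crude proxy for tokens; a production version would use the provider's reported usage instead.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a conversation passes its turn or token limit."""


@dataclass
class ChatBudget:
    max_turns: int
    max_tokens: int
    turns: int = 0
    tokens: int = 0

    def charge(self, text: str) -> None:
        """Record one turn; raise once either limit is passed."""
        self.turns += 1
        self.tokens += len(text.split())  # crude word count, not a real tokenizer
        if self.turns > self.max_turns:
            raise BudgetExceeded(f"turn limit {self.max_turns} exceeded")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token budget {self.max_tokens} exceeded")


budget = ChatBudget(max_turns=5, max_tokens=12)
stopped_by = None
try:
    while True:  # two agents that would otherwise argue forever
        budget.charge("I disagree with that")  # four words per turn
except BudgetExceeded as exc:
    stopped_by = str(exc)
```

Raising an exception, rather than returning a flag, makes the cap hard to ignore: the surrounding workflow has to decide explicitly whether to stop, escalate, or top up the budget.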

// honest boundaries

AutoGen is a reasoning loop, not an orchestrator

AutoGen is excellent at the cognitive part of a task — agents reasoning, critiquing, and iterating toward an answer. It is not, by itself, a workflow engine. It has no built-in durable state, no native scheduling, and no answer for what happens when a process dies mid-conversation.

So we keep the layers separate. AutoGen owns the agent dialogue; a durable runtime like Temporal owns retries, long-running waits, and human approvals. That split keeps each tool doing what it's genuinely good at, and keeps your system debuggable when something goes wrong at 3am.

  • AutoGen handles the conversation and code loop
  • Temporal or a queue handles durability and recovery
  • Governance gates live in the durable layer, not the prompt

AutoGen vs. the alternatives we also use

We're framework-agnostic. Here's the rough decision we make per workload.

Criterion | AutoGen | When we pick something else
Best fit | Iterative, conversational, code-in-the-loop tasks | LangChain for linear tool pipelines; CrewAI for simple role teams
Standout feature | Sandboxed code execution agent | LangChain's integrations; CrewAI's ergonomics
State & retries | None built in; needs a wrapper | Temporal / n8n / Airflow own durability
Lock-in | MIT, model-agnostic | All open and swappable; same principle

Frequently asked questions

What is AutoGen, exactly?

AutoGen is Microsoft's open-source framework for coordinating conversations between multiple LLM-backed agents. Instead of one prompt doing everything, specialized agents (a planner, a coder, a critic, an executor) message each other until a task is done. It also ships a code-execution agent that can run generated code in a sandbox, which is its standout feature.

When do you choose AutoGen over LangChain or CrewAI?

We reach for AutoGen when the work is genuinely conversational and iterative — a draft-critique-revise loop, or anything that benefits from running code and reacting to the output. For linear tool-calling pipelines we usually prefer LangChain; for role-based teams with simpler handoffs, CrewAI is often lighter.

Is AutoGen production-ready?

The newer AutoGen (v0.4 and later, built on an asyncio, event-driven core) is solid for the right workloads, but it is not an orchestrator on its own. We wrap AutoGen conversations inside a durable layer, Temporal or a queue, so that retries, timeouts, and human approvals survive process restarts. AutoGen handles the reasoning loop; the durable layer handles the operational guarantees.

Does using AutoGen lock us into Azure or OpenAI?

No. AutoGen is model-agnostic — we run it against Anthropic, OpenAI, or open-weight models behind a compatible endpoint. The framework is open source under MIT. We keep the model client behind an interface so you can switch providers without rewriting the agent logic.


Have a draft-critique-revise problem?

Bring us the workflow. We'll tell you honestly whether AutoGen is the right loop — or whether something simpler does the job.