OpenAI models, in a real agent architecture
GPT-4o, the o-series reasoners, function calling, and the Responses API are powerful primitives. We wire them into governed agents that take action in your systems — and stay portable to other providers.
- GPT-4o & o-series routing
- Native function / tool calling
- Structured outputs & JSON mode
- Provider-agnostic by design
A family of models, not a single tool
OpenAI ships several distinct capabilities — and using them well means knowing which one to call, and when.
OpenAI's lineup splits roughly in two. GPT-4o (and the smaller 4o-mini) is the workhorse: fast, multimodal, and strong at instruction-following, with native tool calling and structured JSON output. The o-series, the reasoning models, accepts higher latency and cost in exchange for deliberate, multi-step thinking on hard problems like planning, complex code, and tricky logic.
On top of the models sit the platform pieces we actually build against: function calling so the model can request tools, structured outputs so it returns schema-valid JSON instead of free text, embeddings for retrieval, and the Responses API for stateful, tool-using turns. Those are the surfaces an agent framework hooks into.
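As a rough illustration of the function-calling surface, the sketch below describes one tool in JSON Schema form and dispatches a model-requested call to local code. The tool name `get_order_status`, its fields, and the `simulated_call` payload are hypothetical, and the spec's field layout only loosely follows the shape function calling uses (it differs between API surfaces). No network call is made.

```python
import json

# Hypothetical tool the agent exposes; the name and fields are illustrative.
def get_order_status(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}

# Tool description in JSON Schema form. The exact envelope around
# "parameters" is an assumption and varies between API surfaces.
TOOL_SPEC = {
    "type": "function",
    "name": "get_order_status",
    "description": "Look up the status of an order by ID.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
        "additionalProperties": False,
    },
}

# Maps tool names the model may request to the code that runs them.
REGISTRY = {"get_order_status": get_order_status}

def dispatch(tool_call: dict) -> dict:
    """Run a model-requested tool call; arguments arrive as a JSON string."""
    name = tool_call["name"]
    if name not in REGISTRY:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(tool_call["arguments"])
    return REGISTRY[name](**args)

# Simulated model output, standing in for a real API response.
simulated_call = {"name": "get_order_status", "arguments": '{"order_id": "A-1001"}'}
result = dispatch(simulated_call)
```

The model only requests the call; the agent decides whether and how to run it, which is where guardrails attach.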
None of this is magic, and none of it is a moat. A model is one component of an agent. The value Automatic.co adds is the architecture around it — routing, guardrails, integration, and observability — which is exactly what keeps you free to change models later.
What we use OpenAI for
Capabilities map to roles inside an agent — we assign the right model to the right job.
Reasoning & planning
The o-series decomposes goals into steps, weighs options, and self-checks logic before an agent commits to an action.
Tool & function calling
GPT-4o reliably emits typed tool calls, letting an agent hit your APIs, databases, and internal services.
Structured extraction
JSON mode and structured outputs turn messy documents and threads into schema-valid records that downstream code can parse without guesswork.
Retrieval & embeddings
OpenAI embeddings power semantic search and RAG so agents ground answers in your own knowledge, not guesses.
Drafting & summarization
GPT-4o and 4o-mini handle high-volume copy, replies, and summaries where latency and cost matter more than deep reasoning.
Provider routing
We benchmark OpenAI against Anthropic and open models per task, then route each step to whichever wins on quality, cost, and speed.
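Per-task routing on quality, cost, and speed can be sketched as a selection over benchmark results. This is a minimal sketch, not our production router: the `BenchmarkResult` type, the `pick_model` rule (most accurate model within budget, cheapest on ties), and the model names and numbers are all placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BenchmarkResult:
    model: str
    accuracy: float   # fraction correct on the task's eval set, 0..1
    cost_usd: float   # average cost per task
    latency_s: float  # average seconds per task

def pick_model(results, max_cost_usd: float, max_latency_s: float) -> str:
    """Return the most accurate model within the budgets, cheapest on ties."""
    eligible = [
        r for r in results
        if r.cost_usd <= max_cost_usd and r.latency_s <= max_latency_s
    ]
    if not eligible:
        raise ValueError("no model meets the cost and latency budgets")
    return min(eligible, key=lambda r: (-r.accuracy, r.cost_usd)).model

# Placeholder numbers for illustration only, not measured results.
results = [
    BenchmarkResult("model-a", accuracy=0.94, cost_usd=0.020, latency_s=6.0),
    BenchmarkResult("model-b", accuracy=0.91, cost_usd=0.004, latency_s=1.2),
    BenchmarkResult("model-c", accuracy=0.91, cost_usd=0.002, latency_s=0.9),
]

tight_choice = pick_model(results, max_cost_usd=0.01, max_latency_s=2.0)
loose_choice = pick_model(results, max_cost_usd=0.05, max_latency_s=10.0)
```

With tight budgets the cheaper of the two equally accurate models wins; with loose budgets the most accurate model does.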
How OpenAI enters a build
We don't start with the model. We start with the workflow and earn our way to it.
Scope the step
We define each agent step's job, its inputs and outputs, and its tolerance for latency, cost, and error.
Benchmark models
We test OpenAI head-to-head with alternatives on your real data — accuracy, latency, and cost per task.
Wire behind an interface
The chosen model sits behind a provider-agnostic adapter with guardrails, retries, and structured-output validation.
Instrument & route
We meter tokens, cost, and quality per call, then reroute steps or downshift to cheaper models as production data comes in.
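The retry and validation part of that adapter might look roughly like the sketch below. The `REQUIRED_FIELDS` schema, the `call_with_retries` helper, and the stubbed model replies are hypothetical; a real adapter would validate against the step's actual schema and call a real provider.

```python
import json
from typing import Callable

# Hypothetical schema for one extraction step: field name -> expected type.
REQUIRED_FIELDS = {"invoice_id": str, "total": float}

def validate(raw: str) -> dict:
    """Parse model output and check it against the expected fields."""
    record = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), expected):
            raise ValueError(f"field {field!r} missing or not {expected.__name__}")
    return record

def call_with_retries(call_model: Callable[[str], str], prompt: str,
                      max_attempts: int = 3) -> dict:
    """Call the model until its output validates or attempts run out."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return validate(call_model(prompt))
        except ValueError as err:
            last_error = err
    raise RuntimeError(f"no valid output after {max_attempts} attempts") from last_error

# Stub model: the first reply is malformed, the second is valid.
replies = iter(["not json", '{"invoice_id": "INV-7", "total": 129.5}'])
record = call_with_retries(lambda prompt: next(replies), "Extract the invoice.")
```

Because `call_model` is just a callable, the same wrapper works for any provider behind the adapter.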
We use OpenAI. We're not married to it.
OpenAI is genuinely good at certain jobs — fast tool calling, multimodal input, and, with the o-series, hard reasoning. We reach for it when it's the best tool for a step, and we say so plainly when it isn't.
Every agent we build calls models through an abstraction layer, so the orchestration logic, guardrails, and integrations don't care who's behind the API. That means you can shift a step to Anthropic, an open-source model you host, or a future OpenAI release without re-architecting. Portability is a design requirement, not a nice-to-have.
- Provider-agnostic adapter layer
- Per-step model selection on the merits
- Azure OpenAI for in-tenant deployments
- Open-source fallback for air-gapped work
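One way to picture the adapter layer is an interface every provider implements, plus a router that falls back in order. This is a sketch under assumptions: `ModelProvider`, `FallbackRouter`, and the two stub providers are illustrative names, and real adapters would wrap each vendor's SDK.

```python
from typing import Protocol

class ModelProvider(Protocol):
    """The only surface the orchestration layer is allowed to depend on."""
    name: str
    def complete(self, prompt: str) -> str: ...

class FallbackRouter:
    """Try providers in order and return the first successful completion."""
    def __init__(self, providers):
        self.providers = list(providers)

    def complete(self, prompt: str) -> str:
        errors = []
        for provider in self.providers:
            try:
                return provider.complete(prompt)
            except Exception as err:  # a real adapter would catch narrower errors
                errors.append(f"{provider.name}: {err}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))

# Stub providers standing in for real SDK-backed adapters.
class UnavailableProvider:
    name = "primary"
    def complete(self, prompt: str) -> str:
        raise TimeoutError("upstream timed out")

class EchoProvider:
    name = "fallback"
    def complete(self, prompt: str) -> str:
        return f"[fallback] {prompt}"

router = FallbackRouter([UnavailableProvider(), EchoProvider()])
answer = router.complete("Summarize the ticket.")
```

Swapping a step to another vendor then means adding one adapter class, not touching the orchestration code.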
GPT-4o vs. the o-series
Same provider, different jobs — we route by what a step actually needs.
| | GPT-4o / 4o-mini | o-series reasoners |
|---|---|---|
| Best for | Tool calls, extraction, drafting | Planning, complex code, hard logic |
| Speed | Low latency, high throughput | Slower — it deliberates before answering |
| Cost profile | Cheaper per token | Higher — reserved for steps that need it |
| Where in the agent | Action & I/O layer | The planner / decision core |
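Read as code, the table above is a lookup from step kind to model tier. The sketch below assumes hypothetical step labels and uses `o-series-reasoner` as a placeholder rather than a real model identifier.

```python
# Tier names: "gpt-4o-mini" is a real model name; "o-series-reasoner" is a
# placeholder for whichever reasoning model a given build selects.
FAST_TIER = "gpt-4o-mini"
REASONING_TIER = "o-series-reasoner"

# Hypothetical step labels, taken from the "Best for" row of the table.
STEP_TIERS = {
    "tool_call": FAST_TIER,
    "extraction": FAST_TIER,
    "drafting": FAST_TIER,
    "planning": REASONING_TIER,
    "complex_code": REASONING_TIER,
    "hard_logic": REASONING_TIER,
}

def model_for_step(step_kind: str) -> str:
    """Return the model tier for a step, failing loudly on unknown kinds."""
    try:
        return STEP_TIERS[step_kind]
    except KeyError:
        raise ValueError(f"no routing rule for step kind: {step_kind}") from None

plan_model = model_for_step("planning")
extract_model = model_for_step("extraction")
```

Failing on unknown step kinds is a deliberate choice here: a silent default would hide routing gaps that should be scoped explicitly.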
Frequently asked questions
Does building on OpenAI lock us in?
No. We keep the model behind a provider-agnostic interface, so an agent that calls GPT-4o can fall back to Anthropic or an open-source model without rewriting the orchestration. We pick OpenAI where it earns its place — not as a default.
When do you reach for the o-series reasoning models over GPT-4o?
When a step needs multi-step deliberation — planning, math-heavy logic, code generation with verification, or untangling ambiguous instructions. For high-volume, latency-sensitive tool-calling and extraction, GPT-4o or 4o-mini is faster and cheaper, so we route by step.
Can OpenAI models run inside our security perimeter?
OpenAI is API-hosted, so calls leave your network. We deploy it through Azure OpenAI in your tenant when residency matters, scrub and tokenize sensitive fields before the call, and reserve fully air-gapped workloads for open-source models we host for you.
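Scrubbing and tokenizing can be sketched for a single field type. The example below swaps email addresses for placeholder tokens before a call and restores them afterward. The regex is a simplification, the `<EMAIL_n>` token format is an assumption, and a real deployment would cover more field types and keep the vault inside your perimeter.

```python
import re

# Simplified email pattern for illustration; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def tokenize_emails(text: str):
    """Replace each distinct email with a stable token; return text and vault."""
    vault = {}  # token -> original value, kept locally and never sent out

    def replace(match):
        value = match.group(0)
        for token, seen in vault.items():
            if seen == value:
                return token  # reuse the token for a repeated address
        token = f"<EMAIL_{len(vault) + 1}>"
        vault[token] = value
        return token

    return EMAIL_RE.sub(replace, text), vault

def restore(text: str, vault: dict) -> str:
    """Put the original values back into a model response."""
    for token, value in vault.items():
        text = text.replace(token, value)
    return text

original = "Contact ana@example.com or bo@example.org, then ana@example.com again."
safe_text, vault = tokenize_emails(original)
round_trip = restore(safe_text, vault)
```

Only `safe_text` would be sent to the model; the vault stays on your side of the boundary.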
How do you control OpenAI cost and latency in production?
Per-step model routing, prompt caching, structured outputs to cut retries, streaming for perceived latency, and hard token and spend budgets per agent. We instrument every call so cost and quality are observable, not a surprise on the invoice.
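A hard token and spend budget can be as simple as a counter that refuses the call that would exceed it. This is a minimal sketch: `SpendBudget`, `BudgetExceeded`, and the per-token price are hypothetical, and real pricing differs by model and by input versus output tokens.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push an agent past its budget."""

class SpendBudget:
    def __init__(self, max_tokens: int, max_usd: float):
        self.max_tokens = max_tokens
        self.max_usd = max_usd
        self.tokens_used = 0
        self.usd_spent = 0.0

    def charge(self, tokens: int, usd_per_1k_tokens: float) -> None:
        """Record a call's usage, or raise before recording if it would overrun."""
        cost = tokens / 1000 * usd_per_1k_tokens
        if (self.tokens_used + tokens > self.max_tokens
                or self.usd_spent + cost > self.max_usd):
            raise BudgetExceeded("call would exceed the agent's budget")
        self.tokens_used += tokens
        self.usd_spent += cost

# Placeholder price; not a quoted rate for any real model.
budget = SpendBudget(max_tokens=10_000, max_usd=0.05)
budget.charge(tokens=4_000, usd_per_1k_tokens=0.005)
budget.charge(tokens=4_000, usd_per_1k_tokens=0.005)

blocked = False
try:
    budget.charge(tokens=4_000, usd_per_1k_tokens=0.005)  # would reach 12,000 tokens
except BudgetExceeded:
    blocked = True
```

Checking before recording means a rejected call leaves the counters untouched, so the agent can retry with a smaller request or a cheaper model.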
Bring a workflow. We'll pick the right model.
One working session to scope your automation and benchmark OpenAI against the alternatives on your own data.