OpenAI models, in a real agent architecture

GPT-4o, the o-series reasoners, function calling, and the Responses API are powerful primitives. We wire them into governed agents that take action in your systems — and stay portable to other providers.

GPT-4o & o-series routing
Native function / tool calling
Structured outputs & JSON mode
Provider-agnostic by design

model families we route between — fast 4o and deliberate o-series

128K

token context window we plan agent memory around

<1s

first-token latency targets via streaming

lock-in: swap providers without rewriting agents

// what it is

A family of models, not a single tool

OpenAI ships several distinct capabilities — and using them well means knowing which one to call, and when.

OpenAI's lineup splits roughly in two. GPT-4o (and the smaller 4o-mini) is the workhorse: fast, multimodal, and strong at instruction-following, with native tool calling and structured JSON output. The o-series, the reasoning models, accepts higher latency and cost in exchange for deliberate, multi-step thinking on hard problems like planning, complex code, and tricky logic.

On top of the models sit the platform pieces we actually build against: function calling so the model can request tools, structured outputs so it returns schema-valid JSON instead of free text, embeddings for retrieval, and the Responses API for stateful, tool-using turns. Those are the surfaces an agent framework hooks into.
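To make the function-calling surface concrete, here is a minimal sketch of the agent's side of that exchange: a tool described in the JSON Schema shape used by Chat Completions-style function calling, plus a dispatcher that runs whatever tool the model requested. The tool name get_order_status, its stubbed result, and the HANDLERS registry are hypothetical, and no API call is made here.

```python
import json

# Tool description in the JSON Schema shape used by Chat Completions-style
# function calling. `get_order_status` is a hypothetical tool.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the status of an order by its ID.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
                "additionalProperties": False,
            },
        },
    }
]


def get_order_status(order_id: str) -> dict:
    # Stand-in for a real lookup against an internal service.
    return {"order_id": order_id, "status": "shipped"}


# Registry mapping tool names to local callables (an assumption of this sketch).
HANDLERS = {"get_order_status": get_order_status}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the tool the model asked for and return its result as a JSON string."""
    if name not in HANDLERS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
        result = HANDLERS[name](**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong argument names: report back instead of crashing.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)
```

The model only ever proposes a call; the dispatcher decides whether and how it runs, which is where guardrails attach.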

None of this is magic, and none of it is a moat. A model is one component of an agent. The value Automatic.co adds is the architecture around it — routing, guardrails, integration, and observability — which is exactly what keeps you free to change models later.

// where it fits

What we use OpenAI for

Capabilities map to roles inside an agent — we assign the right model to the right job.

Reasoning & planning

The o-series decomposes goals into steps, weighs options, and self-checks logic before an agent commits to an action.

Tool & function calling

GPT-4o reliably emits typed tool calls, letting an agent hit your APIs, databases, and internal services.

Structured extraction

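Structured outputs constrain a response to a JSON Schema you supply, turning messy documents and threads into schema-valid records you can trust. JSON mode on its own guarantees only well-formed JSON, so we validate against the schema either way.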
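A minimal sketch of the validation step on the receiving side, assuming a hypothetical Invoice record with three fields. It uses only the standard library; a production build would more likely lean on a schema library.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    # Hypothetical target record for an extraction step.
    vendor: str
    total: float
    currency: str


def parse_invoice(raw: str) -> Invoice:
    """Parse model output and reject anything that does not match the record."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {"vendor": str, "total": (int, float), "currency": str}
    missing = expected.keys() - data.keys()
    extra = data.keys() - expected.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for key, typ in expected.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(data[key], bool) or not isinstance(data[key], typ):
            raise ValueError(f"field {key!r} has the wrong type")
    return Invoice(data["vendor"], float(data["total"]), data["currency"])
```

A ValueError here is the signal to retry or escalate rather than pass a bad record downstream.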

Retrieval & embeddings

OpenAI embeddings power semantic search and RAG so agents ground answers in your own knowledge, not guesses.
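The retrieval half of that can be sketched without any API call: rank stored vectors by cosine similarity to a query vector. The three-dimensional vectors and document ids below are toy stand-ins for real embeddings, which have far more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=2):
    """Return the k document ids most similar to the query vector."""
    scored = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


# Toy vectors standing in for real embeddings (hypothetical documents).
INDEX = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-auth": [0.0, 0.1, 0.9],
}

print(top_k([0.8, 0.2, 0.0], INDEX))
```

The ids returned are what the agent would fetch and place in the prompt as grounding context.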

Drafting & summarization

Fast 4o handles high-volume copy, replies, and summaries where latency and cost matter more than deep reasoning.

Provider routing

We benchmark OpenAI against Anthropic and open models per task, then route each step to whichever wins on quality, cost, and speed.
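One way to turn benchmark results into a routing decision is sketched below: filter candidates by the step's quality and latency bars, then take the cheapest survivor. The selection rule, the BenchResult fields, and the candidate labels are assumptions for illustration, and the numbers in the usage lines are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    model: str        # label only; the names used below are placeholders
    accuracy: float   # fraction of eval cases passed, 0..1
    cost_usd: float   # average cost per task
    latency_s: float  # average end-to-end latency per task


def pick_model(results, min_accuracy, max_latency_s):
    """Cheapest candidate that clears the step's quality and latency bars."""
    eligible = [
        r for r in results
        if r.accuracy >= min_accuracy and r.latency_s <= max_latency_s
    ]
    if not eligible:
        raise LookupError("no candidate meets the step's requirements")
    return min(eligible, key=lambda r: r.cost_usd).model


# Placeholder numbers, not real benchmark data.
RESULTS = [
    BenchResult("candidate-a", accuracy=0.97, cost_usd=0.040, latency_s=6.0),
    BenchResult("candidate-b", accuracy=0.93, cost_usd=0.004, latency_s=0.9),
    BenchResult("candidate-c", accuracy=0.88, cost_usd=0.001, latency_s=0.5),
]

print(pick_model(RESULTS, min_accuracy=0.90, max_latency_s=2.0))
```

Treating quality and latency as hard constraints and cost as the tiebreaker keeps the decision easy to explain; a weighted score is a reasonable alternative.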

// how we adopt it

How OpenAI enters a build

We don't start with the model. We start with the workflow and earn our way to it.

Scope the step

We define each agent step's job, its inputs and outputs, and its tolerance for latency, cost, and error.

Benchmark models

We test OpenAI head-to-head with alternatives on your real data — accuracy, latency, and cost per task.

Wire behind an interface

The chosen model sits behind a provider-agnostic adapter with guardrails, retries, and structured-output validation.
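A minimal sketch of such an adapter layer, assuming a single complete(prompt) method as the whole interface. ChatModel, ProviderError, FallbackModel, and the two stub adapters are hypothetical names; real adapters would wrap each provider's SDK behind the same method.

```python
from typing import Callable, Protocol


class ChatModel(Protocol):
    """The only surface orchestration code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class ProviderError(Exception):
    """Raised by an adapter when its backend call fails."""


class FallbackModel:
    """Try each adapter in order, retrying and validating before moving on."""

    def __init__(self, adapters: list, validate: Callable[[str], bool], retries: int = 1):
        self.adapters = adapters
        self.validate = validate
        self.retries = retries

    def complete(self, prompt: str) -> str:
        for adapter in self.adapters:
            for _ in range(self.retries + 1):
                try:
                    text = adapter.complete(prompt)
                except ProviderError:
                    continue  # failed attempt: retry, then fall through to the next adapter
                if self.validate(text):
                    return text
        raise ProviderError("all providers failed or returned invalid output")


# Stub adapters standing in for real provider clients.
class AlwaysDown:
    def complete(self, prompt: str) -> str:
        raise ProviderError("backend unavailable")


class CannedJson:
    def complete(self, prompt: str) -> str:
        return '{"ok": true}'


model = FallbackModel([AlwaysDown(), CannedJson()], validate=lambda t: t.startswith("{"))
print(model.complete("summarize this ticket"))
```

Because FallbackModel itself satisfies the same interface, orchestration code cannot tell whether it is talking to one provider or a chain of them.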

Instrument & route

We meter tokens, cost, and quality per call, then route or downshift models as production data comes in.
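Per-call metering can be as small as the sketch below: multiply token counts by a caller-supplied price table and accumulate cost per step. The Meter class, the step and model labels, and the prices are placeholders; real prices change and should come from configuration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Meter:
    # USD per 1M tokens as (input, output), supplied by the caller.
    prices: dict
    totals: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, step: str, model: str, input_tokens: int, output_tokens: int) -> float:
        """Add one call's cost to the running total for its step and return it."""
        in_price, out_price = self.prices[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.totals[step] += cost
        return cost


# Placeholder prices, not a real price list.
meter = Meter(prices={"fast-model": (1.0, 2.0), "reasoning-model": (10.0, 40.0)})
meter.record("extract", "fast-model", input_tokens=1_000, output_tokens=500)
meter.record("plan", "reasoning-model", input_tokens=2_000, output_tokens=1_000)
print(dict(meter.totals))
```

Keying totals by step rather than by model is what makes a downshift decision visible: it shows which step is expensive, not just which model.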

// vendor-honest

We use OpenAI. We're not married to it.

OpenAI is genuinely good at certain jobs — fast tool calling, multimodal input, and, with the o-series, hard reasoning. We reach for it when it's the best tool for a step, and we say so plainly when it isn't.

Every agent we build calls models through an abstraction layer, so the orchestration logic, guardrails, and integrations don't care who's behind the API. That means you can shift a step to Anthropic, an open-source model you host, or a future OpenAI release without re-architecting. Portability is a design requirement, not a nice-to-have.

Provider-agnostic adapter layer
Per-step model selection on the merits
Azure OpenAI for in-tenant deployments
Open-source fallback for air-gapped work

GPT-4o vs. the o-series

Same provider, different jobs — we route by what a step actually needs.

	GPT-4o / 4o-mini	o-series reasoners
Best for	Tool calls, extraction, drafting	Planning, complex code, hard logic
Speed	Low latency, high throughput	Slower — it deliberates before answering
Cost profile	Cheaper per token	Higher — reserved for steps that need it
Where in the agent	Action & I/O layer	The planner / decision core
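The table above can be read as a routing rule. A minimal sketch, assuming each agent step is tagged with one of the step kinds listed in the "Best for" row; the Tier enum and the step-kind strings are hypothetical labels.

```python
from enum import Enum


class Tier(Enum):
    FAST = "fast"            # GPT-4o / 4o-mini class
    REASONING = "reasoning"  # o-series class


# Step kinds taken from the "Best for" row of the comparison table.
FAST_STEPS = {"tool_call", "extraction", "drafting"}
REASONING_STEPS = {"planning", "complex_code", "hard_logic"}


def route(step_kind: str) -> Tier:
    """Map a step kind to a model tier; unknown kinds must be classified first."""
    if step_kind in REASONING_STEPS:
        return Tier.REASONING
    if step_kind in FAST_STEPS:
        return Tier.FAST
    raise ValueError(f"unclassified step kind: {step_kind!r}")


print(route("extraction"), route("planning"))
```

Raising on unknown step kinds is a deliberate choice in this sketch: it forces a decision about every new step instead of silently defaulting to the expensive or the cheap tier.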

Frequently asked questions

Does building on OpenAI lock us in?

No. We keep the model behind a provider-agnostic interface, so an agent that calls GPT-4o can fall back to Anthropic or an open-source model without rewriting the orchestration. We pick OpenAI where it earns its place — not as a default.

When do you reach for the o-series reasoning models over GPT-4o?

When a step needs multi-step deliberation — planning, math-heavy logic, code generation with verification, or untangling ambiguous instructions. For high-volume, latency-sensitive tool-calling and extraction, GPT-4o or 4o-mini is faster and cheaper, so we route by step.

Can OpenAI models run inside our security perimeter?

OpenAI is API-hosted, so calls leave your network. We deploy it through Azure OpenAI in your tenant when residency matters, scrub and tokenize sensitive fields before the call, and reserve fully air-gapped workloads for open-source models we host for you.
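The scrub-and-tokenize step might look like the sketch below, which handles only email addresses with a deliberately simple regex. The placeholder format and function names are assumptions; a real deployment would cover more field types and keep the vault inside your network.

```python
import re

# Deliberately simple pattern; real PII detection needs more than one regex.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def tokenize_emails(text: str):
    """Replace each email with a placeholder; return the safe text and the vault."""
    vault = {}

    def swap(match):
        value = match.group(0)
        # Reuse the same placeholder when a value repeats.
        for token, seen in vault.items():
            if seen == value:
                return token
        token = f"<EMAIL_{len(vault) + 1}>"
        vault[token] = value
        return token

    return EMAIL.sub(swap, text), vault


def detokenize(text: str, vault: dict) -> str:
    """Restore original values in a model response, inside the perimeter."""
    for token, value in vault.items():
        text = text.replace(token, value)
    return text


safe, vault = tokenize_emails("Mail ana@example.com and bo@example.org, cc ana@example.com")
print(safe)
```

Only the safe text crosses the network boundary; the vault that maps placeholders back to real values never leaves.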

How do you control OpenAI cost and latency in production?

Per-step model routing, prompt caching, structured outputs to cut retries, streaming for perceived latency, and hard token and spend budgets per agent. We instrument every call so cost and quality are observable, not a surprise on the invoice.
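A hard budget is simple to sketch: check the caps before recording a call, and refuse the call if either would be exceeded. Budget and BudgetExceeded are hypothetical names, and the caps in the usage lines are arbitrary.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push an agent past its token or spend cap."""


class Budget:
    """Hard per-agent caps on tokens and spend, checked before each call."""

    def __init__(self, max_tokens: int, max_usd: float):
        self.max_tokens, self.max_usd = max_tokens, max_usd
        self.tokens, self.usd = 0, 0.0

    def charge(self, tokens: int, usd: float) -> None:
        # Refuse first, so a rejected call leaves the counters untouched.
        if self.tokens + tokens > self.max_tokens or self.usd + usd > self.max_usd:
            raise BudgetExceeded("call would exceed the agent's budget")
        self.tokens += tokens
        self.usd += usd


budget = Budget(max_tokens=1_000, max_usd=0.05)
budget.charge(tokens=600, usd=0.02)
try:
    budget.charge(tokens=600, usd=0.01)
except BudgetExceeded as exc:
    print(exc)
```

In practice the token figure passed to charge would be an estimate made before the call and reconciled against the provider's reported usage afterward.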

Bring a workflow. We'll pick the right model.

One working session to scope your automation and benchmark OpenAI against the alternatives on your own data.
