OpenAI Responses API vs. Chat Completions vs. Anthropic Messages API

A side-by-side comparison of OpenAI's Chat Completions, Responses API, and Anthropic's Messages API, covering key differences, use cases, and how to avoid vendor lock-in with Portkey.

The LLM API landscape has never been more fragmented, or more consequential. As teams move from prototypes to production, the choice of which API format to build on shapes your vendor flexibility, your codebase complexity, and how quickly you can swap models when something better comes along.

Today, three API formats dominate how AI agents talk to LLMs:

  • OpenAI's Chat Completions API — the de facto standard, universally supported
  • OpenAI's Responses API — the newer, agent-oriented evolution with built-in tools and state management
  • Anthropic's Messages API — Claude's native interface, with capabilities like extended thinking and prompt caching

Each was designed with different goals in mind. Understanding the differences affects how you build, how you scale, and how locked in you are to a single provider.

Portkey supports all three natively, and that's where standardization starts to matter.

OpenAI Responses API vs. Chat Completions vs. Messages API: At a glance

| | Chat Completions | Responses API | Messages API |
|---|---|---|---|
| Provider | OpenAI | OpenAI | Anthropic |
| Endpoint | POST /v1/chat/completions | POST /v1/responses | POST /v1/messages |
| Design goal | Stateless text generation | Agentic workflows with built-in tools | Claude-native capabilities |
| State management | Manual | Optional server-side (with store: true) | Manual |
| Streaming | ✅ | ✅ | ✅ |
| Tool / function calling | ✅ | ✅ (with built-in tools) | ✅ |
| Built-in web search | ❌ | ✅ (via server tools) | ✅ |
| Extended thinking | ❌ | ❌ | ✅ (Claude only) |
| Prompt caching | ❌ | ❌ | ✅ (with cache_control) |
| Computer use | ❌ | ✅ | ✅ |
| Ecosystem compatibility | Widest | Growing | Claude-specific |

What makes each endpoint different

Chat Completions: the universal standard

Chat Completions (POST /v1/chat/completions) is where everything started. You send an array of messages, each with a role (system, user, assistant, or tool for tool call results) and the model replies. It's stateless by design, so you own the conversation history and pass it with every request.
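
All four roles appear in a minimal history like the one below. This is a sketch: the tool name, call ID, and payloads are illustrative, not real values.

```python
# A minimal Chat Completions message history showing all four roles.
# The function name and call ID ("get_weather", "call_1") are hypothetical.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,  # empty when the model requests a tool instead
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }],
    },
    # Tool results come back as role "tool", linked by tool_call_id.
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 18}'},
]

roles = [m["role"] for m in messages]
```

Because the API is stateless, this entire array is resent on every request; trimming or summarizing it is your responsibility.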

This simplicity is its biggest strength. Because the model has no memory between calls, you have full control over what context it sees. And because practically every major provider has adopted this format, code written against Chat Completions works across OpenAI, Anthropic (via adapters), Gemini, Mistral, Bedrock, and other models with minimal changes.

What it does well:

  • Widest ecosystem of tools, frameworks, and libraries
  • Predictable, well-understood response format
  • Easiest path to switching providers or running multi-provider setups

What it doesn't do:

  • No built-in tools: web search, code execution, and file search all need external orchestration
  • No native support for extended reasoning or prompt caching
  • No server-side state; you manage conversation history entirely

The response object returns choices, where each choice contains a message with role: "assistant" and content. Tool calls come back in tool_calls. Clean and predictable, which is why it became the lingua franca of LLM APIs.
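
In shape, that looks like the following. The response here is a simulated dict trimmed to the fields discussed above, not a live API call.

```python
# Simulated Chat Completions response, trimmed to the relevant fields.
response = {
    "id": "chatcmpl_123",  # illustrative ID
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Qubits encode information in superposition."},
        "finish_reason": "stop",
    }],
}

# The reply always lives at choices[0].message; tool calls, when present,
# appear under message["tool_calls"] instead of content.
answer = response["choices"][0]["message"]["content"]
```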

When to use it: When your use case is primarily text generation, such as chatbots, summarization, classification, content generation, and Q&A. It's the right default if you're using frameworks like LangChain or LlamaIndex that abstract over providers, or if cross-provider portability matters.

Responses API: built for agents

The Responses API (POST /v1/responses) takes a different approach. It's built for agentic loops: the model can call multiple built-in tools (web search, file search, code interpreter, computer use, remote MCP servers) within a single API request, without you orchestrating each step.

State management comes in two forms. With previous_response_id, you chain responses by referencing a prior response ID and the model picks up context without you resending the full history, but you're still tracking the ID yourself. The newer Conversations API goes further, maintaining a durable conversation object server-side that automatically accumulates turns across sessions.
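
Sketched as request payloads, chaining looks like this. The model name and response ID are placeholders for illustration.

```python
# First turn: a normal Responses API request.
first_request = {
    "model": "gpt-4o",  # placeholder model name
    "input": "Summarize the attached report.",
    "store": True,  # ask the server to retain this response for later reference
}

# Suppose the server returned an object whose id is "resp_abc123" (illustrative).
first_response_id = "resp_abc123"

# Second turn: reference the prior response instead of resending the history.
follow_up = {
    "model": "gpt-4o",
    "input": "Now list the three biggest risks it mentions.",
    "previous_response_id": first_response_id,
}
```

Note that the follow-up carries no message history at all; the server reconstructs context from the referenced response.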

What it does well:

  • Built-in tools that run within a single request, no external orchestration needed
  • previous_response_id chains turns without resending prior tokens
  • Designed for multi-step agentic workflows where context and tool results accumulate
  • Better cache utilization compared to Chat Completions for repeated context

What it doesn't do:

  • Natively available on OpenAI models only (though Portkey makes it work across providers)
  • More complex response structure: output is an array of typed items rather than a single message
  • Overkill for simple single-turn completions

When to use it: When you're building autonomous agents that use built-in tools, or multi-turn workflows where you want to reduce token overhead across turns. It's the right choice when agentic behavior and tool use are core to your application.

Messages API: Claude's native interface

Anthropic's Messages API (POST /v1/messages) is designed around how Claude works. While it shares surface similarities with Chat Completions, it exposes capabilities that are specific to Claude and don't exist in OpenAI's formats.

What it does well:

  • Extended thinking: Claude returns type: "thinking" content blocks before the final answer, exposing its reasoning process
  • Prompt caching: Fine-grained cache_control lets you cache specific content blocks (with 5-minute or 1-hour TTLs), reducing latency and cost significantly for repeated context
  • Rich content blocks: The content array supports text, images, PDFs, tool use, thinking blocks, and citations pointing to source documents
  • Stop reason granularity: stop_reason can be end_turn, max_tokens, stop_sequence, tool_use, pause_turn, or refusal
  • Native web search: Pass {"type": "web_search_20250305", "name": "web_search"} in the tools array and Claude handles execution server-side, returning results in a server_tool_use response block
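
Prompt caching, for instance, is set per content block. Below is a sketch of a Messages API payload, assuming the cache_control shape described above; the model name and document text are placeholders.

```python
# Messages API payload that caches a large system prompt block.
payload = {
    "model": "claude-sonnet-4-5",  # placeholder model name
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": "You answer questions about the document below."},
        {
            "type": "text",
            "text": "<large reference document goes here>",
            # Marks a cache breakpoint; "ephemeral" uses the default 5-minute TTL.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    "messages": [{"role": "user", "content": "What does section 2 cover?"}],
}
```

Everything up to and including the marked block is cached, so repeated requests over the same document pay full input cost only once per TTL window.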

What it doesn't do:

  • No server-side state management (you manage history yourself)
  • Not natively compatible with non-Anthropic providers without a translation layer

The response object returns a content array of typed blocks. A single response might include a thinking block, a text block, and a tool_use block in sequence. Citations on text blocks tell you exactly which document or character range the model drew from.
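
Consuming that means dispatching on each block's type rather than assuming a single message body. The response below is a simulated dict, trimmed for illustration.

```python
# Simulated Messages API response with thinking, text, and tool_use blocks.
response = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "thinking", "thinking": "The user wants current data..."},
        {"type": "text", "text": "Let me look that up."},
        {"type": "tool_use", "id": "toolu_1", "name": "get_weather",
         "input": {"city": "Paris"}},
    ],
}

# Collect each block kind separately instead of reading a single content field.
text_parts = [b["text"] for b in response["content"] if b["type"] == "text"]
tool_calls = [b for b in response["content"] if b["type"] == "tool_use"]
```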

When to use it: When you're building specifically on Claude and need extended thinking for complex reasoning, prompt caching for document-heavy workloads, or reasoning transparency in your application.

How Portkey supports all three

Most teams end up needing more than one of these formats. You might use Chat Completions for a general assistant, the Responses API for an autonomous agent, and Claude's Messages API for a document reasoning pipeline, all in the same application.

Building direct integrations with each provider means separate SDKs, separate observability, and code that breaks every time you want to try a new model.

Portkey sits between your application and every provider, handling the translation so you don't have to.

The best part: you can use any of the three API formats with any provider and model.

Want to use the Messages API format but route to a Gemini model? Portkey handles the transformation. Want Chat Completions format but call a Claude model? Same thing.

Beyond format flexibility, Portkey adds what direct API access can't give you:

  • Observability: Every request logged, traced, and searchable across all providers and API formats
  • Fallbacks and load balancing: Route to a backup provider if your primary is down or rate-limited
  • Prompt management: Version, test, and deploy prompts centrally
  • Cost tracking: Unified spend view across providers and models
  • Governance: Enterprise controls over which teams access which models
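
Fallbacks, for example, are configured declaratively rather than in application code. A minimal sketch, assuming Portkey's JSON config shape; the provider slugs and model names are illustrative:

```python
# A sketch of a Portkey gateway config that falls back from OpenAI to Anthropic.
# Field names follow Portkey's config format; targets are tried in order.
config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"provider": "openai", "override_params": {"model": "gpt-4o"}},
        {"provider": "anthropic", "override_params": {"model": "claude-sonnet-4-5"}},
    ],
}
```

Attach a config like this to a request and the gateway retries the next target when the primary errors or is rate-limited, with no changes to the calling code.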

Making calls with each API through Portkey

Chat Completions

from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

response = portkey.chat.completions.create(
    model="@openai-provider/gpt-5.2",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}]
)

print(response.choices[0].message.content)

Responses API

from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

response = portkey.responses.create(
    model="@openai-provider/gpt-4o",
    input="Explain quantum computing in simple terms"
)

print(response.output_text)

Messages API

import anthropic

client = anthropic.Anthropic(
    api_key="PORTKEY_API_KEY",
    base_url="https://api.portkey.ai"
)

message = client.messages.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}]
)

print(message.content[0].text)

Switching providers without changing your code

The real payoff is when you want to swap providers.

To move from OpenAI to Claude on the Responses API, change only the provider name and model:

from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

response = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input="Explain quantum computing in simple terms"
)

print(response.output_text)

Your application code, your observability, your fallback logic, none of it changes. The format stays the same and Portkey's universal API handles the translation to whichever provider you're routing to.

Getting started

All three API formats work through Portkey with a single configuration change, pointing your SDK's base URL at Portkey's gateway. From there, routing, translation, observability, and reliability are handled for you.

Get started with Portkey | Read the docs

To explore how Portkey can support your AI strategy, book a demo here.