Enterprise Feature
gRPC support is available on Enterprise self-hosted plans only. Contact the Portkey team to enable it for your gateway deployment.
gRPC support is currently in beta. The API surface may change based on feedback.
The Portkey Gateway supports gRPC as an alternative transport protocol alongside HTTP/REST. This enables lower latency, efficient binary serialization via Protocol Buffers, and native streaming support for applications that prefer gRPC communication.

How It Works

The gateway operates in two modes depending on the provider:
| Mode | Description | Use Case |
|------|-------------|----------|
| gRPC → HTTP Proxy | Gateway accepts gRPC requests and converts them to HTTP internally | Works with all providers |
| Native gRPC | Gateway connects to the provider’s native gRPC endpoint directly | Lower latency for supported providers (e.g., Google Gemini) |
For providers without a native gRPC endpoint, the gateway transparently proxies gRPC requests over HTTP — so every provider supported by Portkey works out of the box. When a provider does expose a native gRPC API (currently Google Gemini), the gateway connects directly for optimal performance.

Starting the gRPC Server

Command Line Flags

# Start only the gRPC server (default port 8789)
npm start -- --llm-grpc

# Start both HTTP and gRPC servers
npm start -- --llm-node --llm-grpc

# With custom ports
npm start -- --llm-node --llm-grpc --port 8787 --grpc-port 50051

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| GRPC_PORT | 8789 | Port for the gRPC server |
| PORT | 8787 | Port for the HTTP server (used as base URL for internal routing) |

Enabling TLS

The gRPC server supports TLS using the same certificates as the HTTP server:
TLS_KEY_PATH=/path/to/key.pem \
TLS_CERT_PATH=/path/to/cert.pem \
npm start -- --llm-grpc
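
On the client side, connect with transport credentials instead of a plaintext channel. A minimal Python sketch, assuming the server certificate (or a CA bundle that signs it) is available locally as cert.pem:

import grpc

# Load the certificate used to verify the gateway's TLS identity.
with open("cert.pem", "rb") as f:
    credentials = grpc.ssl_channel_credentials(root_certificates=f.read())

# Secure channel to the gRPC gateway (default port 8789).
channel = grpc.secure_channel("localhost:8789", credentials)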

Authentication

Pass your Portkey API key as gRPC metadata:
| Metadata Key | Description |
|--------------|-------------|
| x-portkey-api-key | Your Portkey API key |
With the Model Catalog, the provider is specified in the model string itself (@provider_slug/model_name), so separate provider headers are typically not needed.
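
In a Python client, metadata is passed as a sequence of key/value tuples on each call. A minimal sketch using the key from the table above:

# Metadata sent with every RPC; only the Portkey API key is required.
metadata = [("x-portkey-api-key", "YOUR_PORTKEY_KEY")]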

Making Requests

All request bodies are sent as a JSON string in the input field of the GatewayRequest message. Each endpoint returns responses in a consistent format matching the API you called, regardless of the underlying provider:
  • ChatCompletions and Embeddings — OpenAI-compatible format
  • Messages — Anthropic Messages format
  • Responses — OpenAI Responses API format
The gateway handles all provider-to-format translation automatically — you always get the format matching the endpoint you called, no matter which LLM is behind it.
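
Putting this together in Python: the sketch below assumes client stubs generated from the gateway proto (see the service definition later on this page), which yields gateway_pb2 and gateway_pb2_grpc modules. It uses a plaintext channel to match the grpcurl examples.

import json
import grpc

# Modules generated from gateway.proto (see "gRPC Service Definition" below).
import gateway_pb2
import gateway_pb2_grpc

channel = grpc.insecure_channel("localhost:8789")
stub = gateway_pb2_grpc.GatewayStub(channel)

# The request body is plain JSON, serialized into the `input` string field.
body = {
    "model": "@openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
}
request = gateway_pb2.GatewayRequest(input=json.dumps(body))

response = stub.ChatCompletions(
    request,
    metadata=[("x-portkey-api-key", "YOUR_PORTKEY_KEY")],
)

# `body` is raw JSON bytes; `status_code` mirrors the HTTP status.
print(response.status_code)
print(json.loads(response.body))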

Model String Format

Portkey uses the Model Catalog format for model strings:
@provider_slug/model_name
Examples: @openai/gpt-4o, @gemini/gemini-2.0-flash, @anthropic/claude-3-opus-20240229, @azure-openai/gpt-4

Chat Completions

# Non-streaming
grpcurl -plaintext \
  -H 'x-portkey-api-key: YOUR_PORTKEY_KEY' \
  -d '{
    "input": "{\"model\": \"@openai/gpt-4o\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}]}"
  }' \
  localhost:8789 gateway.Gateway/ChatCompletions
# Streaming
grpcurl -plaintext \
  -H 'x-portkey-api-key: YOUR_PORTKEY_KEY' \
  -d '{
    "input": "{\"model\": \"@openai/gpt-4o\", \"messages\": [{\"role\": \"user\", \"content\": \"Count from 1 to 5\"}], \"stream\": true}"
  }' \
  localhost:8789 gateway.Gateway/ChatCompletionsStream
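
The streaming variant returns an iterator of StreamChunk messages whose data field carries SSE-formatted bytes. A Python sketch, reusing the stub and modules from the Chat Completions sketch above:

body = {
    "model": "@openai/gpt-4o",
    "messages": [{"role": "user", "content": "Count from 1 to 5"}],
    "stream": True,
}
request = gateway_pb2.GatewayRequest(input=json.dumps(body))

# Server-side streaming: iterate chunks as they arrive.
for chunk in stub.ChatCompletionsStream(
    request,
    metadata=[("x-portkey-api-key", "YOUR_PORTKEY_KEY")],
):
    # Each chunk is SSE-formatted ("data: {...}\n\n"); print it raw here.
    print(chunk.data.decode("utf-8"), end="")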

Anthropic Messages

grpcurl -plaintext \
  -H 'x-portkey-api-key: YOUR_PORTKEY_KEY' \
  -d '{
    "input": "{\"model\": \"@anthropic/claude-3-opus-20240229\", \"messages\": [{\"role\": \"user\", \"content\": \"What is the capital of France?\"}], \"max_tokens\": 100}"
  }' \
  localhost:8789 gateway.Gateway/Messages

OpenAI Responses

grpcurl -plaintext \
  -H 'x-portkey-api-key: YOUR_PORTKEY_KEY' \
  -d '{
    "input": "{\"model\": \"@openai/gpt-4o\", \"input\": \"Tell me a joke\"}"
  }' \
  localhost:8789 gateway.Gateway/Responses

Embeddings

grpcurl -plaintext \
  -H 'x-portkey-api-key: YOUR_PORTKEY_KEY' \
  -d '{
    "input": "{\"model\": \"@openai/text-embedding-3-small\", \"input\": \"The quick brown fox jumps over the lazy dog\"}"
  }' \
  localhost:8789 gateway.Gateway/Embeddings
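
The same pattern works in Python; the OpenAI-compatible response carries the vectors under data[i].embedding. A sketch reusing the stub from above:

request = gateway_pb2.GatewayRequest(input=json.dumps({
    "model": "@openai/text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog",
}))
response = stub.Embeddings(
    request, metadata=[("x-portkey-api-key", "YOUR_PORTKEY_KEY")]
)
vector = json.loads(response.body)["data"][0]["embedding"]
print(len(vector))  # dimensionality of the embedding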

Health Check

grpcurl -plaintext localhost:8789 gateway.Gateway/Health
{
  "status": "success",
  "message": "Server is healthy",
  "version": "1.x.x"
}
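
From Python, the same check is a unary call with the Empty message defined in the gateway proto:

health = stub.Health(gateway_pb2.Empty())
print(health.status, health.version)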

Native gRPC Providers

When a provider exposes a native gRPC API, the gateway bypasses HTTP entirely and makes direct gRPC calls for the lowest possible latency.
| Provider | Transport | Streaming | Endpoint |
|----------|-----------|-----------|----------|
| Google Gemini | Native gRPC | Yes | generativelanguage.googleapis.com:443 |
| OpenAI | HTTP proxy | Yes | |
| Anthropic | HTTP proxy | Yes | |
| Azure OpenAI | HTTP proxy | Yes | |

How Native gRPC Works

When a request targets a native gRPC provider, the gateway:
  1. Detects the gRPC transport via the x-portkey-gateway-transport: grpc header
  2. Transforms the request into the provider’s native gRPC format
  3. Makes a direct gRPC call to the provider’s endpoint
  4. Transforms the response back to the format matching the endpoint you called
This eliminates HTTP/JSON serialization overhead, uses efficient binary Protocol Buffer encoding, and maintains persistent gRPC connections with client caching.
grpcurl -plaintext \
  -H 'x-portkey-api-key: YOUR_PORTKEY_KEY' \
  -d '{
    "input": "{\"model\": \"@gemini/gemini-2.0-flash\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}]}"
  }' \
  localhost:8789 gateway.Gateway/ChatCompletions
The gateway handles all format transformations automatically — you send requests in the format matching the endpoint (Chat Completions, Messages, or Responses) and receive responses in that same format, regardless of the underlying provider.

gRPC Service Definition

The gateway exposes a single Gateway service with the following methods:
service Gateway {
  // Health check
  rpc Health(Empty) returns (HealthResponse);

  // Embeddings (non-streaming)
  rpc Embeddings(GatewayRequest) returns (GatewayResponse);

  // Chat completions
  rpc ChatCompletions(GatewayRequest) returns (GatewayResponse);
  rpc ChatCompletionsStream(GatewayRequest) returns (stream StreamChunk);

  // Anthropic Messages
  rpc Messages(GatewayRequest) returns (GatewayResponse);
  rpc MessagesStream(GatewayRequest) returns (stream StreamChunk);

  // OpenAI Responses
  rpc Responses(GatewayRequest) returns (GatewayResponse);
  rpc ResponsesStream(GatewayRequest) returns (stream StreamChunk);
}
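
To call these methods from Python, generate client stubs from the proto. One way to do this, assuming the definition above is saved as gateway.proto and the grpcio-tools package is installed:

# Runs protoc via the grpcio-tools package; emits gateway_pb2.py and
# gateway_pb2_grpc.py (the modules used in the examples on this page).
from grpc_tools import protoc

protoc.main([
    "protoc",
    "-I.",
    "--python_out=.",
    "--grpc_python_out=.",
    "gateway.proto",
])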

Message Types

message GatewayRequest {
  string input = 1;  // JSON request body
}

message GatewayResponse {
  int32 status_code = 1;
  bytes body = 2;     // JSON response body
}

message StreamChunk {
  bytes data = 1;     // SSE-formatted chunk data
}

message HealthResponse {
  string status = 1;
  string message = 2;
  string version = 3;
}

Service Discovery

The gRPC server supports reflection, enabling service discovery with tools like grpcurl:
# List all services
grpcurl -plaintext localhost:8789 list

# Describe the Gateway service
grpcurl -plaintext localhost:8789 describe gateway.Gateway

# Describe a specific method
grpcurl -plaintext localhost:8789 describe gateway.Gateway.ChatCompletions

Response Format

The GatewayResponse contains an HTTP status code and a JSON body. The response format depends on which endpoint you called — the gateway ensures consistency regardless of the underlying provider.

ChatCompletions

Returns the standard OpenAI Chat Completions format:
{
  "id": "portkey-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gemini-2.0-flash",
  "provider": "google",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25
  }
}

Messages (Anthropic)

Returns the Anthropic Messages format — even when the underlying provider is not Anthropic (e.g., calling Messages with @openai/gpt-4o):
{
  "id": "msg_xxx",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "The capital of France is Paris."
    }
  ],
  "model": "claude-3-opus-20240229",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 14,
    "output_tokens": 10
  }
}

Responses (OpenAI Responses API)

Returns the OpenAI Responses API format:
{
  "id": "resp_xxx",
  "object": "response",
  "created_at": 1234567890,
  "model": "gpt-4o",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Why did the chicken cross the road? To get to the other side!"
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 8,
    "output_tokens": 18,
    "total_tokens": 26
  }
}

Response Metadata

HTTP response headers are returned as gRPC trailing metadata:
x-portkey-trace-id: xxx-xxx-xxx
x-portkey-provider: google
x-portkey-cache-status: DISABLED
x-portkey-retry-attempt-count: 0
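
In Python, trailing metadata is available via with_call, which returns the response together with the call object:

response, call = stub.ChatCompletions.with_call(
    request,
    metadata=[("x-portkey-api-key", "YOUR_PORTKEY_KEY")],
)
for key, value in call.trailing_metadata():
    print(f"{key}: {value}")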

Error Handling

Provider and request errors are returned inside the GatewayResponse.status_code field, not as gRPC status codes on the wire. The gRPC call itself will return OK unless there is a gateway infrastructure failure (e.g., server crash, proto parse error). Use the table below to interpret the status_code value in the response:
| Status Code | Equivalent gRPC Status | Description |
|-------------|------------------------|-------------|
| 200 | OK | Success |
| 400 | INVALID_ARGUMENT | Bad request |
| 401 | UNAUTHENTICATED | Invalid API key |
| 403 | PERMISSION_DENIED | Access denied |
| 404 | NOT_FOUND | Resource not found |
| 429 | RESOURCE_EXHAUSTED | Rate limited |
| 500 | INTERNAL | Server error |
| 503 | UNAVAILABLE | Service unavailable |
| 504 | DEADLINE_EXCEEDED | Timeout |
Always check GatewayResponse.status_code to detect errors — do not rely on the gRPC call status alone.
Error responses include details in the body:
{
  "error": {
    "message": "Invalid API key",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
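
A defensive client therefore checks status_code before using the body. A minimal sketch, reusing the stub from the earlier examples:

response = stub.ChatCompletions(
    request, metadata=[("x-portkey-api-key", "YOUR_PORTKEY_KEY")]
)
payload = json.loads(response.body)
if response.status_code != 200:
    # Error details follow the shape shown above.
    err = payload["error"]
    raise RuntimeError(f"{response.status_code} {err['type']}: {err['message']}")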

When to Use gRPC vs HTTP

| Use gRPC when… | Use HTTP when… |
|----------------|----------------|
| You need the lowest possible latency | You have browser-based clients (gRPC-Web requires a proxy) |
| You have high-throughput streaming workloads | You need simple integrations with widely supported REST |
| You’re in a service-to-service architecture with protobuf | You need easy debugging with standard HTTP tools |
| You want native gRPC connections to providers like Google Gemini | |

Connection Management

The gateway maintains a cache of gRPC client connections:
  • Connections are reused per API key for efficiency
  • Stale connections are automatically cleaned up
  • Default timeout: 60 seconds (300 seconds for streaming)

Limitations

The following limitations apply during the beta period:
  • WebSocket-based realtime endpoints (/v1/realtime) are not available via gRPC
  • File upload/download operations use HTTP proxy mode only
  • Batch operations use HTTP proxy mode only
  • Only server-side streaming is supported (no bidirectional streaming)

Troubleshooting

Connection Refused

Error: 14 UNAVAILABLE: Connection refused
Ensure the gRPC server is running with the --llm-grpc flag:
npm start -- --llm-grpc

Authentication Errors

Error: 16 UNAUTHENTICATED
Verify your Portkey API key is correctly passed as gRPC metadata:
grpcurl -H 'x-portkey-api-key: YOUR_KEY' ...

INVALID_ARGUMENT Errors

Error: 3 INVALID_ARGUMENT
Check that your request JSON is valid and properly escaped in the input field. Ensure the model string follows the @provider_slug/model_name format.
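
A reliable way to produce a correctly escaped payload is to build it programmatically rather than by hand. For example, this prints a value suitable for grpcurl's -d flag, with the inner body serialized once more so it arrives as a JSON string in the input field:

import json

body = {
    "model": "@openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
}
print(json.dumps({"input": json.dumps(body)}))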

Timeout Issues

For long-running requests, increase the client-side timeout:
# `stub`, `request`, and `metadata` as in the earlier client sketches
response = stub.ChatCompletions(
    request,
    metadata=metadata,
    timeout=120.0,  # deadline in seconds
)