Enterprise Feature
Batch inference is available on Enterprise plans only. Contact the Portkey team to enable it for your workspace.
Portkey’s AI Gateway lets you send a single request that fans out to hundreds, or even millions, of completions. Choose the mode that best fits your cost, latency, and provider-support needs.

Choose Your Batching Mode

| Mode | When to pick it | Works with |
|------|-----------------|------------|
| Provider Batch API | Cheapest for overnight or offline jobs. Uses the provider’s native batch endpoint & limits. | OpenAI, Azure OpenAI, Bedrock, Vertex AI, Fireworks |
| Portkey Batch API | Fastest and provider‑agnostic. Batches at the Gateway layer; ideal when a provider has no native batch support or you need cross‑provider jobs. | Any provider supported by Portkey |
Quick rule of thumb →
Need low latency or multi‑provider batching? Use the Portkey Batch API. Otherwise, stick with the provider’s native batch endpoint for cost savings.

Before You Start

Have the following ready to start making batch requests:
  1. Portkey account & API key.
  2. Provider credentials for each downstream model (OpenAI key, Bedrock IAM role, etc.).
  3. A Portkey File (input_file_id) - required only when using the Portkey Batch API (Mode #2). See Files to upload one; a sample input line is sketched just after this list.
  4. Optional: Familiarity with the Create Batch OpenAPI spec.
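
Batch input files are JSONL, one request per line. A minimal sketch, assuming the OpenAI‑compatible batch input format (the model name, custom_id values, and batch_input.jsonl filename are placeholders):

cat > batch_input.jsonl <<'EOF'
{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Summarize ticket #1042"}]}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Summarize ticket #1043"}]}}
EOF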

Provider Batch API Mode

This mode runs batch jobs on the provider’s native batch endpoint. Providers usually offer a cheaper rate for batch jobs, but you are bound by the provider’s quotas and limits. Most completion windows are about 24 hours.

Quickstart (OpenAI example)

curl -X POST https://api.portkey.ai/v1/batches \
  -H "Authorization: Bearer $PORTKEY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: @YOUR_PROVIDER_SLUG" \
  -d '{
    "input_file_id": "file_abc123",
    "completion_window": "24h",
    "endpoint": "/v1/chat/completions"
  }'
🔗 Full schema: see the OpenAPI reference.

Supported Providers & Endpoints

| Provider | Endpoints |
|----------|-----------|
| OpenAI | completions, chat completions, embeddings |
| Azure OpenAI | completions, chat completions, embeddings |
| Bedrock | chat completions |
| Vertex AI | chat completions, embeddings |
| Fireworks | completions |

Defaults & Limits

| Property | Default | Notes |
|----------|---------|-------|
| completion_window | 24h | Set by the provider (cannot be shorter). |
| Provider quota | Per provider | e.g., OpenAI ≤ 50k jobs/day. |
| Retries | Provider‑defined | Portkey surfaces job status; no Gateway retry. |

Portkey Batch API Mode ⭐️

How It Works

Set completion_window to immediate and Portkey aggregates your requests in memory, then dispatches them to the target provider in fixed-size buckets. With the defaults below, for example, a 1,000-request input file is sent as 40 buckets of 25 requests, flushed roughly every 5 seconds.
| Gateway default | Value |
|-----------------|-------|
| Batch size | 25 requests |
| Batch interval | 5 s between flushes |
| Retries | 3 per request (configurable via x-portkey-config) |
Coming soon: configurable batch_size, batch_interval, and max_retries.

Quickstart (provider‑agnostic)

curl -X POST https://api.portkey.ai/v1/batches \
  -H "Authorization: Bearer $PORTKEY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id": "pk_file_...",
    "completion_window": "immediate",
    "endpoint": "/v1/chat/completions"
  }'
Because Portkey orchestrates the batching, this works even for providers without a native batch endpoint.
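
Routing works like any other Gateway request: as a sketch, the same job can target a different integration simply by adding the x-portkey-provider header from the earlier quickstart (the slug below is a placeholder for one of your own provider integrations):

curl -X POST https://api.portkey.ai/v1/batches \
  -H "Authorization: Bearer $PORTKEY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: @YOUR_OTHER_PROVIDER_SLUG" \
  -d '{
    "input_file_id": "pk_file_...",
    "completion_window": "immediate",
    "endpoint": "/v1/chat/completions"
  }'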

Response & Monitoring

Identical to Provider mode; the difference is that provider_job_id is absent and cost is computed from individual calls.
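
A sketch of the monitoring flow, assuming the OpenAI-compatible retrieve-batch route plus the output endpoint described under Portkey Files below ($BATCH_ID is the id returned when the job was created):

# Poll job status
curl https://api.portkey.ai/v1/batches/$BATCH_ID \
  -H "Authorization: Bearer $PORTKEY_API_KEY"

# Fetch results once the job has completed
curl https://api.portkey.ai/v1/batches/$BATCH_ID/output \
  -H "Authorization: Bearer $PORTKEY_API_KEY"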

About Portkey Files

Portkey Files are files uploaded to Portkey that are then automatically uploaded to the provider. They’re useful when you want to make multiple batch completions using the same file. Portkey will:
  • Automatically upload the file to the provider on your behalf
  • Reuse the content in your batch requests
  • Check batch progress and provide post-batch analysis including token and cost calculations
  • Make batch outputs available via the GET /batches/<batch_id>/output endpoint
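
A minimal upload sketch, assuming the OpenAI-compatible /v1/files route with a batch purpose (see the Files docs for the exact fields):

curl -X POST https://api.portkey.ai/v1/files \
  -H "Authorization: Bearer $PORTKEY_API_KEY" \
  -F purpose="batch" \
  -F file="@batch_input.jsonl"
# The response includes a file id to pass as input_file_id when creating the batch.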

Error Handling & Retries

| Layer | What Portkey does | How to override |
|-------|-------------------|-----------------|
| Gateway (Portkey Batch) | Retries on network errors, 429s, and 5xx responses | x-portkey-config: {"retry": {"max_attempts": 5}} |
| Provider (native batch) | Follows the provider’s rules | Not configurable via Portkey |
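
For example, to raise the Gateway retry ceiling on a Portkey Batch job, pass the retry block from the table above via the x-portkey-config header (a sketch; this affects only the Gateway layer, not provider-native batches):

curl -X POST https://api.portkey.ai/v1/batches \
  -H "Authorization: Bearer $PORTKEY_API_KEY" \
  -H "Content-Type: application/json" \
  -H 'x-portkey-config: {"retry": {"max_attempts": 5}}' \
  -d '{
    "input_file_id": "pk_file_...",
    "completion_window": "immediate",
    "endpoint": "/v1/chat/completions"
  }'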

Security & IAM

  • Files are encrypted at rest (AES‑256) and deleted from provider storage once the batch succeeds or after 7 days, whichever is earlier.
  • Portkey uploads on your behalf using least‑privilege scoped credentials; no long‑lived secrets are stored.
  • Access to batch status & outputs is gated by your workspace role (batch.read).

Glossary

| Term | Meaning |
|------|---------|
| Batch Job | A collection of completion requests executed asynchronously. |
| Portkey File (input_file_id) | A file uploaded to Portkey that is automatically uploaded to the provider for batch processing. Useful for reusing the same file across multiple batch completions. |
| Virtual Key | A logical provider credential stored in Portkey; referenced by ID, not secret. |
| Completion Window | Time frame in which the job must finish. immediate → handled by Portkey; 24h → delegated to the provider. |

Roadmap

  • Custom batch_size, batch_interval, max_retries (Q3 2025)
  • Real‑time progress webhooks
  • UI for canceling or pausing jobs