Enterprise Feature
Batch inference is available on Enterprise plans only. Contact the Portkey team to enable it for your workspace.
Portkey’s AI Gateway lets you send a single request that fans out to hundreds (or millions) of completions. Choose the mode that best fits your cost, latency, and provider-support needs.
Choose Your Batching Mode
| Mode | When to pick it | Works with |
|---|---|---|
| Provider Batch API | Cheapest for overnight or offline jobs. Uses the provider’s native batch endpoint & limits. | OpenAI, Azure OpenAI, Bedrock, Vertex, Fireworks |
| Portkey Batch API | Fastest and provider-agnostic. Batches at the Gateway layer; ideal when a provider has no native batch support or you need cross-provider jobs. | Any provider supported by Portkey |
Quick rule of thumb →
Need low latency or multi‑provider batching? Portkey Batch API. Otherwise, stick with the provider’s native batch for cost savings.
Before You Start
Have the following ready to start making batch requests:
- Portkey account & API key.
- Provider credentials for each downstream model (OpenAI key, Bedrock IAM role, etc.).
- A Portkey File (`input_file_id`) - required only when using the Portkey Batch API (Mode #2). See Files to upload one.
- Optional: Familiarity with the Create Batch OpenAPI spec.
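The batch input file behind `input_file_id` is a JSONL file with one request per line. A minimal sketch of building and sanity-checking one locally (field names follow the OpenAI-compatible batch input format; the model name and prompts are illustrative):

```shell
# Build a minimal batch input file: one JSON request object per line,
# each with a unique custom_id so results can be matched back later.
cat > batch_input.jsonl <<'EOF'
{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Summarize doc 1"}]}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Summarize doc 2"}]}}
EOF

# Sanity-check: every line must be valid JSON before uploading.
python3 -c 'import json; [json.loads(l) for l in open("batch_input.jsonl")]' && echo "input file OK"
```

Upload this file via the Files endpoint to get the `input_file_id` used in the batch calls below.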
Provider Batch API Mode
Use this mode to run batch jobs against the provider’s native batch endpoint. Providers usually offer a discounted rate for batch jobs, but you are bound by the provider’s quotas and limits. Most completion windows are about 24 hours.
Quickstart (OpenAI example)
curl -X POST https://api.portkey.ai/v1/batches \
  -H "Authorization: Bearer $PORTKEY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: @your-provider-slug" \
  -d '{
    "input_file_id": "file_abc123",
    "completion_window": "24h",
    "endpoint": "/v1/chat/completions"
  }'
🔗 Full schema: see the OpenAPI reference.
Supported Providers & Endpoints
Defaults & Limits
| Property | Default | Notes |
|---|---|---|
| `completion_window` | 24h | Set by the provider (cannot be shorter). |
| Provider quota | Per provider | e.g., OpenAI ≤ 50k jobs/day. |
| Retries | Provider-defined | Portkey surfaces job status; no Gateway retry. |
Portkey Managed Batching
Portkey Managed Batching lets you run batches across multiple providers with minimal effort through a unified API.
Read more about Portkey Files here.
How It Works
- Submit a batch request to Portkey with a Portkey File and provider information.
- Batch requests respect metadata, budgets, and other batch request parameters.
- Portkey automatically uploads the file and starts the batch with the provider.
- Portkey periodically checks the batch status with the provider and updates it in Portkey.
- Once the batch completes, Portkey reads the batch output and adds the following details to your Portkey analytics:
  - Token count
  - Cost
  - Successful request count
  - Failed request count
  - Total request count
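The status object Portkey polls and surfaces can be sketched locally like this (the payload shape is assumed from the OpenAI-compatible batch schema; field names are illustrative, so check the OpenAPI reference for the exact response):

```shell
# Example status payload as returned by GET /v1/batches/{id} (shape assumed).
RESPONSE='{"id": "batch_abc123", "status": "completed", "request_counts": {"total": 100, "completed": 98, "failed": 2}}'

# Extract the fields that feed Portkey analytics: status plus
# success / failed / total request counts.
echo "$RESPONSE" | python3 -c '
import json, sys
b = json.load(sys.stdin)
print("status:", b["status"])
print("success:", b["request_counts"]["completed"])
print("failed:", b["request_counts"]["failed"])
print("total:", b["request_counts"]["total"])
'
```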
Portkey Custom Batching ⭐️
Portkey custom batching lets you batch requests to providers that don’t have a native batch endpoint.
Note: custom batching does not come with a discounted batch rate; requests are billed at standard pricing.
How It Works
Set `completion_window` to `immediate` and Portkey aggregates your requests in memory, then fires them to the target provider in fixed buckets.
| Gateway default | Value |
|---|---|
| Batch size | 25 requests |
| Batch interval | 5 s between flushes |
| Retries | 3 per request (configurable via `x-portkey-config`) |
Coming soon: configurable `batch_size`, `batch_interval`, and `max_retries`.
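The defaults above determine how a job is chunked over time. A quick back-of-the-envelope sketch for a 120-request job:

```shell
# With the Gateway defaults (batch size 25, 5 s between flushes),
# estimate how a 120-request job is dispatched.
TOTAL=120; SIZE=25; INTERVAL=5
BUCKETS=$(( (TOTAL + SIZE - 1) / SIZE ))          # ceil(120 / 25) = 5 buckets
DISPATCH_SECONDS=$(( (BUCKETS - 1) * INTERVAL ))  # last bucket fires 20 s after the first
echo "$BUCKETS buckets, last flush at ${DISPATCH_SECONDS}s"
```

So an `immediate` job of 120 requests is fully dispatched within about 20 seconds, plus the providers' own completion latency.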
Quickstart (provider‑agnostic)
curl -X POST https://api.portkey.ai/v1/batches \
-H "Authorization: Bearer <PORTKEY_KEY>" \
-H "Content-Type: application/json" \
-d '{
"input_file_id": "pk_file_...",
"completion_window": "immediate",
"endpoint": "/v1/chat/completions"
}'
Because Portkey orchestrates the batching, this works even for providers without a native batch endpoint.
Response & Monitoring
Identical to Provider mode; the difference is that provider_job_id
is absent and cost is computed from individual calls.
About Portkey Files
Portkey Files are files uploaded to Portkey that are then automatically uploaded to the provider. They’re useful when you want to make multiple batch completions using the same file. Portkey will:
- Automatically upload the file to the provider on your behalf
- Reuse the content in your batch requests
- Check batch progress and provide post-batch analysis including token and cost calculations
- Make batch outputs available via the `GET /batches/<batch_id>/output` endpoint
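Each line of the output endpoint’s response maps a `custom_id` back to its result. A sketch of parsing one line locally (the payload shape is assumed from the OpenAI-compatible batch output format; values are illustrative):

```shell
# A sample line from GET /batches/<batch_id>/output (shape assumed:
# one JSON object per request, keyed back to custom_id).
LINE='{"custom_id": "req-1", "response": {"status_code": 200, "body": {"choices": [{"message": {"role": "assistant", "content": "Done."}}], "usage": {"total_tokens": 42}}}}'

# Pull out the id, HTTP status, and token usage for analytics.
echo "$LINE" | python3 -c '
import json, sys
r = json.load(sys.stdin)
print(r["custom_id"], r["response"]["status_code"], r["response"]["body"]["usage"]["total_tokens"])
'
```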
Error Handling & Retries
| Layer | What Portkey does | How to override |
|---|---|---|
| Gateway (Portkey Batch) | Retries 3× on network/429/5xx errors | `x-portkey-config: {"retry": {"max_attempts": 5}}` |
| Provider (native batch) | Provider rules apply | Not configurable via Portkey |
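To raise the Gateway retry count, pass the config object from the table as the `x-portkey-config` header. A minimal sketch (the `max_attempts` key mirrors the table above; validating the JSON locally avoids a malformed header):

```shell
# Gateway retry override: 5 attempts instead of the default 3.
RETRY_CONFIG='{"retry": {"max_attempts": 5}}'

# Validate the config is well-formed JSON before sending it as a header.
echo "$RETRY_CONFIG" | python3 -m json.tool > /dev/null && echo "config OK"

# Usage (sketch):
#   curl https://api.portkey.ai/v1/batches \
#     -H "x-portkey-config: $RETRY_CONFIG" \
#     ...
```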
Security & IAM
- Files are encrypted at rest (AES‑256) and deleted from provider storage once the batch succeeds or after 7 days, whichever is earlier.
- Portkey uploads on your behalf using least‑privilege scoped credentials; no long‑lived secrets are stored.
- Access to batch status & outputs is gated by your workspace role (`batch.read`).
Glossary
| Term | Meaning |
|---|---|
| Batch Job | A collection of completion requests executed asynchronously. |
| Portkey File (`input_file_id`) | A file uploaded to Portkey that is automatically uploaded to the provider for batch processing. Useful for reusing the same file across multiple batch completions. |
| Virtual Key | A logical provider credential stored in Portkey; referenced by ID, not secret. |
| Completion Window | Time frame in which the job must finish. `immediate` → handled by Portkey; `24h` → delegated to the provider. |
Roadmap
- Custom `batch_size`, `batch_interval`, `max_retries` (Q3 2025)
- Real‑time progress webhooks
- UI for canceling or pausing jobs