Enterprise Feature
Batch inference is available on Enterprise hybrid and self-hosted plans only. Contact the Portkey team to enable it for your Organization.
Choose Your Batching Mode
| Mode | When to pick it | Works with |
|---|---|---|
| Provider Batch API | Cheapest for overnight or offline jobs. Uses the provider’s native batch endpoint & limits. | OpenAI, Azure OpenAI, Bedrock, Vertex |
| Portkey Batch API | Fastest and provider‑agnostic. Batches at the Gateway layer; ideal when a provider has no native batch support or you need cross‑provider jobs. | Any provider supported by Portkey Gateway |
Quick rule of thumb →
Need low latency or multi‑provider batching? Portkey Batch API. Otherwise, stick with the provider’s native batch for cost savings.
Before You Start
Have the following ready before making batch requests:
- Portkey account & API key.
- Data Service enabled for your organization.
- Provider credentials for each downstream model (OpenAI key, Bedrock IAM role, etc.).
- A Portkey File (input_file_id), required only when using the Portkey Batch API (Mode #2). See Files to upload one.
- Optional: familiarity with the Create Batch OpenAPI spec.
Provider Batch API Mode
Use this mode to run batch jobs through the provider’s native batch endpoint. Providers usually offer a discounted rate for batch jobs, but you are bound by the provider’s quotas and limits. Most completion windows are about 24 hours.
Polling for batch status: Portkey’s Gateway is stateless and does not poll the provider for batch completion. You must poll the batch status manually using the unified API, which has the same signature for all supported providers. See Retrieve Batch for details.
Quickstart (OpenAI example)
🔗 Full schema: see the OpenAPI reference.
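A minimal sketch, assuming the OpenAI-compatible Batch routes exposed by the Gateway and the portkey_ai Python SDK; the virtual key and file ID below are placeholders.

```python
import time
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",          # your Portkey API key
    virtual_key="OPENAI_VIRTUAL_KEY",   # virtual key holding the OpenAI credential
)

# Create a batch against the provider's native batch endpoint.
# completion_window="24h" delegates the job to OpenAI's Batch API.
batch = client.batches.create(
    input_file_id="file-abc123",        # placeholder: a file already uploaded for batching
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)

# The Gateway is stateless, so poll for completion yourself.
# The same call signature works for every supported provider.
while True:
    job = client.batches.retrieve(batch.id)
    if job.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)
print(job.status)
```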
Supported Providers & Endpoints
| Provider | Endpoints |
|---|---|
| OpenAI | completions, chat completions, embeddings |
| Azure OpenAI | completions, chat completions, embeddings |
| Bedrock | chat completions |
| Vertex AI | chat completions, embeddings |
Defaults & Limits
| Property | Default | Notes |
|---|---|---|
| completion_window | 24h | Set by the provider (cannot be shorter). |
| Provider quota | Per provider | e.g., OpenAI ≤ 50k jobs/day. |
| Retries | Provider‑defined | Portkey surfaces job status; no Gateway retry. |
Portkey Managed Batching
Portkey Managed Batching lets you manage batches across multiple providers with minimal effort through a unified API. Read more about Portkey Files here.
How It Works
- Submit a batch request to Portkey with a Portkey File and provider information.
- Batch requests respect metadata, budgets, and other batch request parameters.
- Portkey automatically uploads the file and starts the batch with the provider.
- Portkey periodically checks the batch status and updates it in Portkey.
- Once the batch completes, Portkey reads the batch output and adds the following details to your Portkey analytics (see the sketch below):
  - Token count
  - Cost
  - Successful request count
  - Failed request count
  - Total request count
Note: The automatic status polling and analytics described above apply only to Portkey Managed Batching. For Provider Batch API Mode (Unified Batch Inference), you must poll the batch status manually.
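A minimal end-to-end sketch of the managed flow, assuming the OpenAI-compatible files and batches routes in the portkey_ai SDK; the file name, purpose value, and virtual key are placeholders.

```python
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="BEDROCK_VIRTUAL_KEY",  # any supported provider credential
)

# Upload the JSONL requests file to Portkey once (see the Files docs).
portkey_file = client.files.create(
    file=open("requests.jsonl", "rb"),
    purpose="batch",
)

# Reference the Portkey File; Portkey handles the provider-side upload,
# status polling, and post-batch analytics (tokens, cost, request counts).
batch = client.batches.create(
    input_file_id=portkey_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id)
```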
Portkey Custom Batching ⭐️
Portkey Custom Batching lets you batch requests to providers that don’t have a native batch endpoint. Note that custom batching does not come with a discounted rate; each request is billed as a normal individual call.
How It Works
Set completion_window to immediate and Portkey aggregates your requests in memory, then fires them to the target provider in fixed buckets.
| Setting | Gateway default |
|---|---|
| Batch size | 25 requests |
| Batch interval | 5 s between flushes |
| Retries | 3 per request (configurable via x-portkey-config) |
Custom values for batch_size, batch_interval, and max_retries are not yet configurable; see the Roadmap below.
Quickstart (provider‑agnostic)
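A provider-agnostic sketch under the same SDK assumptions as above; the only change from the provider-native flow is completion_window set to immediate, and the file ID and virtual key are placeholders.

```python
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="ANY_PROVIDER_VIRTUAL_KEY",  # works for providers without a native batch API
)

# "immediate" asks the Gateway to aggregate requests in memory and flush
# them in fixed buckets (25 requests / 5 s / 3 retries by default).
batch = client.batches.create(
    input_file_id="file-abc123",       # placeholder: a Portkey File with your JSONL requests
    endpoint="/v1/chat/completions",
    completion_window="immediate",     # Gateway-side batching, no provider batch API
)
print(batch.id, batch.status)
```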
Response & Monitoring
Identical to Provider mode; the difference is that provider_job_id is absent and cost is computed from the individual calls.
About Portkey Files
Portkey Files are files uploaded to Portkey that are then automatically uploaded to the provider. They’re useful when you want to run multiple batch completions from the same file. Portkey will:
- Automatically upload the file to the provider on your behalf
- Reuse the content in your batch requests
- Check batch progress and provide post-batch analysis, including token and cost calculations
- Make batch outputs available via the GET /batches/<batch_id>/output endpoint (see the sketch below)
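A sketch of fetching the output with plain HTTP against the endpoint above; the base URL and x-portkey-api-key header follow Portkey’s standard conventions, and the batch ID is a placeholder.

```python
import requests

PORTKEY_BASE_URL = "https://api.portkey.ai/v1"

def get_batch_output(batch_id: str, portkey_api_key: str) -> str:
    """Download the completed batch output as JSONL (one result per line)."""
    resp = requests.get(
        f"{PORTKEY_BASE_URL}/batches/{batch_id}/output",
        headers={"x-portkey-api-key": portkey_api_key},
    )
    resp.raise_for_status()
    return resp.text

print(get_batch_output("batch_abc123", "PORTKEY_API_KEY"))
```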
Error Handling & Retries
| Layer | What Portkey does | How to override |
|---|---|---|
| Gateway (Portkey Batch) | Retries 3× on network/429/5xx | x-portkey-config: {"retry": {"max_attempts": 5}} |
| Provider (native batch) | Provider rules | Not configurable via Portkey |
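A sketch of raising the Gateway retry limit, assuming the SDK’s config parameter accepts the same JSON as the x-portkey-config header shown above.

```python
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="OPENAI_VIRTUAL_KEY",
    # Same JSON as x-portkey-config: raise retries from the default of 3 to 5.
    config={"retry": {"max_attempts": 5}},
)
```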
Security & IAM
- Files are encrypted at rest (AES‑256); a custom encryption key is supported if required.
- Portkey uploads on your behalf using least‑privilege scoped credentials; no long‑lived secrets are stored.
- Access to batch status & outputs is gated by your workspace role (completions.write).
Glossary
| Term | Meaning |
|---|---|
| Batch Job | A collection of completion requests executed asynchronously. |
| Portkey File (input_file_id) | Files uploaded to Portkey that are automatically uploaded to the provider for batch processing. Useful for reusing the same file across multiple batch completions. |
| Virtual Key | A logical provider credential stored in Portkey; referenced by ID, not secret. |
| Completion Window | Time frame in which the job must finish. immediate → handled by Portkey; 24h → delegated to provider. |
Roadmap
- Custom batch_size, batch_interval, max_retries (Q3 2025)
- Real‑time progress webhooks
- UI for canceling or pausing jobs

