Portkey lets you enforce limits at five independent levels. Every request passes through each applicable check before reaching the provider, so you can layer controls to build exactly the guardrails your organisation needs.
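
As a rough mental model of "every request passes through each applicable check", the layering can be sketched as a chain of independent counters, where a request is forwarded only if no level is exhausted. This is a minimal sketch and not Portkey's implementation: the `LimitCheck` class, the level names, and the dollar amounts are hypothetical, and only budget-style limits are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LimitCheck:
    """One hypothetical budget limit at one level (illustrative names and numbers)."""
    level: str
    limit_usd: float
    spent_usd: float = 0.0

    def exhausted(self) -> bool:
        return self.spent_usd >= self.limit_usd


def first_blocking_level(checks: List[LimitCheck]) -> Optional[str]:
    """Return the first level whose limit is used up, or None if all pass."""
    for check in checks:
        if check.exhausted():
            return check.level
    return None


def admit(checks: List[LimitCheck], cost_usd: float) -> bool:
    """Forward the request only if no level blocks it, then count its cost everywhere."""
    if first_blocking_level(checks) is not None:
        return False
    for check in checks:
        check.spent_usd += cost_usd
    return True
```

Because every level keeps its own counter, the tightest limit is the one that blocks first, and the looser limits act as backstops.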
Limit Types
| Type | What it measures | Resets |
|---|---|---|
| Budget | Dollar spend | Weekly / monthly / every N days / never |
| Token | Tokens consumed, added up across requests | Weekly / monthly / every N days / never |
| Request count (policies only) | Number of calls made | Weekly / monthly / every N days / never |
| Rate limit | Requests or tokens per time window | Automatically — sliding window |
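
To make the "sliding window" reset concrete, here is a minimal sketch of one common variant, a sliding-window log. The source does not say which algorithm Portkey uses, so treat the `SlidingWindowLimiter` class and its parameters as hypothetical illustrations of the idea rather than the actual behaviour.

```python
from collections import deque


class SlidingWindowLimiter:
    """Sliding-window log: allow at most `limit` requests in any `window_s` seconds."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self._stamps = deque()  # timestamps of admitted requests, oldest first

    def allow(self, now: float) -> bool:
        # Drop timestamps that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False  # still at capacity inside the current window
        self._stamps.append(now)
        return True
```

The clock is passed in explicitly so the behaviour is easy to test; a real caller would typically pass `time.monotonic()`. Capacity frees up gradually as old requests age out, rather than all at once on a fixed boundary.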
Level 1 — API Key
An API key represents a team, a service, or an individual. Limits set here apply to everything that key does, regardless of which workspace is involved.

Supports: Budget, Token, Rate limits (per minute / hour / day / week)

Use Cases

Per-team monthly budget
Each team gets their own key with a monthly spend limit. Engineering, Marketing, and Research each stay within their allocation, and Finance gets clean per-team visibility without digging through logs.

Contractor or temporary access with a hard cap
Issue a key with a fixed lifetime budget and no reset. Once the budget runs out, access stops automatically. No manual revocation needed.

Automated pipeline safety net
A service account used in test pipelines gets a token rate limit. A runaway test loop won’t quietly run up costs overnight.

Get notified before you hit the wall
Set an alert threshold at 80% of the budget. The team gets a heads-up before access is blocked, with time to act.

Level 2 — Workspace
A workspace groups the people and projects in a part of your organisation. Limits here apply to the combined activity of everyone in that workspace.

Supports: Budget, Token, Rate limits (per minute / hour / day / week)

Note: Request-count budgets are not available at the workspace level; only cost and token budgets are supported.
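
To illustrate what "combined activity" means, the sketch below assumes a single shared counter per workspace that every member's spend feeds into, with an alert threshold ahead of the hard limit. The `WorkspaceBudget` class, the member names, and the 80% default are hypothetical and only model the idea.

```python
class WorkspaceBudget:
    """Hypothetical shared budget: all members draw from one counter."""

    def __init__(self, limit_usd: float, alert_at: float = 0.8) -> None:
        self.limit_usd = limit_usd
        self.alert_at = alert_at  # fraction of the limit that triggers an alert
        self.spent_usd = 0.0
        self.spent_by_member = {}

    def record(self, member: str, cost_usd: float) -> str:
        """Add one member's spend to the shared counter and return the workspace state."""
        self.spent_usd += cost_usd
        self.spent_by_member[member] = self.spent_by_member.get(member, 0.0) + cost_usd
        if self.spent_usd >= self.limit_usd:
            return "blocked"
        if self.spent_usd >= self.alert_at * self.limit_usd:
            return "alert"
        return "ok"
```

No single member needs to be a heavy user for the workspace to reach its limit; it is the sum that counts.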
Use Cases
Department spend allocation
Each department gets its own workspace with a monthly budget. Teams stay within their allocation, and spend rolls up cleanly without any custom reporting.

Client project with a fixed budget
A client project workspace gets a one-time budget with no reset. When it’s used up, the team knows the project has hit its allocated spend for the engagement.

Keep staging costs in check
A staging workspace gets a low rate limit so developers can’t accidentally rack up production-scale costs while testing.

Token quota for a research team
A research workspace gets a monthly token budget. The team lead gets alerted before the quota runs out, with time to request more before work is interrupted.

Level 3 — Integration (Provider)
An integration is your connection to a specific provider. Limits set here apply across every workspace using that integration, which makes it the most reliable place to enforce a hard ceiling on provider spend. You can also set per-workspace sub-limits within an integration, so each workspace has its own counter while still sharing the integration-level ceiling.

Supports: Budget, Token, Rate limits (per minute / hour / day / week)

Use Cases

Match your provider contract
If you have a monthly commitment with a provider, set your integration budget just below that ceiling. Portkey stops requests before they reach the provider, so there are no surprise invoices.

Respect a provider’s rate cap
If your deployment has a hard rate limit on the provider side, mirror it on the integration. Portkey rejects excess requests cleanly before they ever hit the provider.

Cross-workspace spend cap
An integration shared across 10 workspaces gets a single monthly token budget. No combination of workspace activity can push past it.

Per-workspace allocations within an integration
Two workspaces share the same provider but get different monthly budgets. Each has its own counter; the integration-level ceiling sits above both.
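
The relationship between an integration-level ceiling and per-workspace sub-limits can be sketched as two counters updated together. This is a minimal model under stated assumptions, not Portkey's implementation: the `IntegrationBudget` class and workspace names are hypothetical, and a workspace without a sub-limit is assumed to be bound only by the shared ceiling.

```python
from typing import Dict, Optional


class IntegrationBudget:
    """Hypothetical model: one shared ceiling plus optional per-workspace sub-limits."""

    def __init__(self, ceiling_usd: float, workspace_limits_usd: Dict[str, float]) -> None:
        self.ceiling_usd = ceiling_usd
        self.workspace_limits_usd = workspace_limits_usd
        self.total_spent_usd = 0.0
        self.spent_by_workspace: Dict[str, float] = {}

    def blocked_by(self, workspace: str) -> Optional[str]:
        """Return which limit blocks this workspace, or None if requests may proceed."""
        if self.total_spent_usd >= self.ceiling_usd:
            return "integration"
        sub_limit = self.workspace_limits_usd.get(workspace)
        spent = self.spent_by_workspace.get(workspace, 0.0)
        if sub_limit is not None and spent >= sub_limit:
            return "workspace"
        return None

    def record(self, workspace: str, cost_usd: float) -> None:
        """Count spend against both the workspace counter and the shared ceiling."""
        self.total_spent_usd += cost_usd
        self.spent_by_workspace[workspace] = (
            self.spent_by_workspace.get(workspace, 0.0) + cost_usd
        )
```

Note that the sub-limits may add up to more than the ceiling (60 + 70 against a ceiling of 100, for example). In that case a workspace can be blocked by the integration limit before it has used its own allocation.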
Level 4 — Usage Limit Policies
Policies are rules you define once and apply dynamically to a filtered slice of traffic, without touching individual workspaces or keys. You define two things: conditions (which requests does this policy match?) and group by (does every matching request share one counter, or does each unique value get its own?).

Supports: Budget, Token, Request count
Resets: Weekly, monthly, every N days, or never

Use Cases

Per-user spend cap without managing individual keys
Tag every request with a user identifier in metadata. A single policy gives each user their own independent monthly budget. No key rotation when users join or leave.

Per-customer quotas in a multi-tenant product
Each customer’s usage is tracked and capped independently. One customer hitting their limit doesn’t affect anyone else.

Cap spend on a specific model
Set a separate monthly budget scoped to one expensive model. Even if overall spend is within other limits, that model’s cost is controlled separately.

Enforce free-tier limits
Tag requests by plan type. Free-tier users share no counter with paid users, and their request limit resets monthly automatically.

Isolate spend by provider
All traffic to a particular provider shares a single monthly budget across all users, regardless of which workspace or key generated the request.

Limit a specific prompt template
Each user gets their own daily token budget when calling a specific prompt. Other prompts are unaffected.

Target production traffic only
A policy scoped to a production environment flag leaves development and staging traffic completely untouched.

Level 5 — Rate Limit Policies
Same as usage limit policies, but for rate limiting. Conditions and group-by work identically; the difference is that these enforce a requests-per-minute (or hour/day/week) ceiling rather than a cumulative budget.

Supports: Rate limits (per minute / hour / day / week) on requests or tokens

Use Cases

Per-user rate limiting without individual keys
Each user gets their own rate limit from a single policy. No need to issue or manage a separate key per user.

Protect an expensive model from traffic spikes
A model-scoped policy caps total throughput across all users. No single spike can flood it.

Throttle bulk operations separately
Embedding or batch-style endpoints are often called in high volumes. Rate limit them independently so they don’t crowd out other traffic.

Different rate limits per subscription tier
Starter customers get 5 requests per minute; growth customers get 20. Two policies, defined once; updating a customer’s tier just means changing a metadata value.

Org-wide provider throughput cap
All traffic to a provider shares a single rate limit window, mirroring any throughput agreement you have with them.

What Happens When a Limit Is Hit
| Situation | Response | Notes |
|---|---|---|
| Budget, token, or request cap reached | 412 | Blocked immediately. No spend is incurred. Clears after reset or manual action. |
| Rate limit exceeded | 429 | Blocked temporarily. Clears automatically as the time window rolls forward. |
| API key past its expiry date | 401 | Blocked until the key is renewed or replaced. |
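
On the client side, the three status codes call for different reactions: a 429 clears on its own and is worth retrying, while 412 and 401 persist until someone acts. The sketch below encodes that decision. The `next_action` helper, its action labels, and the exponential backoff schedule are hypothetical choices for illustration, not part of any Portkey SDK.

```python
from typing import Optional, Tuple


def next_action(
    status: int,
    attempt: int,
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
) -> Tuple[str, Optional[float]]:
    """Map a response status to (action, delay in seconds before retrying)."""
    if status == 429:
        # Rate limit: the window rolls forward, so back off and retry.
        if attempt < max_attempts:
            return ("retry", base_delay_s * 2 ** attempt)
        return ("give_up", None)
    if status == 412:
        # Budget, token, or request cap: retrying will not help until a reset.
        return ("limit_exhausted", None)
    if status == 401:
        # In the table above, 401 signals an expired key; renew or replace it.
        return ("check_api_key", None)
    return ("unrelated", None)
```

For example, `next_action(429, 0)` asks for a retry after one second, whereas `next_action(412, 0)` tells the caller to stop and surface the exhausted limit.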
Combining Levels
Hard ceiling with per-team sub-limits
Set a budget on the integration as an absolute ceiling, then give each workspace a smaller allocation. Teams manage their own spend; the integration limit is the safety net.

Organisation-wide cap with per-user rate limits
A policy caps total throughput for the whole organisation. A second policy gives each user their own smaller window. Both apply simultaneously.

Lifetime budget for an automated workflow
An API key with a fixed budget and no reset runs until the budget is gone, then stops. Pair with an alert threshold to know when it’s running low.

Free-tier metering at scale
Tag every request with user and plan metadata. A single policy enforces per-user monthly limits for free-tier users. Moving a user to a paid plan just means updating their metadata.

Next Steps
API Keys
Create and manage API keys with budget and rate controls
Workspaces
Configure workspace-level budgets and access controls
Usage Limit Policies
Set up dynamic limit policies with conditions and group-by
Tracking Costs with Metadata
Attach metadata to requests for per-user and per-feature cost visibility

