With OpenAI's tts-1-hd model, you cannot send more than 7 requests per minute. Any extra request automatically fails.
There are many real-world use cases where it’s possible to run into rate limits:
- When your requests have a very high input-token count or a very long context, you can hit token limits
- When you are running a long, complex prompt pipeline that fires hundreds of requests at once, you can hit both token & request limits
Here’s an overview of rate limits imposed by various providers:
| LLM Provider | Example Model | Rate Limits |
|---|---|---|
| OpenAI | gpt-5 | Tier 1: 500 requests per minute, 10,000 tokens per minute, 10,000 requests per day |
| Anthropic | All models | Tier 1: 50 RPM, 50,000 TPM, 1 million tokens per day |
| Cohere | Co.Generate models | Production key: 10,000 RPM |
| Anyscale | All models | Endpoints: 30 concurrent requests |
| Perplexity AI | mixtral-8x7b-instruct | 24 RPM, 16,000 TPM |
| Together AI | All models | Paid: 100 RPM |
1. Install Portkey SDK
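Assuming you're using the Python SDK, the package is published on PyPI as `portkey-ai`:

```sh
pip install portkey-ai
```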
Then, make a chat.completions call using the Portkey SDK, routing the request to a provider slug from your Model Catalog such as @openai-prod.
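Here's a minimal sketch of that call with the Python SDK; the `@openai-prod` slug and the model name are illustrative, and the exact parameter names may vary with your Portkey SDK version:

```python
from portkey_ai import Portkey

# Route requests through Portkey to a provider slug from the Model Catalog
portkey = Portkey(
    api_key="PORTKEY_API_KEY",   # your Portkey API key
    provider="@openai-prod"      # provider slug from the Model Catalog (illustrative)
)

response = portkey.chat.completions.create(
    model="gpt-5",               # example model from the table above
    messages=[{"role": "user", "content": "Hello, world!"}]
)
print(response.choices[0].message.content)
```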
To ensure your requests don't get rate limited, we'll utilise Portkey's fallback & loadbalance features:
2. Fallback to Alternative LLMs
With Portkey, you can write a call routing strategy that helps you fall back from one provider to another in case of rate limit errors. This is done by passing a Config object while instantiating your Portkey client, as shown in the sketch after this list. In this Config object:
- The routing `strategy` is set as `fallback`
- The `on_status_codes` param ensures that the fallback is only triggered on the `429` error code, which is returned for rate limit errors
- The `targets` array contains the model slugs from Model Catalog and the order of the fallback
- The `override_params` in the second target lets you add more params for the specific provider (`max_tokens` for Anthropic in this case)
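A minimal sketch of such a fallback Config is below; the `@openai-prod` and `@anthropic-prod` slugs and the `max_tokens` value are illustrative placeholders, and the exact Config schema may vary with your Portkey version:

```python
from portkey_ai import Portkey

# Fallback Config: try OpenAI first; on a 429 (rate limit) error, fall back to Anthropic
config = {
    "strategy": {
        "mode": "fallback",
        "on_status_codes": [429]            # trigger fallback only on rate limit errors
    },
    "targets": [
        {"provider": "@openai-prod"},       # primary target (illustrative slug)
        {
            "provider": "@anthropic-prod",  # fallback target (illustrative slug)
            "override_params": {"max_tokens": 1024}  # extra param for Anthropic
        }
    ]
}

portkey = Portkey(api_key="PORTKEY_API_KEY", config=config)

response = portkey.chat.completions.create(
    messages=[{"role": "user", "content": "Summarise this document for me."}]
)
```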
3. Load Balance Among Multiple LLMs
Instead of sending all your requests to a single provider on a single account, you can split your traffic across multiple provider accounts using Portkey. This ensures that no single account gets overburdened with requests, which helps avoid rate limits. Setting up this "loadbalancing" is easy: just write the relevant loadbalance Config and pass it while instantiating your Portkey client (see the sketch after this list). In this Config object:
- The routing `strategy` is set as `loadbalance`
- The `targets` contain 3 different OpenAI provider slugs from Model Catalog (representing 3 different accounts), all with equal weight, which means Portkey will split the traffic equally among the 3
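Below is a minimal sketch of such a loadbalance Config; the three account slugs are hypothetical names standing in for three separate OpenAI accounts in your Model Catalog:

```python
from portkey_ai import Portkey

# Loadbalance Config: split traffic equally across three OpenAI accounts
config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {"provider": "@openai-account-1", "weight": 1},  # illustrative slugs for
        {"provider": "@openai-account-2", "weight": 1},  # three separate accounts
        {"provider": "@openai-account-3", "weight": 1}
    ]
}

portkey = Portkey(api_key="PORTKEY_API_KEY", config=config)

# Each call is now routed to one of the three accounts, roughly a third of traffic each
response = portkey.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

Because all three targets carry the same weight, Portkey distributes requests evenly; you can skew the split by giving some targets a higher weight.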

