Companies often face challenges scaling their services efficiently as traffic to their applications grows. When you consume APIs, the first point of failure is rate limiting: hit an API too often and the provider throttles you. Load balancing is a proven way to scale usage horizontally without overburdening any one provider, and thus to stay within rate limits.
To get started, add `portkey-ai` to your NodeJS project.
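Assuming npm as the package manager (yarn or pnpm work equally well), the install is a one-liner:

```sh
npm install portkey-ai
```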
We will use a `loadbalance` strategy across Anthropic and OpenAI. `weight` describes how the traffic should be split, here 50/50 between the two LLM providers, while `override_params` lets us override each provider's defaults.
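Here is a minimal sketch of such a config, following Portkey's Gateway Config schema; the API keys are placeholder environment variables and the Claude model name is an assumption:

```js
// Loadbalance config sketch: split traffic 50/50 between OpenAI and Anthropic.
const config = {
  strategy: { mode: 'loadbalance' },
  targets: [
    {
      provider: 'openai',
      api_key: process.env.OPENAI_API_KEY, // placeholder
      weight: 0.5,
    },
    {
      provider: 'anthropic',
      api_key: process.env.ANTHROPIC_API_KEY, // placeholder
      weight: 0.5,
      // Anthropic requires max_tokens, so we set it per-target here
      override_params: { model: 'claude-2.1', max_tokens: 256 },
    },
  ],
};
```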
Let's take this a step further and add a fallback mechanism so that requests to OpenAI fall back to Azure OpenAI. This nested strategy among the `targets` helps keep our app reliable in production.
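A sketch of the nested version: the OpenAI target becomes a fallback group that retries on Azure OpenAI when OpenAI fails, while Anthropic keeps its 50% share. Credentials remain placeholders:

```js
// Loadbalance at the top level, with a nested fallback group for OpenAI.
const config = {
  strategy: { mode: 'loadbalance' },
  targets: [
    {
      // 50% of traffic: try OpenAI first, fall back to Azure OpenAI on failure
      strategy: { mode: 'fallback' },
      weight: 0.5,
      targets: [
        { provider: 'openai', api_key: process.env.OPENAI_API_KEY },
        { provider: 'azure-openai', virtual_key: 'azure-openai-key' }, // placeholder virtual key
      ],
    },
    {
      provider: 'anthropic',
      api_key: process.env.ANTHROPIC_API_KEY, // placeholder
      weight: 0.5,
      override_params: { model: 'claude-2.1', max_tokens: 256 },
    },
  ],
};
```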
See the documentation for Portkey Fallbacks and Loadbalancing.
These `config`s are complete on their own and are passed as an argument when instantiating the Portkey client instance; all subsequent requests acquire the desired behavior auto-magically, with no additional changes to the codebase.
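A sketch of the wiring, with a placeholder Portkey API key; the `traceID` option is an assumption based on Portkey's tracing feature:

```js
import Portkey from 'portkey-ai';

const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY, // placeholder
  config, // the loadbalance + fallback config defined above
  traceID: 'request-loadbalance-fallback', // tag requests for the logs (assumed option)
});

// Every request through this client inherits the routing behavior.
const response = await portkey.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Hello from a loadbalanced app!' }],
});

console.log(response.choices[0].message.content);
```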
Each request is tagged with the trace ID `request-loadbalance-fallback`, so you can filter the Portkey logs to see exactly how it was routed.
`weight` indicates how the traffic is split: here, 10% of your user base is served by Anyscale's Llama models. You are now set up to gather feedback, observe your app's performance, and roll the change out to a progressively larger user base.
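A sketch of that canary-style split; credentials are placeholders and the Llama model identifier is an assumption:

```js
// Canary rollout: 90% of traffic stays on OpenAI, 10% goes to Anyscale's Llama.
const canaryConfig = {
  strategy: { mode: 'loadbalance' },
  targets: [
    {
      provider: 'openai',
      api_key: process.env.OPENAI_API_KEY, // placeholder
      weight: 0.9,
    },
    {
      provider: 'anyscale',
      api_key: process.env.ANYSCALE_API_KEY, // placeholder
      weight: 0.1,
      override_params: { model: 'meta-llama/Llama-2-70b-chat-hf' }, // assumed model id
    },
  ],
};
```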
Note that `max_tokens` is required for Anthropic but not for OpenAI, which is why the Anthropic target sets it via `override_params`. See the entire code.