Pattern: Conditional → Load Balancer

Use conditional routing to match a model alias, then send that alias to a load balancer spread across multiple providers. Traffic for claude-sonnet distributes evenly across Anthropic, Vertex AI, and Bedrock — each with independent rate limit buckets, effectively tripling throughput.
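A config implementing this pattern might look like the following sketch. The virtual key names are placeholders, and matching the alias via a params.model query path is an assumption about how the router is keyed (Portkey's conditional examples commonly use metadata.* paths):

```json
{
  "strategy": { "mode": "conditional" },
  "conditions": [
    {
      "query": { "params.model": { "$eq": "claude-sonnet" } },
      "then": "claude-lb"
    }
  ],
  "default": "claude-lb",
  "targets": [
    {
      "name": "claude-lb",
      "strategy": { "mode": "loadbalance" },
      "targets": [
        { "virtual_key": "anthropic-vk", "weight": 1 },
        { "virtual_key": "vertex-vk", "weight": 1 },
        { "virtual_key": "bedrock-vk", "weight": 1 }
      ]
    }
  ]
}
```

Equal weights give the even split described above; skew them if one provider has a larger rate limit than the others.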
Why this matters: Each provider’s rate limit is independent. Spreading across three providers triples the available throughput with no code changes — the app sends model: "claude-sonnet" and Portkey handles the rest.
Pattern: Conditional → Fallback

Each conditional branch points to its own independent fallback chain. When claude-sonnet is requested, Portkey tries Anthropic first, then Vertex AI, then Bedrock — in order. When gpt-4o is requested, it tries OpenAI first, then Azure. The two chains are completely isolated: an OpenAI outage has no effect on Claude routing.
on_status_codes controls when a fallback triggers. If the primary returns a 400 (bad request) but your list only includes [429, 500, 502, 503, 504], the fallback will not activate — the error is returned to the caller immediately. Tune this list based on which errors you consider recoverable.
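Combining the two ideas, a per-branch config could be sketched as follows. Again, virtual key names are placeholders and the params.model query path is an assumption; the on_status_codes list mirrors the recoverable errors discussed above:

```json
{
  "strategy": { "mode": "conditional" },
  "conditions": [
    { "query": { "params.model": { "$eq": "claude-sonnet" } }, "then": "claude-chain" },
    { "query": { "params.model": { "$eq": "gpt-4o" } }, "then": "gpt-chain" }
  ],
  "default": "gpt-chain",
  "targets": [
    {
      "name": "claude-chain",
      "strategy": { "mode": "fallback", "on_status_codes": [429, 500, 502, 503, 504] },
      "targets": [
        { "virtual_key": "anthropic-vk" },
        { "virtual_key": "vertex-vk" },
        { "virtual_key": "bedrock-vk" }
      ]
    },
    {
      "name": "gpt-chain",
      "strategy": { "mode": "fallback", "on_status_codes": [429, 500, 502, 503, 504] },
      "targets": [
        { "virtual_key": "openai-vk" },
        { "virtual_key": "azure-vk" }
      ]
    }
  ]
}
```

Because each branch carries its own strategy object, the two chains can use different on_status_codes lists or retry settings without affecting each other.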
Why this matters: A single flat fallback chain is shared across all model types. Per-branch fallbacks give each model family its own dedicated recovery sequence — with independent on_status_codes, retry configuration, and provider ordering.
Pattern: Fallback → Conditional Router

The fallback target doesn’t have to be a static model — it can be a conditional router that picks the best available backup based on request context. This is useful for compliance and data-residency requirements: if the primary fails, EU users automatically route to an EU-hosted backup while others get a US backup.

For this pattern to work, the application must pass the routing dimension in the request metadata. The conditional router reads it via the metadata.* query path:
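A sketch of this shape, assuming a hypothetical metadata.region field and placeholder virtual key names:

```json
{
  "strategy": { "mode": "fallback" },
  "targets": [
    { "virtual_key": "openai-primary-vk" },
    {
      "strategy": { "mode": "conditional" },
      "conditions": [
        { "query": { "metadata.region": { "$eq": "eu" } }, "then": "eu-backup" }
      ],
      "default": "us-backup",
      "targets": [
        { "name": "eu-backup", "virtual_key": "azure-eu-vk" },
        { "name": "us-backup", "virtual_key": "azure-us-vk" }
      ]
    }
  ]
}
```

The application then supplies the region on every request — for example as Portkey request metadata such as {"region": "eu"} — so the conditional branch has something to match when the fallback fires.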
Why this matters: A static fallback chain treats all requests the same when the primary fails. A conditional fallback makes the backup as smart as the primary — EU users always land on EU infrastructure, even in a failure scenario.
Pattern: Fallback → Load Balancer

The primary target is a load balancer across multiple providers. Individual provider failures are handled by the load balancer — traffic redistributes within the cluster. Only when all providers in the cluster fail does the outer fallback activate. This avoids over-triggering cross-model fallbacks while still guaranteeing zero downtime.
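As a sketch, with placeholder virtual keys for two Gemini-serving providers and an OpenAI backup:

```json
{
  "strategy": { "mode": "fallback", "on_status_codes": [429, 500, 502, 503, 504] },
  "targets": [
    {
      "strategy": { "mode": "loadbalance" },
      "targets": [
        { "virtual_key": "gemini-vk", "weight": 1 },
        { "virtual_key": "vertex-gemini-vk", "weight": 1 }
      ]
    },
    { "virtual_key": "openai-vk", "override_params": { "model": "gpt-4.1" } }
  ]
}
```

The inner loadbalance target counts as a single slot in the outer fallback chain, so the GPT-4.1 leg is only consulted when the whole Gemini cluster is failing.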
Why this matters: Without this pattern, any single Gemini endpoint failure triggers a model switch to GPT-4.1. With the load balancer as primary, a single failure just redistributes within Gemini — GPT-4.1 only activates when the entire Gemini cluster is down.
Without on_status_codes, any non-2xx response triggers the fallback — including 400 and 403 errors. To limit fallback to specific recoverable errors only, set on_status_codes explicitly: "strategy": { "mode": "fallback", "on_status_codes": [429, 500, 502, 503, 504] }. With that list set, a 400 or 403 will not activate the fallback and the error is returned to the caller immediately.
Pattern: Load Balancer → Fallback (per slot)

Each load-balanced slot is itself a fallback chain. Traffic distributes across two model families (OpenAI and Anthropic), and each family has its own independent backup. An OpenAI outage triggers the Azure fallback for that leg only — Anthropic traffic is unaffected.
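A minimal sketch of this inversion, again with placeholder virtual keys:

```json
{
  "strategy": { "mode": "loadbalance" },
  "targets": [
    {
      "weight": 0.5,
      "strategy": { "mode": "fallback" },
      "targets": [
        { "virtual_key": "openai-vk" },
        { "virtual_key": "azure-vk" }
      ]
    },
    {
      "weight": 0.5,
      "strategy": { "mode": "fallback" },
      "targets": [
        { "virtual_key": "anthropic-vk" },
        { "virtual_key": "bedrock-vk" }
      ]
    }
  ]
}
```

Note the structural difference from the previous pattern: here the loadbalance strategy is outermost and each leg nests its own fallback, rather than a single fallback wrapping one balanced cluster.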
Why this matters: A top-level fallback on a load balancer means any failure sends all traffic to the backup. Per-leg fallbacks give each model family its own safety net — an OpenAI issue doesn’t affect Anthropic routing at all.
Every request is logged with its full routing path. In Portkey Logs:
- Filter by Config ID to see all requests through this config
- Filter by Trace ID to see every attempt for a single request — which load-balanced target was selected, whether a fallback triggered, which conditional branch matched
- The model field shows the actual provider model used (not the alias)
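For reference, a request that exercises one of these configs and carries its own trace ID might look like this sketch (the config slug and trace ID value are placeholders; header names follow Portkey's x-portkey-* convention):

```shell
curl https://api.portkey.ai/v1/chat/completions \
  -H "x-portkey-api-key: $PORTKEY_API_KEY" \
  -H "x-portkey-config: pc-prod-router" \
  -H "x-portkey-trace-id: req-8812" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet", "messages": [{"role": "user", "content": "Hello"}]}'
```

Reusing the same trace ID across retries from your own application groups every attempt under one trace in the Logs view.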