Portkey’s three routing strategies are fully interoperable. Any target in any strategy can itself contain another strategy:
| Outer strategy | Inner strategy (as a target) | What it achieves |
| --- | --- | --- |
| Conditional | Load Balancer | Route by model, then distribute within each model across providers |
| Conditional | Fallback | Route by model, with a safety chain per branch |
| Fallback | Conditional Router | Smart fallback: pick the backup based on request context, not a static model |
| Fallback | Load Balancer | Protect a distributed cluster with a cross-provider safety net |
| Load Balancer | Fallback | Each load-balanced slot has its own independent failover |
| Load Balancer | Conditional | Each distribution slot picks a model based on request metadata |
This guide shows five real-world patterns with complete configs.

Scale One Model Across Multiple Providers

Pattern: Conditional → Load Balancer

Use conditional routing to match a model alias, then send that alias to a load balancer spread across multiple providers. Traffic for claude-sonnet distributes evenly across Anthropic, Vertex AI, and Bedrock — each with independent rate limit buckets, effectively tripling throughput.
{
  "strategy": {
    "mode": "conditional",
    "conditions": [
      { "query": { "params.model": { "$eq": "claude-sonnet" } }, "then": "claude-sonnet-lb" },
      { "query": { "params.model": { "$eq": "gpt-4o" } }, "then": "gpt-4o-direct" }
    ],
    "default": "gpt-4o-direct"
  },
  "targets": [
    {
      "name": "claude-sonnet-lb",
      "strategy": { "mode": "loadbalance" },
      "targets": [
        { "override_params": { "model": "@anthropic/claude-sonnet-4-5-20250514" }, "weight": 1 },
        { "override_params": { "model": "@vertex/claude-sonnet-4-5@20250514" }, "weight": 1 },
        { "override_params": { "model": "@bedrock/anthropic.claude-sonnet-4-5-20250514-v1:0" }, "weight": 1 }
      ]
    },
    {
      "name": "gpt-4o-direct",
      "override_params": { "model": "@openai/gpt-4o" }
    }
  ]
}
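The conditional strategy can be read as a first-match lookup: Portkey evaluates the conditions in order and routes to the `then` target of the first query that matches, else to `default`. A minimal sketch of those assumed semantics (illustrative, not Portkey's implementation; it handles only single-key `$eq` queries):

```python
def resolve_target(strategy, params):
    """Return the `then` name of the first condition whose $eq query
    matches the request params, else the strategy's default.
    Assumed first-match semantics; single-key $eq queries only."""
    for cond in strategy["conditions"]:
        (path, query), = cond["query"].items()
        field = path.split(".", 1)[1]  # "params.model" -> "model"
        if params.get(field) == query["$eq"]:
            return cond["then"]
    return strategy["default"]

strategy = {
    "conditions": [
        {"query": {"params.model": {"$eq": "claude-sonnet"}}, "then": "claude-sonnet-lb"},
        {"query": {"params.model": {"$eq": "gpt-4o"}}, "then": "gpt-4o-direct"},
    ],
    "default": "gpt-4o-direct",
}
```

Requests for `claude-sonnet` land on the first condition; any unmatched model falls through to the default target.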
Why this matters: Each provider’s rate limit is independent. Spreading across three triples available throughput with no code changes — the app sends model: "claude-sonnet" and Portkey handles the rest.
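The equal weights above translate into a roughly even three-way split. A quick sketch of weighted selection under the standard weight-over-total-weight semantics (illustrative only, not Portkey's internal scheduler):

```python
import random

random.seed(0)  # seeded so the simulation is reproducible

# The three load-balanced targets from the config above, each with weight 1.
targets = [
    ("@anthropic/claude-sonnet-4-5-20250514", 1),
    ("@vertex/claude-sonnet-4-5@20250514", 1),
    ("@bedrock/anthropic.claude-sonnet-4-5-20250514-v1:0", 1),
]

def pick_target(targets):
    """Weighted random selection: P(target) = weight / sum of weights."""
    models, weights = zip(*targets)
    return random.choices(models, weights=weights, k=1)[0]

# Simulate 30,000 requests and measure each provider's traffic share.
counts = {model: 0 for model, _ in targets}
for _ in range(30_000):
    counts[pick_target(targets)] += 1
shares = {model: count / 30_000 for model, count in counts.items()}
# With equal weights, each provider receives roughly one third of requests.
```

Changing a weight to 2 would give that provider about half the traffic (2 of 4 total weight) without touching the other targets.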

Give Each Model Its Own Fallback

Pattern: Conditional → Fallback

Each conditional branch points to its own independent fallback chain. When claude-sonnet is requested, Portkey tries Anthropic first, then Vertex AI, then Bedrock — in order. When gpt-4o is requested, it tries OpenAI first, then Azure. The two chains are completely isolated: an OpenAI outage has no effect on Claude routing.
{
  "strategy": {
    "mode": "conditional",
    "conditions": [
      { "query": { "params.model": { "$eq": "claude-sonnet" } }, "then": "claude-with-fallback" },
      { "query": { "params.model": { "$eq": "gpt-4o" } }, "then": "gpt4o-with-fallback" }
    ],
    "default": "gpt4o-with-fallback"
  },
  "targets": [
    {
      "name": "claude-with-fallback",
      "strategy": {
        "mode": "fallback",
        "on_status_codes": [429, 500, 502, 503, 504]
      },
      "targets": [
        { "override_params": { "model": "@anthropic/claude-sonnet-4-5-20250514" } },
        { "override_params": { "model": "@vertex/claude-sonnet-4-5@20250514" } },
        { "override_params": { "model": "@bedrock/anthropic.claude-sonnet-4-5-20250514-v1:0" } }
      ]
    },
    {
      "name": "gpt4o-with-fallback",
      "strategy": {
        "mode": "fallback",
        "on_status_codes": [429, 500, 502, 503, 504]
      },
      "targets": [
        { "override_params": { "model": "@openai/gpt-4o" } },
        { "override_params": { "model": "@azure/gpt-4o" } }
      ]
    }
  ]
}
on_status_codes controls when a fallback triggers. If the primary returns a 400 (bad request) but your list only includes [429, 500, 502, 503, 504], the fallback will not activate — the error is returned to the caller immediately. Tune this list based on which errors you consider recoverable.
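That trigger rule can be sketched as a small predicate (assumed semantics based on the description above, not Portkey source):

```python
def should_fallback(status_code, on_status_codes=None):
    """Decide whether the next fallback leg activates for a response.

    Assumed semantics: a 2xx never triggers; with no list configured,
    any non-2xx triggers; with a list, only the listed codes trigger."""
    if 200 <= status_code < 300:
        return False
    if on_status_codes is None:
        return True
    return status_code in on_status_codes

recoverable = [429, 500, 502, 503, 504]
# A rate limit moves to the next leg; a bad request goes back to the caller.
```

With `recoverable` configured, `should_fallback(429, recoverable)` is true while `should_fallback(400, recoverable)` is false — matching the behavior described above.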
Why this matters: A single flat fallback chain is shared across all model types. Per-branch fallbacks give each model family its own dedicated recovery sequence — with independent on_status_codes, retry configuration, and provider ordering.

Smart Failover by Request Context

Pattern: Fallback → Conditional Router

The fallback target doesn’t have to be a static model — it can be a conditional router that picks the best available backup based on request context. This is useful for compliance and data-residency requirements: if the primary fails, EU users automatically route to an EU-hosted backup while others get a US backup.

For this pattern to work, the application must pass the routing dimension in the request metadata. The conditional router reads it via the metadata.* query path:
{
  "strategy": { "mode": "fallback" },
  "targets": [
    {
      "override_params": { "model": "@openai/gpt-4o" }
    },
    {
      "strategy": {
        "mode": "conditional",
        "conditions": [
          { "query": { "metadata.user_region": { "$eq": "EU" } }, "then": "eu-backup" }
        ],
        "default": "us-backup"
      },
      "targets": [
        { "name": "eu-backup", "override_params": { "model": "@azure-eu/gpt-4o" } },
        { "name": "us-backup", "override_params": { "model": "@azure-us/gpt-4o" } }
      ]
    }
  ]
}
The application passes user_region via the x-portkey-metadata header (or the metadata SDK parameter):
response = client.with_options(
    metadata={"user_region": "EU"}
).chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}]
)
Why this matters: A static fallback chain treats all requests the same when the primary fails. A conditional fallback makes the backup as smart as the primary — EU users always land on EU infrastructure, even in a failure scenario.

Fallback When the Whole Cluster Goes Down

Pattern: Fallback → Load Balancer

The primary target is a load balancer across multiple providers. Individual provider failures are handled by the load balancer — traffic redistributes within the cluster. Only when all providers in the cluster fail does the outer fallback activate. This avoids over-triggering cross-model fallbacks while still guaranteeing zero downtime.
{
  "strategy": { "mode": "fallback" },
  "targets": [
    {
      "strategy": { "mode": "loadbalance" },
      "targets": [
        { "override_params": { "model": "@vertex/gemini-2.5-pro" }, "weight": 1 },
        { "override_params": { "model": "@google-1/gemini-2.5-pro" }, "weight": 1 },
        { "override_params": { "model": "@google-2/gemini-2.5-pro" }, "weight": 1 }
      ]
    },
    { "override_params": { "model": "@openai/gpt-4.1" } }
  ]
}
Why this matters: Without this pattern, any single Gemini endpoint failure triggers a model switch to GPT-4.1. With the load balancer as primary, a single failure just redistributes within Gemini — GPT-4.1 only activates when the entire Gemini cluster is down.
Without on_status_codes, any non-2xx response triggers the fallback — including 400 and 403 errors. To limit fallback to specific recoverable errors only, set on_status_codes explicitly: "strategy": { "mode": "fallback", "on_status_codes": [429, 500, 502, 503, 504] }. With that list set, a 400 or 403 will not activate the fallback and the error is returned to the caller immediately.

Isolate Failures Between Model Families

Pattern: Load Balancer → Fallback (per slot)

Each load-balanced slot is itself a fallback chain. Traffic distributes across two model families (OpenAI and Anthropic), and each family has its own independent backup. An OpenAI outage triggers the Azure fallback for that leg only — Anthropic traffic is unaffected.
{
  "strategy": { "mode": "loadbalance" },
  "targets": [
    {
      "strategy": { "mode": "fallback" },
      "targets": [
        { "override_params": { "model": "@openai/gpt-4o" } },
        { "override_params": { "model": "@azure/gpt-4o" } }
      ],
      "weight": 1
    },
    {
      "strategy": { "mode": "fallback" },
      "targets": [
        { "override_params": { "model": "@anthropic/claude-sonnet-4-5-20250514" } },
        { "override_params": { "model": "@bedrock/anthropic.claude-sonnet-4-5-20250514-v1:0" } }
      ],
      "weight": 1
    }
  ]
}
Why this matters: A top-level fallback on a load balancer means any failure sends all traffic to the backup. Per-leg fallbacks give each model family its own safety net — an OpenAI issue doesn’t affect Anthropic routing at all.

The Full Config

Several of the patterns above combined into one config: a conditional router with four model aliases, each targeting a different strategy composition.
{
  "strategy": {
    "mode": "conditional",
    "conditions": [
      { "query": { "params.model": { "$eq": "claude-sonnet" } }, "then": "claude-sonnet-lb" },
      { "query": { "params.model": { "$eq": "gpt-4o" } }, "then": "gpt-4o-target" },
      { "query": { "params.model": { "$eq": "gpt-4o-mini" } }, "then": "gpt-4o-mini-lb" },
      { "query": { "params.model": { "$eq": "gemini-2.5-pro" } }, "then": "gemini-lb-with-fallback" }
    ],
    "default": "gpt-4o-target"
  },
  "targets": [
    {
      "name": "claude-sonnet-lb",
      "strategy": { "mode": "loadbalance" },
      "targets": [
        { "override_params": { "model": "@anthropic/claude-sonnet-4-5-20250514" }, "weight": 1 },
        { "override_params": { "model": "@vertex/claude-sonnet-4-5@20250514" }, "weight": 1 },
        { "override_params": { "model": "@bedrock/anthropic.claude-sonnet-4-5-20250514-v1:0" }, "weight": 1 }
      ]
    },
    {
      "name": "gpt-4o-target",
      "override_params": { "model": "@openai/gpt-4o" }
    },
    {
      "name": "gpt-4o-mini-lb",
      "strategy": { "mode": "loadbalance" },
      "targets": [
        { "override_params": { "model": "@azure/gpt-4o-mini" }, "weight": 1 },
        { "override_params": { "model": "@openai-1/gpt-4o-mini" }, "weight": 1 },
        { "override_params": { "model": "@openai-2/gpt-4o-mini" }, "weight": 1 }
      ]
    },
    {
      "name": "gemini-lb-with-fallback",
      "strategy": { "mode": "fallback" },
      "targets": [
        {
          "strategy": { "mode": "loadbalance" },
          "targets": [
            { "override_params": { "model": "@vertex/gemini-2.5-pro" }, "weight": 1 },
            { "override_params": { "model": "@google-1/gemini-2.5-pro" }, "weight": 1 },
            { "override_params": { "model": "@google-2/gemini-2.5-pro" }, "weight": 1 }
          ]
        },
        { "override_params": { "model": "@openai/gpt-4.1" } }
      ]
    }
  ]
}
Save this in the Portkey UI and copy the resulting Config ID.

Using the Config

from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    config="pc-multi-routing-xxxxx"
)

# Conditional → LB: routes to claude-sonnet-lb (Anthropic + Vertex + Bedrock)
response = client.chat.completions.create(
    model="claude-sonnet",
    messages=[{"role": "user", "content": "Explain transformer architecture"}]
)

# Conditional → direct: routes to gpt-4o-target
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a unit test for this function"}]
)

# Conditional → LB: routes to gpt-4o-mini-lb (Azure + 2× OpenAI)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Classify this support ticket"}]
)

# Conditional → Fallback(LB): routes to gemini-lb-with-fallback
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Analyze this 100k-token document"}]
)

Setting Up AI Providers

Add each provider in the Model Catalog and assign it a slug. The slug becomes the @provider-slug prefix in model strings.
| Slug used in config | Provider | Notes |
| --- | --- | --- |
| @anthropic | Anthropic | Direct API |
| @vertex | Google Vertex AI | Requires GCP credentials |
| @bedrock | AWS Bedrock | Requires AWS credentials |
| @openai | OpenAI | Primary account |
| @openai-1 | OpenAI | Second account (rate limit headroom) |
| @openai-2 | OpenAI | Third account (rate limit headroom) |
| @azure | Azure OpenAI | Requires Azure deployment |
| @azure-eu | Azure OpenAI (EU region) | For data-residency compliance |
| @azure-us | Azure OpenAI (US region) | For data-residency compliance |
| @google-1 | Google AI Studio | First account |
| @google-2 | Google AI Studio | Second account (rate limit headroom) |
See Model Catalog for the full setup guide.
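Every model string in the configs above follows the same `@slug/model-id` shape. A small helper to split one into its parts (illustrative; the slug values are the ones from the table above):

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split an "@provider-slug/model-id" string into (slug, model_id).

    Partitions on the first "/" only, so model IDs that themselves
    contain "@" (e.g. Vertex's "claude-sonnet-4-5@20250514") stay intact."""
    if not model.startswith("@"):
        raise ValueError(f"expected '@slug/model-id' form, got: {model!r}")
    slug, _, model_id = model[1:].partition("/")
    return slug, model_id
```

For example, `split_model_string("@vertex/claude-sonnet-4-5@20250514")` yields `("vertex", "claude-sonnet-4-5@20250514")`.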

Observability

Every request is logged with its full routing path. In Portkey Logs:
  • Filter by Config ID to see all requests through this config
  • Filter by Trace ID to see every attempt for a single request — which load-balanced target was selected, whether a fallback triggered, which conditional branch matched
  • The model field shows the actual provider model used (not the alias)
Add a trace_id for programmatic tracing:
response = client.with_options(
    trace_id="user-req-20250514-abc123"
).chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Summarize this document"}]
)

When to Use Each Pattern

| Pattern | Best for |
| --- | --- |
| Scale One Model Across Multiple Providers | High-volume aliases hitting rate limits on a single provider |
| Give Each Model Its Own Fallback | Different model families that each need an independent recovery sequence |
| Smart Failover by Request Context | Compliance or data-residency requirements that must hold even during outages |
| Fallback When the Whole Cluster Goes Down | High-throughput clusters where individual endpoint failures should not trigger a model switch |
| Isolate Failures Between Model Families | Multi-model load distribution where one family’s outage must not affect others |
Last modified on February 25, 2026