# Load Balancing

> Distribute traffic across multiple LLMs for high availability and optimal performance.

<Info>
  Available on all Portkey [plans](https://portkey.ai/pricing).
</Info>

Spreading requests across several providers, models, or API keys prevents any single provider from becoming a bottleneck.

## Examples

<CodeGroup>
  ```json Between Providers (model from request) theme={"system"}
  {
    "strategy": { "mode": "loadbalance" },
    "targets": [
      { "provider": "@openai-prod", "weight": 0.7 },
      { "provider": "@azure-prod", "weight": 0.3 }
    ]
  }
  ```

  ```json Between Models theme={"system"}
  {
    "strategy": { "mode": "loadbalance" },
    "targets": [
      { "override_params": { "model": "@openai-prod/gpt-4o" }, "weight": 0.75 },
      { "override_params": { "model": "@anthropic-prod/claude-3-5-sonnet-20241022" }, "weight": 0.25 }
    ]
  }
  ```

  ```json Multiple API Keys (same provider) theme={"system"}
  {
    "strategy": { "mode": "loadbalance" },
    "targets": [
      { "provider": "@openai-key-1", "weight": 1 },
      { "provider": "@openai-key-2", "weight": 1 },
      { "provider": "@openai-key-3", "weight": 1 }
    ]
  }
  ```

  ```json Cost Optimization (cheap vs premium) theme={"system"}
  {
    "strategy": { "mode": "loadbalance" },
    "targets": [
      { "override_params": { "model": "@openai-prod/gpt-4o-mini" }, "weight": 0.8 },
      { "override_params": { "model": "@openai-prod/gpt-4o" }, "weight": 0.2 }
    ]
  }
  ```

  ```json Gradual Migration (old to new model) theme={"system"}
  {
    "strategy": { "mode": "loadbalance" },
    "targets": [
      { "override_params": { "model": "@anthropic-prod/claude-3-5-sonnet-20241022" }, "weight": 0.9 },
      { "override_params": { "model": "@anthropic-prod/claude-sonnet-4-20250514" }, "weight": 0.1 }
    ]
  }
  ```
</CodeGroup>

| Pattern               | Use Case                                                                  |
| --------------------- | ------------------------------------------------------------------------- |
| **Between Providers** | Route to different providers; the model comes from the request            |
| **Between Models**    | Split traffic across specific models, each set through `override_params`  |
| **Multiple API Keys** | Spread load across the rate limits of different accounts                  |
| **Cost Optimization** | Send most traffic to a cheaper model and reserve the premium model for the rest |
| **Gradual Migration** | Test a new model on a small percentage of traffic before full rollout     |

<Info>
  The `@provider-slug/model-name` format automatically routes to the correct provider. Set up providers in [Model Catalog](https://app.portkey.ai/model-catalog).
</Info>
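The Gradual Migration pattern usually means publishing a series of configs with a growing share for the new model. The following is a minimal sketch of generating those configs in Python. The helper name `migration_config` and the rollout schedule are hypothetical and not part of Portkey; only the config shape comes from the examples above.

```python
import json


def migration_config(old_model: str, new_model: str, new_share: float) -> dict:
    """Build a loadbalance config that sends `new_share` of traffic to `new_model`."""
    if not 0.0 <= new_share <= 1.0:
        raise ValueError("new_share must be between 0 and 1")
    return {
        "strategy": {"mode": "loadbalance"},
        "targets": [
            # Rounding avoids float noise such as 0.09999999999999998 in the output.
            {"override_params": {"model": old_model}, "weight": round(1.0 - new_share, 4)},
            {"override_params": {"model": new_model}, "weight": round(new_share, 4)},
        ],
    }


# Hypothetical rollout schedule: the share of traffic sent to the new model at each stage.
ROLLOUT_STEPS = [0.1, 0.25, 0.5, 1.0]

for share in ROLLOUT_STEPS:
    config = migration_config(
        "@anthropic-prod/claude-3-5-sonnet-20241022",
        "@anthropic-prod/claude-sonnet-4-20250514",
        share,
    )
    print(json.dumps(config))
```

At the final step the old model keeps a weight of `0`, so it stays in the config but receives no traffic.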

[Create](/product/ai-gateway/configs#creating-configs) and [use](/product/ai-gateway/configs#using-configs) configs in your requests.

## How It Works

1. **Define targets & weights**: Assign a `weight` to each target. Weights represent each target's relative share of traffic.
2. **Weight normalization**: Portkey normalizes weights so that they sum to 100%. Example: weights 5, 3, and 1 total 9, so the targets receive roughly 55.6%, 33.3%, and 11.1% of traffic.
3. **Request distribution**: Each request is routed to a target according to the normalized probabilities.

<Info>
  * Default `weight`: `1` (applies to any target with no `weight` set)
  * Minimum `weight`: `0` (stops traffic to the target without removing it from the config)
</Info>
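The normalization and selection steps can be approximated in a few lines of Python. This is an illustrative sketch rather than Portkey's implementation: the function names are hypothetical, and raising an error when every weight is `0` is a choice made for this sketch only.

```python
import random
from collections import Counter


def normalize_weights(targets):
    """Return each target's share of traffic as a fraction of the total weight."""
    weights = [t.get("weight", 1) for t in targets]  # unset weights default to 1
    if any(w < 0 for w in weights):
        raise ValueError("weights must be >= 0")
    total = sum(weights)
    if total == 0:
        # Assumption of this sketch: a config with no routable target is an error.
        raise ValueError("at least one target needs a weight above 0")
    return [w / total for w in weights]


def pick_target(targets, rng=random):
    """Choose one target at random, in proportion to its normalized weight."""
    shares = normalize_weights(targets)
    return rng.choices(targets, weights=shares, k=1)[0]


targets = [
    {"provider": "@openai-key-1", "weight": 5},
    {"provider": "@openai-key-2", "weight": 3},
    {"provider": "@openai-key-3", "weight": 1},
]

print([round(share * 100, 1) for share in normalize_weights(targets)])  # [55.6, 33.3, 11.1]

# Over many requests the observed split approaches the normalized shares.
rng = random.Random(42)
counts = Counter(pick_target(targets, rng)["provider"] for _ in range(10_000))
print(counts)
```

Because each request is an independent draw, short windows of traffic can deviate noticeably from the configured split; the proportions only converge over a large number of requests.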

## Sticky Load Balancing

Sticky load balancing ensures that requests with the same identifier are consistently routed to the same target. This is useful for:

* Maintaining conversation context across multiple requests
* Ensuring consistent model behavior for A/B testing
* Session-based routing for user-specific experiences

### Configuration

Add `sticky` to your load balancing strategy:

```json theme={"system"}
{
  "strategy": {
    "mode": "loadbalance",
    "sticky": {
      "enabled": true,
      "hash_fields": ["metadata.user_id"],
      "ttl": 3600
    }
  },
  "targets": [
    {
      "provider": "@openai-prod",
      "weight": 0.5
    },
    {
      "provider": "@anthropic-prod",
      "weight": 0.5
    }
  ]
}
```

### Parameters

| Parameter     | Type    | Description                                                                                                                                                               |
| ------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `enabled`     | boolean | Turns sticky routing on or off. Must be `true` to activate sticky behavior. If omitted or `false`, sticky routing is disabled and normal weighted load balancing is used. |
| `hash_fields` | array   | Fields to use for generating the sticky session identifier. Supports dot notation for nested fields (e.g., `metadata.user_id`, `metadata.session_id`)                     |
| `ttl`         | number  | Time-to-live in seconds for the sticky session. After this period, a new target may be selected. Default: 3600 (1 hour)                                                   |

### How It Works

1. **Identifier Generation**: When a request arrives, Portkey generates a hash from the values of the specified `hash_fields`.
2. **Target Lookup**: The hash is used to look up the previously assigned target in the cache.
3. **Consistent Routing**: If a cached assignment exists and hasn't expired, the request goes to the same target.
4. **New Assignment**: If no cached assignment exists, or the cached one has expired, a new target is selected based on weights and cached for future requests.

<Note>
  Sticky sessions use a two-tier cache system (in-memory + Redis) for fast lookups and persistence across gateway instances in distributed deployments.
</Note>
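The lookup flow above can be sketched with a single in-memory cache. Everything here is an assumption made for illustration: the class and function names are hypothetical, SHA-256 stands in for whatever hash Portkey uses, a missing field is hashed as `None`, the TTL is not refreshed on a cache hit, and the Redis tier is left out entirely.

```python
import hashlib
import random
import time


def resolve_field(request: dict, path: str):
    """Follow a dot-notation path such as 'metadata.user_id' through nested dicts."""
    value = request
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def sticky_key(request: dict, hash_fields) -> str:
    """Hash the values of `hash_fields` into one sticky-session identifier."""
    parts = [f"{field}={resolve_field(request, field)}" for field in hash_fields]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


class StickyBalancer:
    """In-memory sketch of sticky routing; not Portkey's implementation."""

    def __init__(self, targets, hash_fields, ttl=3600, clock=time.monotonic, rng=random):
        self.targets = targets
        self.hash_fields = hash_fields
        self.ttl = ttl
        self.clock = clock
        self.rng = rng
        self._cache = {}  # sticky key -> (target index, expiry time)

    def route(self, request: dict) -> dict:
        key = sticky_key(request, self.hash_fields)
        now = self.clock()
        cached = self._cache.get(key)
        if cached is not None and cached[1] > now:
            return self.targets[cached[0]]  # unexpired assignment: same target
        # No assignment, or it expired: weighted pick, then cache it for `ttl` seconds.
        weights = [t.get("weight", 1) for t in self.targets]
        index = self.rng.choices(range(len(self.targets)), weights=weights, k=1)[0]
        self._cache[key] = (index, now + self.ttl)
        return self.targets[index]


balancer = StickyBalancer(
    targets=[
        {"provider": "@openai-prod", "weight": 0.5},
        {"provider": "@anthropic-prod", "weight": 0.5},
    ],
    hash_fields=["metadata.user_id"],
    ttl=3600,
)
request = {"metadata": {"user_id": "user-123"}}
print(balancer.route(request) is balancer.route(request))  # True
```

Because this cache lives in one process, two gateway instances running it could assign the same user to different targets, which is the gap the shared Redis tier closes.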

## Combine with Other Strategies

Load balancing is composable with conditional routing and fallbacks — use it as a target inside another strategy, or nest other strategies inside each load-balanced slot.

<CardGroup cols={2}>
  <Card title="Scale One Model Across Multiple Providers" icon="code-branch" href="/guides/use-cases/combining-routing-strategies#scale-one-model-across-multiple-providers">
    Use a conditional router to match a model alias, then send that branch to a load balancer across Anthropic, Vertex AI, and Bedrock.
  </Card>

  <Card title="Fallback When the Whole Cluster Goes Down" icon="shield" href="/guides/use-cases/combining-routing-strategies#fallback-when-the-whole-cluster-goes-down">
    Wrap a load balancer cluster in a fallback — individual failures stay within the cluster; only a full outage triggers the cross-model backup.
  </Card>

  <Card title="Isolate Failures Between Model Families" icon="layer-group" href="/guides/use-cases/combining-routing-strategies#isolate-failures-between-model-families">
    Give each load-balanced slot its own independent fallback. An OpenAI outage only affects that slot — the other models route normally.
  </Card>

  <Card title="All Patterns Together" icon="diagram-project" href="/guides/use-cases/combining-routing-strategies#the-full-config">
    See a complete config combining conditional routing, load balancing, and fallbacks across four model families.
  </Card>
</CardGroup>
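As a rough illustration of wrapping a load-balanced cluster in a fallback, the sketch below assembles a nested config in Python. It assumes that a `fallback` strategy mode exists and that a target may itself carry `strategy` and `targets`; the helper names are hypothetical, and the linked guides remain the authoritative source for the exact config shape.

```python
import json


def loadbalance(targets):
    """Wrap targets in a loadbalance strategy, as in the examples on this page."""
    return {"strategy": {"mode": "loadbalance"}, "targets": targets}


def with_fallback(primary, backup):
    """Assumed shape: a `fallback` strategy that lists the primary target first."""
    return {"strategy": {"mode": "fallback"}, "targets": [primary, backup]}


# Load-balanced cluster across two keys for the same provider.
cluster = loadbalance([
    {"provider": "@openai-key-1", "weight": 1},
    {"provider": "@openai-key-2", "weight": 1},
])

# Cross-model backup, intended to be used only if the whole cluster is unavailable.
backup = {"override_params": {"model": "@anthropic-prod/claude-sonnet-4-20250514"}}

config = with_fallback(cluster, backup)
print(json.dumps(config, indent=2))
```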

## Caveats and Considerations

Keep the following in mind when load balancing:

1. Ensure the LLMs in your list are compatible with your use case. Not all LLMs offer the same capabilities or respond in the same format.
2. Monitor your usage of each LLM. Depending on your weight distribution, usage (and therefore spend) can vary significantly between targets.
3. Each LLM has its own latency and pricing, so spreading traffic across several of them affects both cost and response time.
4. **Sticky sessions** require Redis for persistence across gateway instances. Without Redis, sticky sessions will only work within a single gateway instance's memory.
