# LlamaIndex (Python)

> Add Portkey's enterprise features to any LlamaIndex app—observability, reliability, caching, and cost control.

LlamaIndex provides a framework for building LLM applications with your data. Add Portkey to get production-grade features: full observability, automatic fallbacks, semantic caching, and cost controls—all without changing your LlamaIndex code.

## Quick Start

Add Portkey to any LlamaIndex app with 3 parameters:

```python theme={"system"}
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="@openai-prod/gpt-4o",        # Provider slug from Model Catalog
    api_base="https://api.portkey.ai/v1",
    api_key="PORTKEY_API_KEY"           # Your Portkey API key
)

response = llm.complete("Tell me a joke")
print(response.text)
```

<Frame caption="All requests now appear in Portkey logs">
  <img src="https://mintcdn.com/portkey-docs/T0lFtdapIPX8YtCI/images/libraries/langchain-logs.gif?s=83f80a28ac2103950e5683f90faa16b2" width="1612" height="1080" data-path="images/libraries/langchain-logs.gif" />
</Frame>

That's it! You now get:

* ✅ Full observability (costs, latency, logs)
* ✅ Dynamic model selection per request
* ✅ Automatic fallbacks and retries (via configs)
* ✅ Budget controls per team/project

## Why Add Portkey to LlamaIndex?

LlamaIndex handles data indexing and querying. Portkey adds production features:

<CardGroup cols={2}>
  <Card title="Enterprise Observability" icon="chart-line">
    Every request logged with costs, latency, tokens. Team-level analytics and debugging.
  </Card>

  <Card title="Dynamic Model Selection" icon="shuffle">
    Switch models per request. Route simple queries to cheaper models and complex ones to more capable models, with every request tracked automatically.
  </Card>

  <Card title="Production Reliability" icon="shield-check">
    Automatic fallbacks, smart retries, load balancing—configured once, works everywhere.
  </Card>

  <Card title="Cost & Access Control" icon="dollar-sign">
    Budget limits per team/project. Rate limiting. Centralized credential management.
  </Card>
</CardGroup>

## Setup

### 1. Install Packages

```bash theme={"system"}
pip install llama-index-llms-openai portkey-ai
```

### 2. Add Provider in Model Catalog

1. Go to [**Model Catalog → Add Provider**](https://app.portkey.ai/model-catalog/providers)
2. Select your provider (OpenAI, Anthropic, Google, etc.)
3. Choose existing credentials or create new ones by entering your API key
4. Name your provider (e.g., `openai-prod`)

Your provider slug will be **`@openai-prod`** (or whatever you named it).

<Card title="Complete Model Catalog Guide →" href="/product/model-catalog">
  Set up budgets, rate limits, and manage credentials
</Card>
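The model string used throughout this guide is the provider slug followed by the model name. As a minimal sketch, a small helper can assemble it; `portkey_model` is a hypothetical name for illustration and is not part of the Portkey SDK.

```python
def portkey_model(provider_slug: str, model: str) -> str:
    """Join a Model Catalog provider slug and a model name into one model string."""
    slug = provider_slug.strip().lstrip("@")  # accept the slug with or without "@"
    name = model.strip()
    if not slug or not name:
        raise ValueError("provider slug and model name must both be non-empty")
    return f"@{slug}/{name}"


print(portkey_model("openai-prod", "gpt-4o"))
print(portkey_model("@anthropic-prod", "claude-sonnet-4"))
```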

### 3. Get Portkey API Key

Create your Portkey API key at [app.portkey.ai/api-keys](https://app.portkey.ai/api-keys)

### 4. Use in Your Code

Replace your existing LLM initialization:

```python theme={"system"}
# Before (direct to OpenAI)
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o",
    api_key="OPENAI_API_KEY"
)

# After (via Portkey)
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai/v1",
    api_key="PORTKEY_API_KEY"
)
```

**That's the only change needed!** All your existing LlamaIndex code (indexes, query engines, agents) works exactly the same.
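The `"PORTKEY_API_KEY"` strings in these examples are placeholders. One possible way to avoid hardcoding the key is to read it from an environment variable. The helper below is a sketch: the name `portkey_llm_kwargs` and the choice of `PORTKEY_API_KEY` as the variable name are assumptions, not Portkey requirements.

```python
import os

PORTKEY_BASE_URL = "https://api.portkey.ai/v1"


def portkey_llm_kwargs(model: str, env_var: str = "PORTKEY_API_KEY") -> dict:
    """Build keyword arguments for llama_index.llms.openai.OpenAI from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"model": model, "api_base": PORTKEY_BASE_URL, "api_key": api_key}
```

With the variable set, the LLM can then be created as `OpenAI(**portkey_llm_kwargs("@openai-prod/gpt-4o"))`.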

## Switching Between Providers

Just change the model string—everything else stays the same:

```python theme={"system"}
from llama_index.llms.openai import OpenAI

# OpenAI
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai/v1",
    api_key="PORTKEY_API_KEY"
)

# Anthropic
llm = OpenAI(
    model="@anthropic-prod/claude-sonnet-4",
    api_base="https://api.portkey.ai/v1",
    api_key="PORTKEY_API_KEY"
)

# Google Gemini
llm = OpenAI(
    model="@google-prod/gemini-2.0-flash",
    api_base="https://api.portkey.ai/v1",
    api_key="PORTKEY_API_KEY"
)
```

<Note>
  Portkey implements OpenAI-compatible APIs for all providers, so you always use `llama_index.llms.openai.OpenAI` regardless of which model you're calling.
</Note>
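Because only the model string changes, choosing a model per request can be a plain function. The sketch below uses a deliberately simple, hypothetical heuristic (word count) and assumes that `@google-prod/gemini-2.0-flash` is the cheaper option and `@openai-prod/gpt-4o` the more capable one in your catalog; adjust both to your own providers.

```python
CHEAP_MODEL = "@google-prod/gemini-2.0-flash"   # assumed cheaper model
ADVANCED_MODEL = "@openai-prod/gpt-4o"          # assumed more capable model


def pick_model(prompt: str, max_cheap_words: int = 50) -> str:
    """Hypothetical heuristic: short prompts go to the cheaper model."""
    word_count = len(prompt.split())
    return CHEAP_MODEL if word_count <= max_cheap_words else ADVANCED_MODEL


print(pick_model("What is the capital of France?"))
print(pick_model("word " * 200))
```

The returned string can be passed as the `model` argument when constructing the `OpenAI` LLM.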

## Using with LlamaIndex Chat

LlamaIndex's chat interface works seamlessly:

```python theme={"system"}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage

llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai/v1",
    api_key="PORTKEY_API_KEY"
)

messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="What is the capital of France?")
]

response = llm.chat(messages)
print(response.message.content)
```

## Works With All LlamaIndex Features

✅ **Query Engines** - All query types supported\
✅ **Chat Engines** - Conversational interfaces\
✅ **Agents** - Full agent compatibility\
✅ **Streaming** - Token-by-token streaming\
✅ **RAG Pipelines** - Retrieval-augmented generation\
✅ **Workflows** - Complex LLM workflows

### Streaming

```python theme={"system"}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage

llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai/v1",
    api_key="PORTKEY_API_KEY"
)

# Stream completions
for chunk in llm.stream_complete("Write a short story"):
    print(chunk.delta, end="", flush=True)

# Stream chat
messages = [ChatMessage(role="user", content="Tell me a joke")]
for chunk in llm.stream_chat(messages):
    print(chunk.delta, end="", flush=True)
```
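If you also need the full text after streaming, the deltas can be accumulated as they are printed. This is a sketch that works on any iterable of chunks exposing a `.delta` attribute; `collect_stream` is a hypothetical helper, and the `SimpleNamespace` chunks are stand-ins so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Optionally print each streamed delta as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        delta = chunk.delta or ""  # treat a missing delta as empty text
        if echo:
            print(delta, end="", flush=True)
        parts.append(delta)
    return "".join(parts)


# Stand-in chunks; in practice pass llm.stream_complete(...) or llm.stream_chat(...)
fake_chunks = [SimpleNamespace(delta=d) for d in ("Once ", "upon ", None, "a time")]
story = collect_stream(fake_chunks, echo=False)
print(story)
```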

### Async Support

```python theme={"system"}
import asyncio
from llama_index.llms.openai import OpenAI

async def main():
    llm = OpenAI(
        model="@openai-prod/gpt-4o",
        api_base="https://api.portkey.ai/v1",
        api_key="PORTKEY_API_KEY"
    )
    
    # Async completion
    response = await llm.acomplete("What is 2+2?")
    print(response.text)

    # Async streaming
    async for chunk in await llm.astream_complete("Write a haiku"):
        print(chunk.delta, end="", flush=True)

asyncio.run(main())
```
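When sending many async requests, it can help to cap how many are in flight at once. The sketch below uses `asyncio.Semaphore` and `asyncio.gather`; `complete_many` is a hypothetical helper, and `EchoLLM` is a stand-in for the Portkey-backed LLM so the example runs offline.

```python
import asyncio
from types import SimpleNamespace


async def complete_many(llm, prompts, max_concurrency: int = 5):
    """Run llm.acomplete for each prompt, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(prompt):
        async with semaphore:
            response = await llm.acomplete(prompt)
            return response.text

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(one(p) for p in prompts))


class EchoLLM:
    """Stand-in LLM that echoes the prompt instead of calling a gateway."""

    async def acomplete(self, prompt):
        await asyncio.sleep(0)
        return SimpleNamespace(text=f"echo: {prompt}")


results = asyncio.run(complete_many(EchoLLM(), ["a", "b", "c"], max_concurrency=2))
print(results)
```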

### RAG with Query Engine

```python theme={"system"}
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

# Set up LLM with Portkey
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai/v1",
    api_key="PORTKEY_API_KEY"
)

# Load and index documents (embeddings use LlamaIndex's configured embedding model, not this LLM)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query with Portkey-enabled LLM
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What is the main topic?")
print(response)
```

## Advanced Features via Configs

For production features like fallbacks, caching, and load balancing, use Portkey Configs:

```python theme={"system"}
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

llm = OpenAI(
    model="gpt-4o",  # Default model
    api_base=PORTKEY_GATEWAY_URL,
    api_key="PORTKEY_API_KEY",
    default_headers=createHeaders(
        config="pc_your_config_id"  # Created in Portkey dashboard
    )
)
```

### Example: Fallbacks

```python theme={"system"}
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"override_params": {"model": "@openai-prod/gpt-4o"}},
        {"override_params": {"model": "@anthropic-prod/claude-sonnet-4"}}
    ]
}

llm = OpenAI(
    model="gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="PORTKEY_API_KEY",
    default_headers=createHeaders(config=config)
)

# Automatically falls back to Anthropic if OpenAI fails
response = llm.complete("Hello!")
```
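If you maintain several fallback chains, the config dict can be generated from an ordered list of model strings. This sketch only builds the same structure shown above; `fallback_config` is a hypothetical helper, not part of the Portkey SDK.

```python
def fallback_config(models: list[str]) -> dict:
    """Build a fallback config that tries each model string in the given order."""
    if len(models) < 2:
        raise ValueError("a fallback needs at least two targets")
    return {
        "strategy": {"mode": "fallback"},
        "targets": [{"override_params": {"model": m}} for m in models],
    }


config = fallback_config(["@openai-prod/gpt-4o", "@anthropic-prod/claude-sonnet-4"])
print(config)
```

The resulting dict can be passed to `createHeaders(config=config)` in the same way as the handwritten one.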

### Example: Load Balancing

```python theme={"system"}
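from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {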
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {"override_params": {"model": "@openai-prod/gpt-4o"}, "weight": 0.5},
        {"override_params": {"model": "@anthropic-prod/claude-sonnet-4"}, "weight": 0.5}
    ]
}

llm = OpenAI(
    model="gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="PORTKEY_API_KEY",
    default_headers=createHeaders(config=config)
)

# Requests distributed 50/50 between OpenAI and Anthropic
response = llm.complete("Hello!")
```
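To build intuition for how weights translate into traffic share, the following sketch simulates weighted selection locally. It is not Portkey's routing implementation; it assumes only that each target is chosen with probability proportional to its weight, and `simulate_loadbalance` is a hypothetical name.

```python
import random
from collections import Counter


def simulate_loadbalance(targets, n_requests: int, seed: int = 0) -> Counter:
    """Pick a target per simulated request with probability proportional to its weight."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    models = [t["override_params"]["model"] for t in targets]
    weights = [t["weight"] for t in targets]
    return Counter(rng.choices(models, weights=weights, k=n_requests))


targets = [
    {"override_params": {"model": "@openai-prod/gpt-4o"}, "weight": 0.75},
    {"override_params": {"model": "@anthropic-prod/claude-sonnet-4"}, "weight": 0.25},
]
counts = simulate_loadbalance(targets, n_requests=10_000)
for model, count in counts.items():
    print(f"{model}: {count / 10_000:.1%}")
```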

### Example: Caching

```python theme={"system"}
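from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {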
    "cache": {
        "mode": "semantic",  # or "simple" for exact matches
        "max_age": 3600      # Cache for 1 hour
    },
    "override_params": {"model": "@openai-prod/gpt-4o"}
}

llm = OpenAI(
    model="gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="PORTKEY_API_KEY",
    default_headers=createHeaders(config=config)
)

# Responses cached for similar queries
response = llm.complete("What is machine learning?")
```
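As a conceptual model of the `"simple"` mode and `max_age`, the sketch below implements a local exact-match cache with expiry. It is an illustration only: Portkey's cache runs in the gateway, and semantic matching of similar queries is not shown. `SimpleTTLCache` and its injectable `clock` parameter are hypothetical.

```python
import time


class SimpleTTLCache:
    """Exact-match cache whose entries expire after max_age seconds."""

    def __init__(self, max_age: float, clock=time.monotonic):
        self.max_age = max_age
        self.clock = clock
        self._store = {}

    def get(self, prompt: str):
        entry = self._store.get(prompt)
        if entry is None:
            return None
        stored_at, response = entry
        if self.clock() - stored_at > self.max_age:
            del self._store[prompt]  # entry is older than max_age
            return None
        return response

    def put(self, prompt: str, response: str) -> None:
        self._store[prompt] = (self.clock(), response)


now = [0.0]  # fake clock so expiry can be shown without waiting
cache = SimpleTTLCache(max_age=3600, clock=lambda: now[0])
cache.put("What is machine learning?", "ML is ...")
hit = cache.get("What is machine learning?")      # identical text
miss = cache.get("what is machine learning")      # different text, no exact match
now[0] = 3601
expired = cache.get("What is machine learning?")  # past max_age
print(hit, miss, expired)
```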

<Card title="Learn About Configs →" href="/product/ai-gateway/configs">
  Set up fallbacks, retries, caching, load balancing, and more
</Card>

## Observability

Portkey automatically logs all requests. Add custom metadata for better analytics:

```python theme={"system"}
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="PORTKEY_API_KEY",
    default_headers=createHeaders(
        metadata={
            "_user": "user_123",
            "environment": "production",
            "feature": "rag_query"
        },
        trace_id="unique_trace_id"
    )
)
```

Filter and analyze logs by metadata in the Portkey dashboard.
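Metadata is often assembled per user or per feature. The sketch below builds such a dict in one place; `build_metadata` is a hypothetical helper, and converting every value to a string is an assumption made here to keep the payload flat and uniform.

```python
def build_metadata(user_id: str, **tags) -> dict:
    """Return a flat metadata dict with string values and the user under "_user"."""
    metadata = {"_user": str(user_id)}
    for key, value in tags.items():
        if value is None:
            continue  # skip unset tags rather than sending the string "None"
        metadata[key] = str(value)
    return metadata


metadata = build_metadata(
    "user_123", environment="production", feature="rag_query", tenant=42, ab_test=None
)
print(metadata)
```

The result can be passed as `createHeaders(metadata=metadata)`.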

<Card title="Observability Guide →" href="/product/observability">
  Track costs, performance, and debug issues
</Card>

## Prompt Management

Use prompts from Portkey's Prompt Library:

```python theme={"system"}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders, Portkey

# Render prompt from Portkey
client = Portkey(api_key="PORTKEY_API_KEY")
prompt_template = client.prompts.render(
    prompt_id="pp-your-prompt-id",
    variables={"topic": "AI"}
).data.dict()

# Use with LlamaIndex
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="PORTKEY_API_KEY"
)

messages = [
    ChatMessage(content=msg["content"], role=msg["role"]) 
    for msg in prompt_template["messages"]
]

response = llm.chat(messages)
print(response.message.content)
```

<Card title="Prompt Library →" href="/product/prompt-library">
  Manage, version, and test prompts in Portkey
</Card>

## Migration from Direct OpenAI

Already using LlamaIndex with OpenAI? Just update 3 parameters:

```python theme={"system"}
# Before
from llama_index.llms.openai import OpenAI
import os

llm = OpenAI(
    model="gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY"),
    temperature=0.7
)

# After (add 1 parameter, change 2)
llm = OpenAI(
    model="@openai-prod/gpt-4o",           # Change: prefix with provider slug
    api_base="https://api.portkey.ai/v1",  # Add this
    api_key="PORTKEY_API_KEY",             # Change to Portkey key
    temperature=0.7                        # Keep existing params
)
```

**Benefits:**

* Zero code changes to your existing LlamaIndex logic
* Instant observability for all requests
* Production-grade reliability features
* Cost controls and budgets

## Next Steps

<CardGroup cols={2}>
  <Card title="Model Catalog" icon="database" href="/product/model-catalog">
    Set up providers, budgets, and access control
  </Card>

  <Card title="Configs" icon="gear" href="/product/ai-gateway/configs">
    Configure fallbacks, caching, and routing
  </Card>

  <Card title="Observability" icon="chart-line" href="/product/observability">
    Track costs, performance, and usage
  </Card>

  <Card title="Guardrails" icon="shield" href="/product/guardrails">
    Add PII detection and content filtering
  </Card>
</CardGroup>

For complete SDK documentation:

<Card title="SDK Reference" icon="code" href="/api-reference/sdk/list">
  Complete Portkey SDK documentation
</Card>
