
Prompt caching on Amazon Bedrock lets you cache specific portions of your requests so later requests can reuse them. The model skips recomputing content it has already processed, which reduces inference latency and input token costs. With Portkey, you can use Amazon Bedrock prompt caching through the OpenAI-compatible unified API and through prompt templates.

Model Support

Amazon Bedrock prompt caching is available for the following models:
  • Claude Opus 4.6
  • Claude Opus 4.5
  • Claude Opus 4
  • Claude Sonnet 4.6
  • Claude Sonnet 4.5
  • Claude Haiku 4.5
  • Claude 3.5 Haiku
  • Claude 3.7 Sonnet
  • Claude 3.5 Sonnet v2 (Preview)
  • Amazon Nova Micro, Lite, Pro (automatic caching)

How Bedrock Prompt Caching Works

With prompt caching, you define cache checkpoints. A checkpoint is a marker that shows which parts of your prompt to cache. Cached sections must stay identical between requests, because any change causes a cache miss.
You can also use Bedrock prompt caching with Portkey's prompt templates.
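Any change to a cached section causes a cache miss. A practical pattern is to build the cached portion once and reuse the exact same object in every request, changing only the trailing user message. The sketch below is plain TypeScript with no network calls. `ChatMessage`, `CACHED_SYSTEM`, and `buildMessages` are hypothetical names chosen for illustration. The message shape mirrors the `cache_control` usage in the Portkey example on this page.

```typescript
// Hypothetical types mirroring the OpenAI-style message shape used with Portkey.
type CacheControl = { type: "ephemeral"; ttl?: "5m" | "1h" };
type ContentBlock = { type: "text"; text: string; cache_control?: CacheControl };
type ChatMessage =
  | { role: "system"; content: ContentBlock[] }
  | { role: "user"; content: string };

// The static prefix: built once and reused verbatim, so every request
// sends identical content up to and including the cache checkpoint.
const CACHED_SYSTEM: ChatMessage = {
  role: "system",
  content: [
    { type: "text", text: "You are a helpful assistant" },
    {
      type: "text",
      text: "This is a large document I want to cache...",
      cache_control: { type: "ephemeral" }, // cache checkpoint
    },
  ],
};

// Only the trailing user turn varies between requests.
function buildMessages(question: string): ChatMessage[] {
  return [CACHED_SYSTEM, { role: "user", content: question }];
}

const first = buildMessages("Summarize the document in 20 words");
const second = buildMessages("List three key terms from the document");

// Same cached prefix in both requests, different question.
console.log(JSON.stringify(first[0]) === JSON.stringify(second[0])); // true
```

Keeping the prefix in a single constant makes accidental edits, such as an added timestamp or reordered blocks, easier to spot than when each request rebuilds the prompt inline.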

Implementation Examples

Here’s how to implement prompt caching with Portkey:
```javascript
import Portkey from 'portkey-ai'

// Top-level await requires running this file as an ES module.
const portkey = new Portkey({
    apiKey: "PORTKEY_API_KEY", // defaults to process.env["PORTKEY_API_KEY"]
    provider: "@PROVIDER" // your Bedrock provider slug
})

const chatCompletion = await portkey.chat.completions.create({
    model: 'anthropic.claude-3-7-sonnet-20250219-v1:0',
    messages: [
        {
            role: 'system',
            content: [
                { type: 'text', text: 'You are a helpful assistant' },
                {
                    type: 'text',
                    // To be cached, the prefix must reach the model's minimum
                    // token count (1,024 tokens for Claude 3.7 Sonnet).
                    text: 'This is a large document I want to cache...',
                    cache_control: { type: 'ephemeral' } // cache checkpoint
                }
            ]
        },
        { role: 'user', content: 'Summarize the above document for me in 20 words' }
    ]
});

console.log(chatCompletion.choices[0].message.content);
```

Cache TTL Options

By default, cache checkpoints use the standard 5-minute TTL. To set the TTL explicitly, add a ttl field to cache_control:
```json
{
  "cache_control": { "type": "ephemeral", "ttl": "1h" }
}
```
Supported TTL values are "5m" (5 minutes) and "1h" (1 hour). "1h" is only available on models whose Supported TTL column in the table below lists 1 hour. Portkey forwards the TTL to Bedrock's cachePoint configuration.
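A small helper can keep TTL values consistent across checkpoints. This is a sketch under assumptions: `cachedText` and `CacheTtl` are hypothetical names, and the helper only rejects values other than the two TTLs listed above. Whether a specific model accepts "1h" still depends on that model.

```typescript
// Hypothetical helper: wraps text in a content block that carries a cache checkpoint.
type CacheTtl = "5m" | "1h";

interface CachedTextBlock {
  type: "text";
  text: string;
  cache_control: { type: "ephemeral"; ttl?: CacheTtl };
}

const SUPPORTED_TTLS: readonly string[] = ["5m", "1h"];

function cachedText(text: string, ttl?: string): CachedTextBlock {
  // Omitting ttl leaves the default TTL in effect.
  if (ttl === undefined) {
    return { type: "text", text, cache_control: { type: "ephemeral" } };
  }
  // Reject anything other than the two documented values before sending.
  if (!SUPPORTED_TTLS.includes(ttl)) {
    throw new Error(`Unsupported TTL "${ttl}"; use "5m" or "1h"`);
  }
  return { type: "text", text, cache_control: { type: "ephemeral", ttl: ttl as CacheTtl } };
}

console.log(JSON.stringify(cachedText("Large reference document...", "1h").cache_control));
// {"type":"ephemeral","ttl":"1h"}
```

The ttl parameter is typed as a plain string on purpose. That way, values from configuration files or environment variables are validated at runtime rather than only at compile time.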

Supported Features and Limitations

Supported Features
  • Text prompts and images embedded within text prompts
  • Multiple cache checkpoints per request
  • Caching in system prompts, messages, and tools fields (model-dependent)
  • Configurable TTL (5m or 1h) per cache checkpoint
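Because each request can carry several checkpoints, it can help to count them before sending. The sketch below counts content blocks in messages that carry a cache_control marker. It assumes a limit of 4, taken from the Max checkpoints column in the limits table on this page. `Message`, `countCheckpoints`, and `MAX_CHECKPOINTS` are hypothetical names. The sketch does not count checkpoints placed on tools.

```typescript
// Minimal message shape (hypothetical), matching the Portkey example on this page.
type Block = { type: "text"; text: string; cache_control?: { type: "ephemeral"; ttl?: "5m" | "1h" } };
type Message = { role: "system" | "user" | "assistant"; content: string | Block[] };

// Assumption: the limits table on this page lists 4 checkpoints for every model.
const MAX_CHECKPOINTS = 4;

// Counts content blocks that carry a cache_control marker (tools are not counted).
function countCheckpoints(messages: Message[]): number {
  let count = 0;
  for (const message of messages) {
    if (typeof message.content === "string") continue; // plain strings carry no markers
    for (const block of message.content) {
      if (block.cache_control) count++;
    }
  }
  return count;
}

const messages: Message[] = [
  {
    role: "system",
    content: [
      { type: "text", text: "Long instructions...", cache_control: { type: "ephemeral" } },
      { type: "text", text: "Large reference document...", cache_control: { type: "ephemeral" } },
    ],
  },
  { role: "user", content: "Answer using the document above." },
];

console.log(`${countCheckpoints(messages)} of ${MAX_CHECKPOINTS} checkpoints used`); // "2 of 4 checkpoints used"
```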

Supported Models and Limits

Below is a detailed table of supported models, their minimum token requirements, maximum cache checkpoints, and fields that support caching:
| Model | Model ID | Min tokens per checkpoint | Max checkpoints per request | Supported TTL | Cacheable fields |
|---|---|---|---|---|---|
| Claude Opus 4.6 | anthropic.claude-opus-4-6-v1 | 4,096 | 4 | 5 min | system, messages, tools |
| Claude Opus 4.5 | anthropic.claude-opus-4-5-20251101-v1:0 | 4,096 | 4 | 5 min, 1 hour | system, messages, tools |
| Claude Opus 4 | anthropic.claude-opus-4-20250514-v1:0 | 1,024 | 4 | 5 min | system, messages, tools |
| Claude Sonnet 4.6 | anthropic.claude-sonnet-4-6 | 1,024 | 4 | 5 min | system, messages, tools |
| Claude Sonnet 4.5 | anthropic.claude-sonnet-4-5-20250929-v1:0 | 4,096 | 4 | 5 min, 1 hour | system, messages, tools |
| Claude Haiku 4.5 | anthropic.claude-haiku-4-5-20251001-v1:0 | 4,096 | 4 | 5 min, 1 hour | system, messages, tools |
| Claude 3.5 Haiku | anthropic.claude-3-5-haiku-20241022-v1:0 | 2,048 | 4 | 5 min | system, messages, tools |
| Claude 3.7 Sonnet | anthropic.claude-3-7-sonnet-20250219-v1:0 | 1,024 | 4 | 5 min | system, messages, tools |
| Claude 3.5 Sonnet v2 | anthropic.claude-3-5-sonnet-20241022-v2:0 | 1,024 | 4 | 5 min | system, messages, tools |
| Amazon Nova Micro | amazon.nova-micro-v1:0 | 1,000 | 4 | 5 min | system, messages |
| Amazon Nova Lite | amazon.nova-lite-v1:0 | 1,000 | 4 | 5 min | system, messages |
| Amazon Nova Pro | amazon.nova-pro-v1:0 | 1,000 | 4 | 5 min | system, messages |
  • Extended TTL: Claude Opus 4.5, Claude Sonnet 4.5, and Claude Haiku 4.5 support both 5-minute and 1-hour TTL options.
  • The Amazon Nova models support a maximum of 20K tokens for prompt caching, and on Nova, prompt caching is primarily for text prompts. Nova models also cache all text prompts automatically, without explicit configuration.
  • Tools caching is fully supported for Claude models but is not supported for Amazon Nova models.
  • Claude 3.5 Sonnet v2 is in Preview status.
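The per-model limits in the table above can be checked before a request is sent. The sketch below encodes a few rows of that table, with values copied from it. It estimates tokens with a rough four-characters-per-token heuristic. That heuristic is an assumption, not Bedrock's tokenizer, so treat results near the threshold as uncertain. `MODEL_LIMITS`, `estimateTokens`, and `checkCacheable` are hypothetical names.

```typescript
// A subset of the limits table on this page; values copied from it.
interface CacheLimits {
  minTokens: number;
  ttls: ReadonlyArray<"5m" | "1h">;
}

const MODEL_LIMITS: Record<string, CacheLimits> = {
  "anthropic.claude-3-7-sonnet-20250219-v1:0": { minTokens: 1024, ttls: ["5m"] },
  "anthropic.claude-3-5-haiku-20241022-v1:0": { minTokens: 2048, ttls: ["5m"] },
  "anthropic.claude-sonnet-4-5-20250929-v1:0": { minTokens: 4096, ttls: ["5m", "1h"] },
  "amazon.nova-pro-v1:0": { minTokens: 1000, ttls: ["5m"] },
};

// Rough heuristic (assumption): about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Returns a list of problems; an empty list means the checkpoint looks valid.
function checkCacheable(modelId: string, cachedText: string, ttl: "5m" | "1h" = "5m"): string[] {
  const limits = MODEL_LIMITS[modelId];
  if (!limits) return [`No caching limits recorded for ${modelId}`];
  const problems: string[] = [];
  const tokens = estimateTokens(cachedText);
  if (tokens < limits.minTokens) {
    problems.push(`~${tokens} tokens is below the ${limits.minTokens}-token minimum`);
  }
  if (!limits.ttls.includes(ttl)) {
    problems.push(`TTL ${ttl} is not supported by ${modelId}`);
  }
  return problems;
}

// Reports both a too-short prefix and an unsupported 1-hour TTL.
console.log(checkCacheable("anthropic.claude-3-7-sonnet-20250219-v1:0", "short text", "1h"));
```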

Understanding Token Counts and Pricing

Portkey automatically calculates correct pricing for prompt caching requests. In the logs, you’ll see cache-related token counts in the usage object:
  • cache_creation_input_tokens: Number of tokens written to the cache when creating a new entry.
  • cache_read_input_tokens: Number of tokens retrieved from the cache for this request.
Token format normalization: Portkey normalizes responses to the OpenAI format, in which prompt_tokens includes the cached tokens:
prompt_tokens = inputTokens + cache_read_input_tokens + cache_creation_input_tokens
This differs from native provider formats where input tokens may exclude cached tokens. Portkey’s pricing calculation accounts for this by:
  1. Subtracting cached tokens from prompt_tokens to get the base input token count
  2. Applying the standard input token rate to base tokens
  3. Applying the discounted cache read rate to cache_read_input_tokens
  4. Applying the cache write rate to cache_creation_input_tokens
This ensures accurate cost calculation even though the token format is normalized.
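The four pricing steps above can be written as a small function over the normalized usage object. This is a sketch, not Portkey's actual implementation. The per-token rates are placeholders for illustration and are not real Bedrock prices. `CacheUsage`, `InputRates`, and `inputCost` are hypothetical names. Only input-side cost is computed; output tokens are out of scope here.

```typescript
// Usage fields as described on this page (OpenAI-format, cached tokens included).
interface CacheUsage {
  prompt_tokens: number; // includes cache reads and cache writes after normalization
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

// Hypothetical per-token rates in USD; substitute your model's actual pricing.
interface InputRates {
  input: number;
  cacheRead: number;
  cacheWrite: number;
}

function inputCost(usage: CacheUsage, rates: InputRates): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const write = usage.cache_creation_input_tokens ?? 0;
  // Step 1: subtract cached tokens to recover the base input token count.
  const base = usage.prompt_tokens - read - write;
  // Steps 2-4: price base, cache-read, and cache-write tokens at their own rates.
  return base * rates.input + read * rates.cacheRead + write * rates.cacheWrite;
}

// Placeholder rates for illustration only.
const rates: InputRates = { input: 3e-6, cacheRead: 0.3e-6, cacheWrite: 3.75e-6 };

// A cache hit: 2,000 of 2,050 prompt tokens were read from the cache.
const usage: CacheUsage = { prompt_tokens: 2050, cache_read_input_tokens: 2000, cache_creation_input_tokens: 0 };
console.log(inputCost(usage, rates).toFixed(6)); // 0.000750
```

Here 50 base tokens cost 50 × 3e-6 = 0.00015. The 2,000 cached tokens cost 2,000 × 0.3e-6 = 0.0006. The total is 0.00075. Without the subtraction in step 1, the cached tokens would also be billed at the full input rate.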

AWS Bedrock Prompt Caching Docs

For more detailed information on Bedrock prompt caching, refer to:
Last modified on May 27, 2026