Prompt Caching on Bedrock

On this page

Model Support
How Bedrock Prompt Caching Works
Implementation Examples
Supported Features and Limitations
Supported Models and Limits
Related Resources

Prompt caching on Amazon Bedrock lets you cache specific portions of your requests for repeated use. This feature significantly reduces inference response latency and input token costs by allowing the model to skip recomputation of previously processed content. With Portkey, you can easily implement Amazon Bedrock’s prompt caching through our OpenAI-compliant unified API and prompt templates.

Model Support

Amazon Bedrock prompt caching is generally available with the following models:

Currently Supported Models:

Claude 3.7 Sonnet
Claude 3.5 Haiku
Amazon Nova Micro
Amazon Nova Lite
Amazon Nova Pro

Customers who were given access to Claude 3.5 Sonnet v2 during the prompt caching preview will retain their access, but no additional customers will be granted access to prompt caching on the Claude 3.5 Sonnet v2 model.

How Bedrock Prompt Caching Works

When using prompt caching, you define cache checkpoints - markers that indicate parts of your prompt to cache. These cached sections must be static between requests; any alterations will result in a cache miss.

You can also use Bedrock Prompt Caching Feature with Portkey’s Prompt Templates.

Implementation Examples

Here’s how to implement prompt caching with Portkey:

import Portkey from 'portkey-ai'

const portkey = new Portkey({
    apiKey: "PORTKEY_API_KEY", // defaults to process.env["PORTKEY_API_KEY"]
    provider:"@PROVIDER" // Your Bedrock Virtual Key
})

const chatCompletion = await portkey.chat.completions.create({
    messages: [
        { "role": 'system', "content": [
            {
                "type":"text","text":"You are a helpful assistant"
            },
            {
                "type":"text","text":"This is a large document I want to cache...",
                "cache_control": {"type": "ephemeral"}
            }
        ]},
        { "role": 'user', "content": 'Summarize the above document for me in 20 words' }
    ],
    model: 'anthropic.claude-3-7-sonnet-20250219-v1:0'
});

console.log(chatCompletion.choices[0].message.content);

Supported Features and Limitations

Supported Features

Text prompts and images embedded within text prompts
Multiple cache checkpoints per request
Caching in system prompts, messages, and tools fields (model-dependent)

Supported Models and Limits

Below is a detailed table of supported models, their minimum token requirements, maximum cache checkpoints, and fields that support caching:

Model	Model ID	Min tokens per checkpoint	Max checkpoints per request	Cacheable fields
Claude 3.7 Sonnet	anthropic.claude-3-7-sonnet-20250219-v1:0	1,024	4	system, messages, tools
Claude 3.5 Haiku	anthropic.claude-3-5-haiku-20241022-v1:0	2,048	4	system, messages, tools
Amazon Nova Micro	amazon.nova-micro-v1:0	1,000	4	system, messages
Amazon Nova Lite	amazon.nova-lite-v1:0	1,000	4	system, messages
Amazon Nova Pro	amazon.nova-pro-v1:0	1,000	4	system, messages

The Amazon Nova models support a maximum of 32k tokens for prompt caching.
For Claude models, tools caching is fully supported.
Tools caching is not supported for Amazon Nova models.

AWS Bedrock Prompt Caching Docs

For more detailed information on Bedrock prompt caching, refer to:

Fine-tune Embeddings

Ecosystem

LLM Integrations

Cloud Platforms

Guardrails

Plugins

Vector Databases

Agents

AI Apps

Libraries

Tracing Providers

Prompt Caching on Bedrock

Model Support

How Bedrock Prompt Caching Works

Implementation Examples

Supported Features and Limitations

Supported Models and Limits

AWS Bedrock Prompt Caching Docs

Ecosystem

LLM Integrations

Cloud Platforms

Guardrails

Plugins

Vector Databases

Agents

AI Apps

Libraries

Tracing Providers

​Model Support

​How Bedrock Prompt Caching Works

​Implementation Examples

​Supported Features and Limitations

​Supported Models and Limits

​Related Resources

AWS Bedrock Prompt Caching Docs

Model Support

How Bedrock Prompt Caching Works

Implementation Examples

Supported Features and Limitations

Supported Models and Limits

Related Resources