Portkey makes Anthropic’s prompt caching work on our OpenAI-compliant universal API.

Just pass Anthropic's `anthropic-beta` header in your request, and set the `cache_control` param in the relevant message body:
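Here is a minimal sketch of such a request, assuming Portkey's OpenAI-compatible chat completions endpoint; the `x-portkey-*` header names and the model string are illustrative placeholders, so check your own Portkey setup for the exact values:

```python
import json

def build_cached_request(system_prompt: str, user_msg: str):
    """Return (headers, body) for a request that caches the system prompt."""
    headers = {
        "Content-Type": "application/json",
        "x-portkey-api-key": "YOUR_PORTKEY_API_KEY",  # placeholder
        "x-portkey-provider": "anthropic",            # placeholder
        # Anthropic's beta header, forwarded through Portkey:
        "anthropic-beta": "prompt-caching-2024-07-31",
    }
    body = {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 1024,
        "messages": [
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        # This text must exceed the minimum cacheable length
                        "text": system_prompt,
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            },
            {"role": "user", "content": user_msg},
        ],
    }
    return headers, body

headers, body = build_cached_request("<long, reusable system prompt>", "Summarize the doc.")
print(json.dumps(body["messages"][0], indent=2))
# POST headers/body to the chat completions endpoint with any HTTP client.
```

The `cache_control` marker goes on the specific content block you want cached (typically a large, reusable system prompt), not on the request as a whole.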

Anthropic currently has certain restrictions on prompt caching, like:

  • Cache TTL is fixed at 5 minutes and cannot be changed
  • The message you are caching must meet a minimum token length to be cacheable:
    • 1024 tokens for Claude 3.5 Sonnet and Claude 3 Opus
    • 2048 tokens for Claude 3 Haiku

For more details, refer to Anthropic's prompt caching documentation.

Seeing Cache Results in Portkey

Portkey automatically calculates the correct pricing for your prompt caching requests & responses, based on Anthropic's published cache pricing.

In the individual log for any request, you can also see whether the prompt was written to the cache or served from it, via two usage parameters:

  • cache_creation_input_tokens: Number of tokens written to the cache when creating a new entry.
  • cache_read_input_tokens: Number of tokens retrieved from the cache for this request.
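A small sketch of how you might interpret these two counters when inspecting a response; the flat `usage` dict shape shown here is an assumption, so adjust the key lookups to match how the fields appear in your logs:

```python
def summarize_cache_usage(usage: dict) -> str:
    """Classify a request as a cache write, a cache hit, or uncached,
    based on the two cache-related usage counters."""
    created = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    if read > 0:
        return f"cache hit: {read} tokens served from cache"
    if created > 0:
        return f"cache write: {created} tokens written to cache"
    return "no caching occurred for this request"

# Example: a first request that wrote a new cache entry
print(summarize_cache_usage(
    {"cache_creation_input_tokens": 2048, "cache_read_input_tokens": 0}
))
```

A non-zero `cache_creation_input_tokens` marks the first request that created the entry; subsequent requests within the TTL should instead report a non-zero `cache_read_input_tokens`.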