Thinking Mode
Thinking/Reasoning models represent a new generation of LLMs specifically trained to expose their internal reasoning process. Unlike traditional LLMs that only show final outputs, thinking models like Claude 3.7 Sonnet, OpenAI o1/o3, and DeepSeek R1 are designed to "think out loud": they produce a detailed chain of thought before delivering their final response.
These reasoning-optimized models are built to excel in tasks requiring complex analysis, multi-step problem solving, and structured logical thinking. Portkey makes these advanced models accessible through a unified API specification that works consistently across providers.
Supported Thinking Models
Portkey currently supports the following thinking-enabled models:
- Anthropic: claude-3-7-sonnet-latest
- Google Vertex AI: anthropic.claude-3-7-sonnet@20250219
- Amazon Bedrock: claude-3-7-sonnet
More thinking models will be supported as they become available.
Using Thinking Mode
- You must set `strict_open_ai_compliance=False` in your headers or client configuration.
- The thinking response is returned in a different format than standard completions.
- For streaming responses, the thinking content is in `response_chunk.choices[0].delta.content_blocks`.
Extended thinking API through Portkey is currently in beta.
Basic Example
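The original code sample did not survive extraction, so here is a minimal sketch of a thinking-mode request body and a helper for reading the response. The `thinking` parameter shape (`type`/`budget_tokens`) and the block fields (`thinking`, `text`) are assumptions modeled on Anthropic's extended-thinking format; check the Portkey API reference for the authoritative names.

```python
# Hypothetical sketch: building a thinking-mode request body by hand.
# In practice the request goes through the Portkey SDK or gateway with
# strict_open_ai_compliance=False set on the client or headers.

payload = {
    "model": "claude-3-7-sonnet-latest",
    "max_tokens": 3000,
    # Assumed parameter shape, mirroring Anthropic's extended thinking:
    "thinking": {"type": "enabled", "budget_tokens": 2030},
    "messages": [
        {"role": "user",
         "content": "Are there an infinite number of prime numbers?"},
    ],
}

def split_content_blocks(message: dict) -> tuple[str, str]:
    """Separate a content_blocks message into (thinking_text, answer_text)."""
    thinking_parts, answer_parts = [], []
    for block in message.get("content_blocks", []):
        if block.get("type") == "thinking":
            thinking_parts.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            answer_parts.append(block.get("text", ""))
    return "".join(thinking_parts), "".join(answer_parts)

# Example of the assumed response shape:
sample_message = {
    "role": "assistant",
    "content_blocks": [
        {"type": "thinking", "thinking": "Euclid's proof shows..."},
        {"type": "text", "text": "Yes, there are infinitely many primes."},
    ],
}
thinking, answer = split_content_blocks(sample_message)
```

Splitting the blocks by `type` keeps the model's reasoning separate from the user-facing answer, so you can log or display the chain of thought independently.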
Multi-Turn Conversations
For multi-turn conversations, include the previous thinking content in the conversation history:
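As a hedged sketch (the block field names here are assumptions mirroring the response format described below), the previous assistant turn is passed back with its `content_blocks` intact before appending the new user message:

```python
# Hypothetical multi-turn history: the assistant's earlier thinking is
# carried forward inside content_blocks rather than a plain content string.

history = [
    {"role": "user",
     "content": "Are there an infinite number of prime numbers?"},
    {
        "role": "assistant",
        # Assumed block fields, following the single-turn response shape:
        "content_blocks": [
            {"type": "thinking",
             "thinking": "Suppose the primes were finite..."},
            {"type": "text",
             "text": "Yes, by Euclid's argument there are infinitely many."},
        ],
    },
]

def add_user_turn(history: list[dict], text: str) -> list[dict]:
    """Return a new message list with a follow-up user message appended."""
    return history + [{"role": "user", "content": text}]

messages = add_user_turn(history, "Can you explain that proof step by step?")
```

Returning a new list instead of mutating `history` keeps earlier turns reusable if you branch the conversation.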
Understanding Response Format
When using thinking-enabled models, be aware of the special response format:
The assistant's thinking response is returned in the `response_chunk.choices[0].delta.content_blocks` array, not the `response.choices[0].message.content` string.
This is especially important for streaming responses, where you’ll need to specifically parse and extract the thinking content from the content blocks.
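To make the streaming case concrete, here is a sketch that accumulates thinking and answer text from delta `content_blocks`. The chunk dicts stand in for the objects yielded by a streamed completion, and their inner field names (`thinking`, `text`) are assumptions:

```python
def accumulate_stream(chunks: list[dict]) -> tuple[str, str]:
    """Collect thinking vs. answer text from streamed delta content_blocks."""
    thinking, answer = [], []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        for block in delta.get("content_blocks", []):
            if block.get("type") == "thinking":
                # Assumed field: incremental thinking text arrives as "thinking"
                thinking.append(block.get("thinking", ""))
            elif block.get("type") == "text":
                answer.append(block.get("text", ""))
    return "".join(thinking), "".join(answer)

# Mock chunks in the assumed shape of response_chunk.choices[0].delta:
mock_chunks = [
    {"choices": [{"delta": {"content_blocks": [
        {"type": "thinking", "thinking": "Consider small cases... "}]}}]},
    {"choices": [{"delta": {"content_blocks": [
        {"type": "text", "text": "The answer is yes."}]}}]},
]
thinking_text, answer_text = accumulate_stream(mock_chunks)
```

Using `delta.get("content_blocks", [])` also tolerates chunks that carry no content blocks at all (e.g. role-only or finish-reason chunks).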
When to Use Thinking Models
Thinking models are particularly valuable for tasks requiring complex analysis, multi-step problem solving, and structured logical reasoning.
FAQs