Production Guides

Everything We Know About Claude Code Limits

Last updated - March 2026

TL;DR

For Claude.ai Users:
The core limit is a 5-hour rolling session that begins with your first prompt. Weekly quotas (introduced August 28, 2025) now apply to heavy users on Claude Pro and Claude Max plans. All Claude.ai plans share a common usage bucket across the Claude app and Claude Code; Max plans multiply the allowance accordingly. Max subscribers can also purchase additional usage at standard API rates once they hit limits.
For API Users:
API usage is separate and billed pay-as-you-go per token. The latest models, Opus 4.6 and Sonnet 4.6, are priced at $5/$25 and $3/$15 per million tokens respectively. You get the same rates whether you use Anthropic, AWS Bedrock, Google Vertex AI, or Microsoft Foundry.
You can choose whichever region, VPC, or IAM model best fits your stack.
For Portkey Users:
With Portkey, you can route Claude Code through any supported provider i.e., Anthropic, Bedrock, Vertex, Foundry, and others, from a single endpoint, pulling in maximum capacity from multiple sources simultaneously. Portkey also lets you centralize cost controls, add security guardrails, configure automatic failover, and get audit logs automatically.

0:00

/0:12

Understanding the 5‑Hour session model

The moment you run claude in your terminal, a 5‑hour rolling window begins. All messages and token spend during that period draw from your plan’s pool; the clock resets only when you send the next message after the 5 hours lapse. Pro users average 10‑40 prompts per window, while Max 20× users can push 200‑800 prompts depending on code size and model choice. (Source: Anthropic Help Center)

💡 Tips

Time your first prompt so the window spans your peak coding block; fire a dummy ping at 13:01 to start a fresh window for the afternoon.
Batch edits: one long diff request burns fewer tokens than several “please refine” follow‑ups.
Use /compact in Claude Code to manage context costs and /model to switch models based on task complexity.

Weekly Limits (Live Since August 28, 2025)

In addition to the 5-hour rolling windows, Anthropic now enforces weekly rate limits that reset every seven days. These include an overall usage cap plus a separate weekly cap for Opus models. Anthropic says the limits affect fewer than 5% of subscribers based on usage patterns.

Here's what to expect per week:

Plan	Weekly Sonnet hours	Weekly Opus hours
Pro ($20/mo)	40-80	--
Max 5x ($100/mo)	140-280	15-35
Max 20x ($200/mo)	240-480	24-40

Usage varies based on codebase size, model complexity, and session behavior. Max subscribers can purchase additional usage beyond the weekly limit at standard API rates.

Plan pricing & capacities

Plan	Monthly fee	Relative capacity	Typical Claude Code prompts / 5 h	Model access
Free	$0	~1/10 Pro	2-5 (Sonnet 4.6 only)	Sonnet 4.6
Pro	$20	baseline	10-40	Sonnet 4.6
Max 5x	$100	5x Pro	50-200	Sonnet 4.6 + Opus 4.6
Max 20x	$200	20x Pro	200-800	Sonnet 4.6 + Opus 4.6
Team (Standard)	$25/user	--	--	Sonnet 4.6
Team (Premium)	$150/user	--	Includes Claude Code	Sonnet 4.6 + Opus 4.6

Pro plans are also available at $17/month when billed annually. Team plans require a minimum of 5 users.

Token‑Based API Pricing (Console, Bedrock, Vertex)

Model	Anthropic API	AWS Bedrock	Google Vertex AI	Microsoft Foundry
Opus 4.6	$5 in / $25 out per M tokens	same list price	same list price	same list price
Sonnet 4.6	$3 in / $15 out	same	same	same

Previous generation (still available)

Model	Anthropic API	AWS Bedrock	Google Vertex AI	Microsoft Foundry
Opus 4.5	$5 in / $25 out	same	same	same
Sonnet 4.5	$3 in / $15 out	same	same	same
Haiku 4.5	$1 in / $5 out	same	same	same

Legacy models

Model	Pricing
Opus 4 / 4.1	$15 in / $75 out per M tokens
Sonnet 4 / 3.7	$3 in / $15 out

Key pricing notes:

Long context premium: When using the 1M token context window (beta) and exceeding 200K input tokens, rates increase to $10/$37.50 (Opus 4.6) and $6/$22.50 (Sonnet 4.6) per M tokens.
Fast mode (Opus 4.6): 6x standard rates for significantly faster output. Currently in research preview.
Prompt caching: Cache reads cost just 10% of base input price. Cache writes are 1.25x (5-min TTL) or 2x (1-hour TTL).
Batch API: 50% discount on all models for non-urgent workloads.
US-only inference: 1.1x multiplier on all token costs for Opus 4.6 and newer models.
Bedrock, Vertex, and Foundry bill in your account's currency and add standard cloud charges (data egress, networking, etc.). Anthropic confirms pricing parity across channels. Bedrock and Vertex offer global and regional endpoints, with regional endpoints including a 10% premium.

Which endpoint should you use?

Need	Best option	Why
Lowest latency to US-East	AWS Bedrock (us-east-1)	Dedicated Anthropic capacity inside AWS infra
Private networking / VPC-SC	Vertex AI Private Service Connect	Keeps traffic off the public internet
Azure-native billing and governance	Microsoft Foundry	MACC-eligible, Entra ID auth, Azure Monitor integration
Global multi-region failover	Portkey Gateway	Routes across Anthropic, Bedrock, Vertex, Foundry, and 250+ LLMs from a single endpoint
Advanced cost controls	Portkey budgets + prompt caching	Enforce spend limits, auto-downgrade models, per-team attribution

Portkey: One Switch for Anthropic, Bedrock, or Vertex

Direct Claude Code is great for solo hacking, but enterprises need:

Unified gateway routing – point Claude Code to Portkey once; switch between Anthropic, Bedrock, Vertex, Foundry, or any other provider by changing a config value.
Live spend & usage dashboards – 40‑plus metrics, cost attribution by team, and budget alerts.
RBAC + guardrails – per‑role rate limits, banned‑pattern filters, and partner guardrails from Palo Alto, Prompt Security, Pangea, etc.
Audit‑ready logs – every prompt, response, and error logged with metadata for compliance.

Bottom line: Portkey turns Claude Code into an enterprise‑grade service without forcing you to rebuild scripts or choose a single cloud. Drop your API key(s) into Portkey, set a config, and your devs keep typing claude …—now with observability, governance, and cloud‑agnostic resilience baked in.

If you're looking to implement Claude Code for your organisation with governance at scale, book a demo with us for a detailed walkthrough, or try it yourself!

💡

Keep Watching
Anthropic may refine quotas again once weekly limits ship, and Bedrock/Vertex regions keep expanding. Bookmark this post—we’ll update as soon as the numbers move.

Everything We Know About Claude Code Limits

TL;DR

Understanding the 5‑Hour session model

Weekly Limits (Live Since August 28, 2025)

Plan pricing & capacities

Token‑Based API Pricing (Console, Bedrock, Vertex)

Previous generation (still available)

Legacy models

Which endpoint should you use?

Portkey: One Switch for Anthropic, Bedrock, or Vertex

Read next

Failover routing strategies for LLMs in production

End-to-End Debugging: Tracing Failures from the LLM Call to the User Experience

Why reliability in AI applications is now a competitive differentiator