Claude Code Cost Optimization: Pricing, Caching, and Token Management

Charles Krzentowski · 8 min read

Let's talk money. Claude Code can get expensive fast — or it can cost less than your daily coffee. The difference comes down to a handful of decisions you make once and a few habits you build.

The top 10% of setups in our analysis (scoring 8+/10) spend about $8 per day on Claude Code. Their productivity gains far outweigh the cost, but they get there by understanding where tokens go and how to waste fewer of them.

Here's what they know that you probably don't.

The two pricing options (and how to pick)

Claude Code offers two billing models. Choosing the wrong one is the most common cost mistake we see.

Max plan: flat monthly rate

  • Max 5x ($100/month) — roughly 2-3 hours of active sessions per day
  • Max 20x ($200/month) — designed for people who code with Claude most of the day

If you run heavy sessions every workday for 2+ hours, the Max plan saves you money versus pay-per-token pricing; at lighter intensity the break-even point is higher (see the break-even math below). You also get predictable bills: no "what happened last month?" moments.

API pricing: pay for what you use

| Model | Input tokens | Output tokens |
| --- | --- | --- |
| Sonnet 4 | $3 / 1M tokens | $15 / 1M tokens |
| Opus 4 | $15 / 1M tokens | $75 / 1M tokens |
| Cached input (cache hit) | 90% discount | Same output price |

A typical session generates 50,000 to 200,000 tokens per hour, depending on how many files Claude reads and how much code it writes. At Sonnet rates, that's roughly:

  • Light session (quick questions, small edits): ~$0.30/hour
  • Medium session (feature work, multi-file changes): ~$0.90/hour
  • Heavy session (large refactors, lots of file reads): ~$2.25/hour

With Opus, multiply by 5. A heavy Opus session runs about $11.25/hour. That's why model choice matters so much (more on that below).
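
To see how the per-token prices turn into a session bill, here is a minimal sketch. The prices come from the table above; the function name, the `cached_fraction` parameter, and the token counts in the usage note are illustrative assumptions, and the model ignores any other billing details.

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "sonnet": {"input": 3.00, "output": 15.00},
    "opus": {"input": 15.00, "output": 75.00},
}
CACHE_HIT_DISCOUNT = 0.90  # a cache hit is billed at 10% of the input price


def session_cost(model, input_tokens, output_tokens, cached_fraction=0.0):
    """Estimate the USD cost of a session.

    cached_fraction is the assumed share of input tokens served from
    cache (0.0 to 1.0); it is a simplification, not a measured value.
    """
    price = PRICES[model]
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    billable_input = uncached + cached * (1 - CACHE_HIT_DISCOUNT)
    input_cost = billable_input * price["input"] / 1_000_000
    output_cost = output_tokens * price["output"] / 1_000_000
    return input_cost + output_cost
```

For a hypothetical hour with 180,000 input tokens and 20,000 output tokens, `session_cost("sonnet", 180_000, 20_000)` comes to about $0.84 and the same call with `"opus"` to about $4.20, which is the 5x gap described above.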

The break-even math

At medium Sonnet intensity (~$0.90/hour):

  • Max 5x ($100/month) breaks even at roughly 111 hours/month, which is about 5.5 hours per workday
  • Max 20x ($200/month) breaks even at roughly 222 hours/month

For most daily users, Max 5x is the sweet spot. If you use Claude Code a few times a week rather than every day, API pricing is cheaper.
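
The break-even figures are a single division, so it is easy to rerun them for your own intensity. This sketch uses the article's hourly Sonnet estimates; the 20 workdays per month is an assumption you should adjust.

```python
def break_even_hours(plan_price, hourly_api_cost):
    """Hours per month at which API spend equals the flat plan price."""
    return plan_price / hourly_api_cost


WORKDAYS_PER_MONTH = 20  # assumption; change to match your schedule

# Hourly rates are the Sonnet estimates quoted earlier in the article.
for label, rate in [("light", 0.30), ("medium", 0.90), ("heavy", 2.25)]:
    hours = break_even_hours(100, rate)  # Max 5x at $100/month
    per_day = hours / WORKDAYS_PER_MONTH
    print(f"{label}: {hours:.0f} h/month, {per_day:.1f} h/workday")
```

The heavier your typical session, the fewer hours it takes for the flat plan to pay for itself.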

Where your tokens actually go

You can't optimize what you don't understand. Here's the breakdown of a typical session:

| What | Share of tokens | What it includes |
| --- | --- | --- |
| System prompt + CLAUDE.md | 5-15% | Loaded once, cached after first turn |
| File reads | 30-50% | Every file Claude reads via Grep, Read, Glob |
| Conversation history | 15-25% | All previous messages in the session |
| Tool calls (inputs + outputs) | 10-20% | Bash commands, edits, MCP calls |
| Claude's responses | 10-15% | The actual text and code Claude generates |

Two things jump out: file reads and conversation history dominate. Those are the two places where optimization has the biggest impact.

The /compact trick (cut your token usage dramatically)

Here's a command most people don't know about: /compact.

When your session gets long — 15+ turns, lots of back-and-forth — the conversation history balloons. Every new message includes the entire history that came before it. Your tokens compound.

/compact summarizes the conversation into a shorter representation. After compacting:

  • Subsequent turns cost less (less history to send)
  • Responses come faster (less for Claude to process)
  • You avoid hitting the context window limit
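
A rough model shows why history compounds and what compacting changes. It assumes every turn adds the same number of tokens and that the whole history is resent each turn; it ignores caching, and the turn counts and sizes in the usage note are made-up inputs, not measurements.

```python
def cumulative_input_tokens(turns, tokens_per_turn, compact_at=None, summary_tokens=0):
    """Total input tokens sent over a session when each turn resends the history.

    If compact_at is set, the history is replaced by a summary of
    summary_tokens tokens right after that turn (a stand-in for /compact).
    """
    history = 0
    total = 0
    for turn in range(1, turns + 1):
        history += tokens_per_turn  # the new message joins the history
        total += history            # the whole history is sent this turn
        if turn == compact_at:
            history = summary_tokens  # history collapses to the summary
    return total
```

With 30 turns of 2,000 tokens each, the model sends 930,000 input tokens in total. Compacting to a 3,000-token summary after turn 15 brings that down to 525,000 under the same assumptions.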

When to compact

  • Your session has been going for 15+ turns
  • Claude starts repeating things it already told you
  • Responses get noticeably slower
  • You're shifting to a different task within the same session

When NOT to compact

  • You're in the middle of a multi-step operation that requires precise memory
  • Claude needs to reference specific code from 2-3 turns ago
  • You're about to commit — compact AFTER the commit, not before

Compact vs. new session

Sometimes starting fresh is better than compacting:

| | Compact | New session |
| --- | --- | --- |
| Keeps | Summary of conversation | Nothing |
| Loses | Details, nuance | Everything |
| Best for | Continuing same task | Switching tasks |
| Token cost | Reduced by 40-60% | Reset to baseline |

My rule: Switching tasks? New session. Same task but it's been a while? Compact. And always commit before either one — git preserves detail that compaction loses.

Opus vs Sonnet: the 5x question

Opus costs 5x more than Sonnet per token. Is it worth it? Sometimes. Here's how to decide.

Use Sonnet (the default) for most work

Sonnet handles the vast majority of coding tasks well:

  • Writing functions and components
  • Fixing straightforward bugs
  • Running tests and interpreting results
  • File exploration and search
  • Refactoring with clear instructions
  • Code review

That covers probably 90% of what you do in a day.

Switch to Opus for the hard stuff

Opus earns its premium in specific situations:

  • Architectural decisions — designing systems with multiple interacting components
  • Subtle bug diagnosis — bugs that span multiple code paths and abstraction layers
  • Large refactors — changes across many files that need consistency
  • Novel problem solving — tasks where the answer isn't a standard pattern

The hybrid workflow

The approach I've seen work best: Sonnet by default, Opus when you need the extra horsepower.

# Daily work (Sonnet)
claude

# Hard problem (Opus)
claude --model opus

# Or switch mid-session
> /model opus

Some developers take this further — Opus for planning, Sonnet for execution:

1. Start with Opus: "Plan the architecture for the notification system"
2. Review the plan, adjust it
3. Switch to Sonnet: "Implement the plan we just discussed"

Opus-quality thinking on the hard decisions, Sonnet-speed execution on the implementation. Best of both worlds.

Four habits that keep costs down

1. Focused sessions (the biggest single savings)

Instead of one marathon session that accumulates context for hours:

Session 1: "Add the database migration for notifications"
  → Complete, commit, close

Session 2: "Implement the API endpoints"
  → Complete, commit, close

Session 3: "Build the notification UI"
  → Complete, commit, close

Each session starts clean with full cache efficiency. No irrelevant context from previous tasks weighing down every turn.

2. Point Claude at specific files

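Every file Claude reads costs tokens. A 500-line file is roughly 5,000 tokens of input, assuming about 10 tokens per line (the real ratio varies with language and line length). Reading 50 such files in an exploration session adds about 250,000 tokens: roughly $0.75 on Sonnet but $3.75 on Opus at uncached input rates.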

Help Claude read less:

  • "Look at lines 45-80 of src/api/route.ts" instead of "read the route file"
  • Point to specific files instead of letting Claude search broadly
  • Keep your architecture documentation accurate so Claude doesn't need to explore to find things
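
To get a feel for how much a line range saves over a full read, a crude estimate is enough. The sketch below assumes about 4 characters per token, which is a rough rule of thumb and not a real tokenizer; the function name and the sample file are hypothetical.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb (assumption), not an exact tokenizer


def estimate_tokens(lines, start=None, end=None):
    """Estimate tokens for a list of text lines.

    start and end select a 1-based inclusive line range; omit them to
    estimate the whole file.
    """
    if start is not None and end is not None:
        lines = lines[start - 1:end]
    chars = sum(len(line) + 1 for line in lines)  # +1 for each newline
    return chars // CHARS_PER_TOKEN


# Hypothetical 500-line file with 39 characters per line.
sample = ["x" * 39] * 500
whole_file = estimate_tokens(sample)
just_the_range = estimate_tokens(sample, start=45, end=80)
print(whole_file, just_the_range)
```

The absolute numbers are only as good as the heuristic, but the ratio between the two is the point: a 36-line range costs a small fraction of the full file.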

3. Move rules out of CLAUDE.md

This one is subtle but adds up. Every line in CLAUDE.md is loaded on every turn. Rules in .claude/rules/ only load when their glob patterns match.

If you have 200 lines of frontend conventions, 150 lines of backend rules, and 100 lines of database standards all in CLAUDE.md, Claude loads all 450 lines every single turn — even when you're editing a CSS file.

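Move file-specific instructions to rules files. Your CLAUDE.md stays lean (20-30 lines of project-wide essentials), and each turn loads only the rules whose patterns match the files in play. In the example above, editing a CSS file would skip the 250 lines of backend and database rules.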
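
The effect of glob-scoped rules can be sketched in a few lines. The patterns, line counts, and paths below are hypothetical and mirror the 200/150/100 split above; Python's `fnmatchcase` stands in for whatever glob matching Claude Code applies, which may behave differently.

```python
from fnmatch import fnmatchcase

# Hypothetical rule sets: (glob pattern, number of lines).
RULES = [
    ("*.css", 200),      # frontend conventions
    ("src/api/*", 150),  # backend rules
    ("*.sql", 100),      # database standards
]
CLAUDE_MD_LINES = 25  # lean project-wide essentials (assumed size)


def lines_loaded(path):
    """Instruction lines loaded for one file: CLAUDE.md plus matching rules."""
    matched = sum(n for pattern, n in RULES if fnmatchcase(path, pattern))
    return CLAUDE_MD_LINES + matched


for path in ["src/styles/app.css", "src/api/users.ts", "README.md"]:
    print(path, lines_loaded(path))
```

Compare each result with the 450 lines that would load on every turn if all three rule sets lived in CLAUDE.md.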

4. Commit before compacting

This is a pattern from our highest-scoring setups. Always commit before running /compact or closing a session.

1. Complete the current unit of work
2. git add + git commit
3. /compact (or start new session)
4. Continue with the next unit

Compaction loses detail. Git preserves it. If you need to pick up a task later, the commit message and diff are far more reliable than a compacted summary.

Real cost numbers

Here's what different usage patterns actually cost, based on data from setups we've analyzed:

| Profile | Model | Hours/day | Monthly cost | How |
| --- | --- | --- | --- | --- |
| Light user | Sonnet | 1-2 | $20-40 (API) | Pay per token, focused sessions |
| Daily developer | Sonnet | 3-5 | $100 (Max 5x) | Max plan, compact regularly |
| Power user | Sonnet + Opus | 4-6 | $200 (Max 20x) | Max plan, Opus for architecture only |
| Team (5 devs) | Sonnet | 2-4 each | $500 (5 × Max 5x) | Individual Max plans, shared CLAUDE.md |
| CI/CD automation | Sonnet | N/A | $50-150 (API) | API pricing, headless mode |

The $8/day average for top setups works out to $160-180/month — close to the Max 20x plan. These are heavy users who work with Claude Code as their primary tool.

Monitoring your spending

On the Max plan

Anthropic provides usage dashboards showing consumption relative to plan limits. Check weekly. If you're consistently hitting the ceiling, you need a higher tier. If you're barely using half, you could drop down or switch to API pricing.

On API pricing

Set up alerts:

  1. Go to Settings > Billing > Alerts in the Anthropic console
  2. Set a daily alert (e.g., $15/day)
  3. Set a monthly budget cap

For CI/CD automation, add turn limits to prevent runaway costs:

claude -p "Review this PR" --max-turns 10 --output-format json
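
If your pipeline captures the JSON output, a small check can flag runs that cost more than expected. This is a sketch under assumptions: the `total_cost_usd` field name is a guess at the output schema that you should verify against what your CLI version actually prints, and the function name and budget are hypothetical.

```python
import json


def check_budget(raw_json, budget_usd, cost_field="total_cost_usd"):
    """Return (cost, within_budget) for one JSON result string.

    cost_field is an assumption about the output schema; a KeyError
    here means the field name needs adjusting for your CLI version.
    """
    result = json.loads(raw_json)
    cost = float(result[cost_field])
    return cost, cost <= budget_usd
```

In a CI job you would pass the captured stdout of the command above to `check_budget` and fail the step when `within_budget` is false.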

What's next

Cost optimization isn't a one-time thing. The strategies here — model selection, focused sessions, /compact, rules over CLAUDE.md bloat — compound over time. A developer who applies all of them spends 40-60% less than someone using Claude Code at defaults.

Frequently Asked Questions

Is the Max plan worth it if I only use Claude Code 3-4 days a week?

It depends on intensity. If those 3-4 days involve heavy usage (4+ hours each), Max 5x at $100/month is probably cheaper than API. If you're doing 1-2 hours on those days, API pricing wins. Track your usage for a month on API, multiply by the rates, and compare.

Does prompt caching work automatically?

Yes. You don't need to configure anything. The API recognizes when consecutive requests share the same prefix (your system prompt, CLAUDE.md, loaded rules) and charges 90% less for the cached portion. You can help caching work better by keeping CLAUDE.md stable during a session — every edit invalidates the cache and forces a full-price re-read.

When should I use /compact vs starting a new session?

Under 10 turns and same task: keep going. 15-20 turns and same task: compact. Switching to a different task: new session. If Claude starts "forgetting" things from earlier in the conversation, that's a strong signal to either compact or restart.

Can I skip Opus entirely and just use Sonnet?

Many developers do. Sonnet handles 90%+ of coding tasks well. Opus genuinely outperforms only in narrow cases: deep architectural reasoning, multi-file refactors with subtle consistency needs, and complex debugging across multiple abstraction layers. If your work is primarily feature development, bug fixes, and reviews, Sonnet alone is 5x cheaper and usually sufficient.

How do teams manage Claude Code costs?

Most teams use individual Max plans — one per developer. For shared costs (CI/CD reviews, automated triage), they use a single API key with spend alerts. The team lead monitors monthly spend and adjusts automation frequency if costs grow. A well-optimized shared CLAUDE.md also helps — it reduces per-developer token waste from project exploration.
