Models & Cost

Model choice is the biggest cost lever you have, roughly a 30x spread for the same work. Default to the cheapest model that clears the task, override per project, and reach up only for the hard parts.

GA · updated 2026-06-14

The model you run is the single biggest cost lever in Claude Code. For the same piece of work the tiers can differ by around 30x, so the default posture is simple: pick the cheapest model that clears the task, override it per project, and reach up to the top tier only for the parts that actually need it.

The lineup

| Model | Tier | Reach for |
| --- | --- | --- |
| Haiku 4.5 (claude-haiku-4-5) | fast, cheap | routine filtering, docs, reading transcripts |
| Sonnet 4.6 (claude-sonnet-4-6) | mid | most coding and infra work |
| Opus 4.8 (claude-opus-4-8) | high | hard reasoning, nasty refactors |
| Fable 5 (claude-fable-5) | frontier, above Opus | the hardest work; strong UI generation |
| Fast mode | Opus, faster output | Opus quality when you want speed (costs more per token) |

Fast mode is not a smaller model. It is Opus with faster output, toggled with /fast, and it carries a higher per-token price for that speed.

Pricing

Per million tokens, from the model catalog (confirmed 2026-06):

| Model | Input ($/1M) | Output ($/1M) |
| --- | --- | --- |
| Haiku 4.5 | 1 | 5 |
| Sonnet 4.6 | 3 | 15 |
| Opus 4.8 | 5 | 25 |
| Fable 5 | 10 | 50 |

Opus 4.8 is $5 in and $25 out, unchanged from 4.7. Fast mode is the same Opus model with faster output at a higher rate (reported around $10 / $50). Because it is a Claude Code product setting, confirm the current Fast-mode rate in the Claude Code docs rather than the API price list. The "$15 / $75" and "$0.25 / $1.25" figures that circulate are stale (older Opus and Claude 3 Haiku pricing), not current tier prices. The standing rule still holds: prices move, so re-check the official pricing page before you size a real budget.
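To show how the table turns into a budget number, here is a minimal sketch that multiplies token counts by the per-million rates listed above. The `PRICES` dictionary and `estimate_cost` function are names invented for this example. The rates are copied from the table and may be out of date, and the calculation ignores caching, Fast-mode pricing, and any other adjustment.

```python
# Per-million-token list prices (USD) copied from the table above.
# Illustrative only: re-check the official pricing page before budgeting.
PRICES = {
    "claude-haiku-4-5": (1.0, 5.0),
    "claude-sonnet-4-6": (3.0, 15.0),
    "claude-opus-4-8": (5.0, 25.0),
    "claude-fable-5": (10.0, 50.0),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return an estimated USD cost, ignoring caching and other discounts."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 200K output tokens.
    for model in PRICES:
        print(f"{model}: ${estimate_cost(model, 2_000_000, 200_000):.2f}")
```

Running the same hypothetical workload through every tier makes the list-price gap visible before any session starts.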

The levers

Per-project model override. The highest-value setting: run Haiku in a docs repo, Sonnet in infra, Opus only where it matters. Set it once per project and stop overpaying by default.
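One way to pin a project default is a `model` entry in the project's `.claude/settings.json`. The sketch below assumes that key name and file location, so check the Permissions & Settings reference before relying on either. `set_project_model` is a hypothetical helper written for this example, not part of Claude Code.

```python
import json
from pathlib import Path


def set_project_model(repo: Path, model: str) -> Path:
    """Write `model` into <repo>/.claude/settings.json, keeping other keys."""
    settings_path = repo / ".claude" / "settings.json"
    settings_path.parent.mkdir(parents=True, exist_ok=True)

    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))

    settings["model"] = model  # assumed key name; verify against the docs
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings_path
```

Calling `set_project_model(Path("docs-repo"), "claude-haiku-4-5")` on a docs repo and `set_project_model(Path("infra-repo"), "claude-sonnet-4-6")` on an infra repo would match the split described above. Both repo paths are placeholders.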

/effort and ultracode. Dial reasoning effort up for hard problems; ultracode turns on heavier multi-agent work on demand. Both cost more, so use them deliberately rather than as a default.

Extended thinking levels. A lighter thinking default trims Opus token usage noticeably (reported in the 18 to 25% range) without you changing models.

Multi-model within a session. You can mix tiers inside one job: a cheap model handles routine sub-steps (reading a long transcript, filtering) while Opus or Sonnet does the actual work. This keeps the expensive model off the boring tokens.
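The shape of that split can be sketched without any real API. Both models are passed in as plain callables (prompt in, text out). The `ModelCall` type, the `triage_then_solve` function, and the prompt wording are all hypothetical, and how you actually wire a cheap and a strong model together depends on your tooling.

```python
from typing import Callable, List

ModelCall = Callable[[str], str]  # hypothetical interface: prompt in, text out


def triage_then_solve(items: List[str], cheap: ModelCall, strong: ModelCall) -> List[str]:
    """Filter items with the cheap model, then send survivors to the strong one."""
    kept = [
        item
        for item in items
        if cheap(f"Relevant? yes/no: {item}").strip().lower().startswith("yes")
    ]
    # Only the filtered items ever reach the expensive model.
    return [strong(f"Resolve: {item}") for item in kept]


if __name__ == "__main__":
    # Stand-in callables so the sketch runs without network access.
    def fake_cheap(prompt: str) -> str:
        return "yes" if "error" in prompt else "no"

    def fake_strong(prompt: str) -> str:
        return f"[strong model] {prompt}"

    print(triage_then_solve(["error in deploy", "lunch menu"], fake_cheap, fake_strong))
```

The design point is that the expensive callable is invoked once per kept item rather than once per input, which is where the savings come from.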

The cost reality

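One user ran the same 2.3-hour session (372 turns, 57.9M tokens) on different models: roughly $8.57 on Haiku, $25.71 on Sonnet, and $290.41 on Opus. Same work, a spread of more than 30x. These are that user's reported totals, not list-price arithmetic: at the rates under Pricing, Opus lists at 5x Haiku per token, so the wider gap presumably reflects the rates and token mix that applied to that run. Either way, the gap is the argument for defaulting down and reaching up only on purpose.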
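The spread is simple arithmetic on the reported totals. A minimal sketch, using only the three dollar figures quoted above (`REPORTED_COST` and `spread` are names invented for this example):

```python
# Totals reported for the single session described above (USD).
REPORTED_COST = {"Haiku": 8.57, "Sonnet": 25.71, "Opus": 290.41}


def spread(costs: dict, baseline: str) -> dict:
    """Return each cost as a multiple of the baseline model's cost."""
    base = costs[baseline]
    return {name: cost / base for name, cost in costs.items()}


if __name__ == "__main__":
    for name, multiple in spread(REPORTED_COST, "Haiku").items():
        print(f"{name}: {multiple:.1f}x Haiku")
```

On these figures Sonnet comes out at about 3x Haiku and Opus at a little over 30x.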

Decide: which model for which work?

| The work | Model |
| --- | --- |
| Routine filtering, doc edits, reading long inputs | Haiku |
| Everyday coding, infra, refactors that are not subtle | Sonnet |
| Hard reasoning, subtle or large refactors, tricky bugs | Opus |
| The hardest problems, top-tier UI generation | Fable 5 |
| Opus quality but you need it fast | Opus + fast mode |

Recipes

1. Set the per-project default. In a low-stakes repo (docs, content), default to Haiku or Sonnet. Reserve Opus for the repos where reasoning quality pays for itself.

2. Default down, reach up. Start a task on Sonnet. When you hit the genuinely hard part (a nasty refactor, a subtle bug), switch up to Opus or Fable for that stretch, then drop back.

3. Multi-model session. Let a cheap model read the transcript or filter candidates each turn, and keep the expensive model for the work that needs it.

4. ultracode for a real wall. When a problem actually warrants exhaustive multi-agent effort, turn it on, get the answer, turn it off. It is a tool, not a setting to leave on.

Failure modes

  • Opus everywhere. Running the top tier for routine work pays up to 30x for nothing. Make the cheaper model the default and escalate by exception.
  • Reseller proxy "discounts." Posts promising "save 92% on Opus" are third-party proxies routing your traffic, not real pricing. Avoid them; you give up trust and control for a number that is not what it claims.
  • Unbounded autonomous Opus loops. A detached loop on Opus can spend serious money while you sleep. Put a spending cap in the job itself. See Loops & Autonomy.
  • Planning from secondhand prices. Model and token prices move. Size budgets from the official pricing page, not from a post or from memory.
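A spending cap inside the job can be as small as a counter that refuses the next step. The sketch below is one possible shape, not a Claude Code feature: `SpendCap`, `BudgetExceeded`, and `run_loop` are hypothetical names, and the per-step costs are supplied up front for illustration, whereas a real job would derive them from usage data after each call.

```python
from typing import Iterable


class BudgetExceeded(RuntimeError):
    """Raised when a step would push spending past the cap."""


class SpendCap:
    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record a step's cost, refusing any step that would pass the cap."""
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"step of ${cost_usd:.2f} would exceed cap of ${self.limit_usd:.2f}"
            )
        self.spent_usd += cost_usd


def run_loop(step_costs: Iterable[float], cap: SpendCap) -> int:
    """Run steps until they finish or the cap trips; return steps completed."""
    completed = 0
    try:
        for cost in step_costs:
            cap.charge(cost)  # stop before doing work the budget cannot cover
            completed += 1
    except BudgetExceeded:
        pass  # a real job would log the stop and notify someone here
    return completed
```

Checking the cap before each step, rather than after, means the loop stops short of the limit instead of overshooting it by one expensive call.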

Quick reference

| You want | Reach for |
| --- | --- |
| Cheapest viable model | Haiku, by default in low-stakes repos |
| The everyday workhorse | Sonnet |
| Hard reasoning / subtle refactor | Opus |
| The frontier tier | Fable 5 |
| Opus quality, faster | fast mode (/fast) |
| Stop overpaying by default | per-project model override |

Cost cheat: default down, reach up on purpose, cap any autonomous run, and verify every price against the official page.

Reference pages already exist for model selection and permissions & settings; this guide is the opinionated operating layer over them.

Related reference

  1. Model Selection: Which Claude model to run, when to flip to fast mode, and how fallbacks work.
  2. Permissions & Settings: Configure what Claude can run without asking via settings.json allowlists at project or user scope.