Claude / Operating Manual · Topics
Models & Cost
Model choice is the biggest cost lever you have, roughly a 30x spread for the same work. Default to the cheapest model that clears the task, override per project, and reach up only for the hard parts.
GA · updated 2026-06-14
The model you run is the single biggest cost lever in Claude Code. For the same piece of work the tiers can differ by around 30x, so the default posture is simple: pick the cheapest model that clears the task, override it per project, and reach up to the top tier only for the parts that actually need it.
The lineup
| Model | Tier | Reach for |
|---|---|---|
Haiku 4.5 (claude-haiku-4-5) | fast, cheap | routine filtering, docs, reading transcripts |
Sonnet 4.6 (claude-sonnet-4-6) | mid | most coding and infra work |
Opus 4.8 (claude-opus-4-8) | high | hard reasoning, nasty refactors |
Fable 5 (claude-fable-5) | frontier, above Opus | the hardest work; strong UI generation |
| Fast mode | Opus, faster output | Opus quality when you want speed (costs more per token) |
Fast mode is not a smaller model. It is Opus with faster output, toggled with /fast, and
it carries a higher per-token price for that speed.
Pricing
Per million tokens, from the model catalog (confirmed 2026-06):
| Model | Input ($/1M) | Output ($/1M) |
|---|---|---|
| Haiku 4.5 | 1 | 5 |
| Sonnet 4.6 | 3 | 15 |
| Opus 4.8 | 5 | 25 |
| Fable 5 | 10 | 50 |
Opus 4.8 is $5 in and $25 out, unchanged from 4.7. Fast mode is the same Opus model with faster output at a higher rate (reported around $10 / $50); it is a Claude Code product setting, so confirm the current Fast-mode rate in the Claude Code docs rather than the API price list. The "$15 / $75" and "$0.25 / $1.25" figures that circulate are stale (older Opus and Haiku 3.5), not current tier prices. Standing rule still holds: re-check the official pricing page before you size a real budget, since prices move.
The levers
Per-project model override. The highest-value setting: run Haiku in a docs repo, Sonnet in infra, Opus only where it matters. Set it once per project and stop overpaying by default.
/effort and ultracode. Dial reasoning effort up for hard problems; ultracode turns on
heavier multi-agent work on demand. Both cost more, so use them deliberately rather than as
a default.
Extended thinking levels. A lighter thinking default trims Opus token usage noticeably (reported in the 18 to 25% range) without you changing models.
Multi-model within a session. You can mix tiers inside one job: a cheap model handles routine sub-steps (reading a long transcript, filtering) while Opus or Sonnet does the actual work. This keeps the expensive model off the boring tokens.
The cost reality
One user ran the same 2.3-hour session (372 turns, 57.9M tokens) on different models: roughly $8.57 on Haiku, $25.71 on Sonnet, and $290.41 on Opus. Same work, a 30x spread. That gap is the entire argument for defaulting down and reaching up only on purpose.
Decide: which model for which work?
| The work | Model |
|---|---|
| Routine filtering, doc edits, reading long inputs | Haiku |
| Everyday coding, infra, refactors that are not subtle | Sonnet |
| Hard reasoning, subtle or large refactors, tricky bugs | Opus |
| The hardest problems, top-tier UI generation | Fable 5 |
| Opus quality but you need it fast | Opus + fast mode |
Recipes
1. Set the per-project default. In a low-stakes repo (docs, content), default to Haiku or Sonnet. Reserve Opus for the repos where reasoning quality pays for itself.
2. Default down, reach up. Start a task on Sonnet. When you hit the genuinely hard part (a nasty refactor, a subtle bug), switch up to Opus or Fable for that stretch, then drop back.
3. Multi-model session. Let a cheap model read the transcript or filter candidates each turn, and keep the expensive model for the work that needs it.
4. ultracode for a real wall. When a problem actually warrants exhaustive multi-agent effort, turn it on, get the answer, turn it off. It is a tool, not a setting to leave on.
Failure modes
- Opus everywhere. Running the top tier for routine work pays up to 30x for nothing. Make the cheaper model the default and escalate by exception.
- Reseller proxy "discounts." Posts promising "save 92% on Opus" are third-party proxies routing your traffic, not real pricing. Avoid them; you give up trust and control for a number that is not what it claims.
- Unbounded autonomous Opus loops. A detached loop on Opus can spend serious money while you sleep. Put a spending cap in the job itself. See Loops & Autonomy.
- Planning from secondhand prices. Model and token prices move. Size budgets from the official pricing page, not from a post or from memory.
Quick reference
| You want | Reach for |
|---|---|
| Cheapest viable model | Haiku, by default in low-stakes repos |
| The everyday workhorse | Sonnet |
| Hard reasoning / subtle refactor | Opus |
| The frontier tier | Fable 5 |
| Opus quality, faster | fast mode (/fast) |
| Stop overpaying by default | per-project model override |
Cost cheat: default down, reach up on purpose, cap any autonomous run, and verify every price against the official page.
Reference pages already exist for model selection and permissions & settings; this guide is the opinionated operating layer over them.
Related reference