Models & Cost | Operating Manual

The model you run is the single biggest cost lever in Claude Code. For the same piece of work the tiers can differ by around 30x, so the default posture is simple: pick the cheapest model that clears the task, override it per project, and reach up to the top tier only for the parts that actually need it.

The lineup

Model	Tier	Reach for
Haiku 4.5 (`claude-haiku-4-5`)	fast, cheap	routine filtering, docs, reading transcripts
Sonnet 4.6 (`claude-sonnet-4-6`)	mid	most coding and infra work
Opus 4.8 (`claude-opus-4-8`)	high	hard reasoning, nasty refactors
Fable 5 (`claude-fable-5`)	frontier, above Opus	the hardest work; strong UI generation
Fast mode	Opus, faster output	Opus quality when you want speed (costs more per token)

Fast mode is not a smaller model. It is Opus with faster output, toggled with /fast, and it carries a higher per-token price for that speed.

Pricing

Per million tokens, from the model catalog (confirmed 2026-06):

Model	Input ($/1M)	Output ($/1M)
Haiku 4.5	1	5
Sonnet 4.6	3	15
Opus 4.8	5	25
Fable 5	10	50

Opus 4.8 is $5 in and $25 out, unchanged from 4.7. Fast mode is the same Opus model with faster output at a higher rate (reported around $10 / $50); it is a Claude Code product setting, so confirm the current Fast-mode rate in the Claude Code docs rather than the API price list. The "$15 / $75" and "$0.25 / $1.25" figures that circulate are stale (older Opus and Haiku 3.5), not current tier prices. Standing rule still holds: re-check the official pricing page before you size a real budget, since prices move.

The levers

Per-project model override. The highest-value setting: run Haiku in a docs repo, Sonnet in infra, Opus only where it matters. Set it once per project and stop overpaying by default.

/effort and ultracode. Dial reasoning effort up for hard problems; ultracode turns on heavier multi-agent work on demand. Both cost more, so use them deliberately rather than as a default.

Extended thinking levels. A lighter thinking default trims Opus token usage noticeably (reported in the 18 to 25% range) without you changing models.

Multi-model within a session. You can mix tiers inside one job: a cheap model handles routine sub-steps (reading a long transcript, filtering) while Opus or Sonnet does the actual work. This keeps the expensive model off the boring tokens.

The cost reality

One user ran the same 2.3-hour session (372 turns, 57.9M tokens) on different models: roughly $8.57 on Haiku, $25.71 on Sonnet, and $290.41 on Opus. Same work, a 30x spread. That gap is the entire argument for defaulting down and reaching up only on purpose.

Decide: which model for which work?

The work	Model
Routine filtering, doc edits, reading long inputs	Haiku
Everyday coding, infra, refactors that are not subtle	Sonnet
Hard reasoning, subtle or large refactors, tricky bugs	Opus
The hardest problems, top-tier UI generation	Fable 5
Opus quality but you need it fast	Opus + fast mode

Recipes

1. Set the per-project default. In a low-stakes repo (docs, content), default to Haiku or Sonnet. Reserve Opus for the repos where reasoning quality pays for itself.

2. Default down, reach up. Start a task on Sonnet. When you hit the genuinely hard part (a nasty refactor, a subtle bug), switch up to Opus or Fable for that stretch, then drop back.

3. Multi-model session. Let a cheap model read the transcript or filter candidates each turn, and keep the expensive model for the work that needs it.

4. ultracode for a real wall. When a problem actually warrants exhaustive multi-agent effort, turn it on, get the answer, turn it off. It is a tool, not a setting to leave on.

Failure modes

Opus everywhere. Running the top tier for routine work pays up to 30x for nothing. Make the cheaper model the default and escalate by exception.
Reseller proxy "discounts." Posts promising "save 92% on Opus" are third-party proxies routing your traffic, not real pricing. Avoid them; you give up trust and control for a number that is not what it claims.
Unbounded autonomous Opus loops. A detached loop on Opus can spend serious money while you sleep. Put a spending cap in the job itself. See Loops & Autonomy.
Planning from secondhand prices. Model and token prices move. Size budgets from the official pricing page, not from a post or from memory.

Quick reference

You want	Reach for
Cheapest viable model	Haiku, by default in low-stakes repos
The everyday workhorse	Sonnet
Hard reasoning / subtle refactor	Opus
The frontier tier	Fable 5
Opus quality, faster	fast mode (`/fast`)
Stop overpaying by default	per-project model override

Cost cheat: default down, reach up on purpose, cap any autonomous run, and verify every price against the official page.

Reference pages already exist for model selection and permissions & settings; this guide is the opinionated operating layer over them.