What is the difference between Claude Opus and Sonnet, and which should I use?

Sonnet fits well-specified execution: coding to a clear spec, writing tests, repetitive work. Opus fits heavy thinking: system design, cross-file debugging, reviewing. From real fleet use, the clearer your brief, the narrower the gap between them. Pay for the expensive model only where real reasoning happens.

What is Claude Fable 5 and how does it differ from Opus?

Fable 5 is the first model in the Claude 5 family, in a new Mythos-class tier positioned above Opus. The advantage is a matter of degree in ambiguous work, situational judgment, and trade-off weighing, not a new kind of capability. Ordinary execution is cheaper and just as good on Opus or Sonnet.

How do I use a cheaper model without quality dropping?

Reduce the ambiguity of the task first: slice work small, attach a runnable acceptance check to each slice, and spend the price gap on iterations, letting the cheaper model draft, critique its own work, and revise. The clearer the work, the narrower the gap between model tiers.

Should I use one model for everything?

No. From running a real agent fleet, splitting work by tier (Haiku for search, Sonnet for execution, Opus for hard thinking) is both cheaper and more accurate than one expensive model doing everything. An expensive model on routine work is paying for reasoning that the work never needed.

Claude Opus vs Sonnet vs Fable 5: Which Model for Which Work

In early July, we let the most expensive model in the house hand-write about two thousand lines of HTML and CSS. It did the job well, every line careful. Then, just before the work was done, its credit ran out mid-task, and a cheaper model had to take over with a long handoff note.

The model was not at fault. It faithfully did what it was told. The fault was in the staffing: we had put the chief architect on bricklaying duty and paid architect rates for a wall any builder could have laid.

That night we rewrote the model-picking rules for our whole agent fleet. This post is that rulebook: all four Claude tiers compared from real use, which work goes to which tier, and how to squeeze above-tier results out of a mid-tier model when the top one is not on the payroll.

Part 1: Meet the four tiers

The Claude family now has four tiers we use in real work. Three are familiar names. The fourth is new: Fable 5, the first model of the Claude 5 family, in a new Mythos-class tier that Anthropic positions above Opus (official announcement).

Tier	Character	Role on our team
Haiku	Fastest and cheapest, answers straight	Scout unit: file search, pattern sweeps, fan out ten at a time without guilt
Sonnet	Price-to-skill balance, follows instructions precisely	Main hands: code to spec, tests, pattern-shaped work
Opus	Thinks in layers, sees cross-file connections	The thinker: system design, hard debugging, reviewing others' work
Fable 5	Reads situations, weighs trade-offs, handles ambiguity	The lead: decomposes work, synthesizes across sources, guards decision points

One thing worth saying plainly before moving on: the difference between the top tier and the one below is a matter of degree, not kind. Fable does nothing Opus categorically cannot. It just misses less often on work that is hard to interpret. Which means the model question is really a question about the nature of your task.

Part 2: The one criterion that matters: ambiguity

At first we thought what everyone thinks: the criterion is difficulty. Hard work gets the expensive model, easy work gets the cheap one. After months of running a real fleet, the criterion that actually works turned out to be different: ambiguity, not difficulty.

Sonnet handles work that is hard but clear, such as writing a parser against a fully specified grammar with tests to run, so well that you can barely tell it from the expensive tier. The judge of success lives outside the model: tests pass or they do not.

But work that looks easy yet is ambiguous, like "check whether this handoff note can be trusted," is where model tier shows up immediately. You have to read whether the writer knew or guessed, and spot which sentences are claims nobody verified. There is no test suite for that.

The gap between model tiers narrows as the work gets clearer. The tighter the spec, the more runnable the acceptance check, the smaller the slices, the closer the cheap model gets to the expensive one.

We pair it with one secondary criterion: is the action reversible? Work that can be undone can go to whatever tier its ambiguity suggests. But irreversible points (deploys, deletions, anything leaving the building) always get a human or the top tier standing at the gate, whatever tier does the surrounding work.
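The two criteria above can be sketched as a small routing function. This is an illustrative sketch under stated assumptions: the three ambiguity levels, the tier strings, and the names `Ambiguity`, `Routing`, and `route` are all invented for this example, and the mapping simply follows the roles described in this post.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Ambiguity(Enum):
    # Hypothetical three-level scale; the post does not define formal levels.
    CLEAR = 1      # tight spec, and a runnable check decides success
    DESIGN = 2     # needs design or cross-file debugging
    UNSETTLED = 3  # needs a decision on something not yet settled


@dataclass(frozen=True)
class Routing:
    tier: str
    gate: Optional[str]  # who stands at the gate before the action runs


# Assumed mapping, taken from the team roles described in this post.
TIER_BY_AMBIGUITY = {
    Ambiguity.CLEAR: "sonnet",
    Ambiguity.DESIGN: "opus",
    Ambiguity.UNSETTLED: "fable-5",
}


def route(ambiguity: Ambiguity, reversible: bool) -> Routing:
    """Pick the tier by ambiguity; add a gate when the action cannot be undone."""
    tier = TIER_BY_AMBIGUITY[ambiguity]
    gate = None if reversible else "human or top tier"
    return Routing(tier=tier, gate=gate)


# A deploy script written to a clear spec: cheap tier does the work, gate stays.
print(route(Ambiguity.CLEAR, reversible=False))
```

Note that the gate is decided independently of the tier: an irreversible step keeps its gate even when the surrounding work is clear enough for the cheaper model. Haiku is left out of the mapping on purpose, since its search work is read-heavy rather than a point on the ambiguity scale.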

Part 3: The verdict table: which work goes where

This is the actual rulebook our fleet runs on. Where a row cost us something to learn, the lesson is included.

Work	Use	Why
File search, pattern hunting, broad information gathering	Haiku, several in parallel	Read-heavy, think-light. Cheap enough per unit to fan out freely
Code to a clear spec, tests, boilerplate, pattern-shaped edits	Sonnet	The acceptance judge lives outside the model; paying for extra thinking buys nothing
System design, cross-file debugging, reviewing others' code	Opus	Real case from our fleet: Opus caught a transitive dependency bug that Sonnet read straight past. The higher rate pays for itself exactly here
Decomposing big work, synthesizing across sources, deciding under ambiguity	Fable 5	Work where one mistake is expensive downstream: plan wrong at the top and everyone executing the plan is wrong with you
High-volume routine work accidentally handed to the expensive tier	Don't	That opening story. The architect laid bricks until the credit ran out

Notice the middle column never says "best," only "best fit." "Never pay reasoning rates for work that doesn't reason" is written, in those words, in our fleet's actual rulebook.

Part 4: No top tier? Squeeze the middle one with process

The next question comes almost immediately: what if you do not want to pay for the top tier at all, and run Opus as the head instead? Part 2 gives the conditional answer: yes, if you reduce the ambiguity for it first. These are the six levers we use.

Turn judgment into checklists. Things the top tier does unprompted, like checking a claim's source before believing it, get written out as explicit steps the middle tier follows mechanically.
Spend the price gap on iterations. One careful pass from the expensive model trades against draft, self-critique, revise from the cheaper one, and the bundle still costs less.
Run two models from different vendors on the same problem, then let a third judge. Same-model blind spots correlate; vendor diversity is your immune system.
Slice work smaller than feels necessary, with a runnable acceptance check per slice. Shorter, clearer briefs sharpen the middle tier more than they sharpen the top one.
Turn extended thinking on only at the hard nodes. Design and debugging nodes get the full budget; everything else runs fast. You are hand-simulating the top tier's economics.
Put a human at every irreversible gate. The top tier's real edge is situational judgment at decision points. Without it, stand there yourself and let the model run at full speed between the gates.
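Two of the levers above, small slices with a runnable acceptance check and spending the price gap on iterations, combine naturally into one loop. The sketch below is a minimal illustration, not a definitive implementation: `Slice`, `run_slice`, and the three callables are hypothetical names, and in practice each callable would wrap a call to whichever model you use. The stubs here only stand in for a model.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical signatures; each would wrap a real model call in practice.
DraftFn = Callable[[str], str]             # brief -> draft
CritiqueFn = Callable[[str, str], str]     # brief, draft -> critique notes
ReviseFn = Callable[[str, str, str], str]  # brief, draft, notes -> new draft


@dataclass
class Slice:
    brief: str
    accept: Callable[[str], bool]  # runnable acceptance check for this slice


def run_slice(
    s: Slice,
    draft: DraftFn,
    critique: CritiqueFn,
    revise: ReviseFn,
    max_rounds: int = 3,
) -> Optional[str]:
    """Return the first output that passes the slice's check, else None."""
    output = draft(s.brief)
    for _ in range(max_rounds):
        if s.accept(output):
            return output
        notes = critique(s.brief, output)
        output = revise(s.brief, output, notes)
    # One last check after the final revision; None means "escalate".
    return output if s.accept(output) else None


# Stub "model" for illustration: the first draft forgets the trailing newline.
result = run_slice(
    Slice(brief="emit 'ok' followed by a newline",
          accept=lambda out: out.endswith("\n")),
    draft=lambda brief: "ok",
    critique=lambda brief, out: "missing trailing newline",
    revise=lambda brief, out, notes: out + "\n",
)
print(repr(result))
```

The acceptance check, not the model, decides when the loop stops. A `None` result is the signal to hand that one slice to a higher tier or a human, rather than upgrading the model for the whole job.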

Honesty checkpoint: the claim that "a mid tier plus tight process beats a top tier working loose" is still a hypothesis, not a measurement. We believe it from daily use, but we have not run a controlled A/B. Our fleet's daily token reports are collecting the numbers now. If the data argues back, we will come back and edit this post with the numbers attached.

Part 5: Apply it to your own work

The one rule to remember

Before picking a model, ask "how ambiguous is this work?", not "how hard is it?" Clear work goes to the cheap model, and the savings buy verification rounds. Ambiguous work earns the expensive model, and only for the ambiguous part.

Where to start

Look at everything you gave AI in the past week and sort it into three piles: clearly specified with a checkable result, needs design or debugging, needs a decision on something unsettled.
Move the whole first pile to Sonnet. If quality drops, do not upgrade the model yet. Find where the brief was ambiguous first.
Only the third pile earns the top tier, and if that work is irreversible, put yourself at the button.

One re-sort is usually enough to see the bill come down with no quality drop, because most of pile one never needed the thinking you were paying for.
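The re-sort described above can be sketched as a short audit over a week's task log. Everything here is an assumption made for illustration: the `LoggedTask` fields, the tier strings, and the sample entries are hypothetical, and the two boolean flags are judgments you would fill in by hand while reviewing the log.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class LoggedTask:
    name: str
    tier_used: str            # e.g. "haiku", "sonnet", "opus"
    checkable_result: bool    # clear spec plus a result you can check
    unsettled_decision: bool  # requires deciding something not yet settled


def pile(task: LoggedTask) -> int:
    """1 = clearly specified, 2 = design or debugging, 3 = unsettled decision."""
    if task.unsettled_decision:
        return 3
    return 1 if task.checkable_result else 2


def overpaid(
    tasks: Iterable[LoggedTask],
    cheap_tiers: Tuple[str, ...] = ("haiku", "sonnet"),
) -> List[LoggedTask]:
    """Pile-one tasks that ran on a tier above the cheap ones."""
    return [t for t in tasks if pile(t) == 1 and t.tier_used not in cheap_tiers]


# Hypothetical log entries, for illustration only.
week = [
    LoggedTask("hand-write static HTML and CSS", "opus", True, False),
    LoggedTask("write unit tests to spec", "sonnet", True, False),
    LoggedTask("trace cross-file bug", "opus", False, False),
    LoggedTask("choose migration strategy", "opus", False, True),
]
for task in overpaid(week):
    print("move to the cheap tier:", task.name)
```

Only the first entry is flagged: it has a checkable result yet ran on the expensive tier. The debugging and decision tasks stay where they are, which matches the rule that the expensive model is earned by the ambiguous part only.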

Per-tier pricing and the subscription-versus-API question live in a separate post (linked below), because prices change faster than principles. This post stands on the principle: pick by the ambiguity of the work, and let process substitute for price.

Sources and references

Claude Fable 5 / Mythos-class announcement: anthropic.com/news/claude-fable-5-mythos-5
The mid-task credit-ceiling incident, the Opus-caught-what-Sonnet-missed case, and the tier rulebook come from our own agent-fleet work logs (July 2026)

Same series: Claude Code: subscription or API · Claude Fable 5 as the head, subagents as the hands · Which local LLM fits which job · Interviewing Claude Fable 5 about AI orchestration

The most expensive model isn't the answer.
The one that fits the work is.