In early July, we let the most expensive model in the house hand-write about two thousand lines of HTML and CSS. It did the job well, every line careful. Then, just before the work was done, its credit ran out mid-task, and a cheaper model had to take over, working from a long handoff note.
The model was not at fault. It faithfully did what it was told. The fault was in the staffing: we had put the chief architect on bricklaying duty and paid architect rates for a wall any builder could have laid.
That night we rewrote the model-picking rules for our whole agent fleet. This post is that rulebook: all four Claude tiers compared from real use, which work goes to which tier, and how to squeeze above-tier results out of a mid-tier model when the top one is not on the payroll.
Part 1: Meet the four tiers
The Claude family now has four tiers we use in real work. Three are familiar names. The fourth is new: Fable 5, the first model of the Claude 5 family, in a new Mythos-class tier that Anthropic positions above Opus (official announcement).
| Tier | Character | Role on our team |
|---|---|---|
| Haiku | Fastest and cheapest, answers straight | Scout unit: file search, pattern sweeps, fan out ten at a time without guilt |
| Sonnet | Price-to-skill balance, follows instructions precisely | Main hands: code to spec, tests, pattern-shaped work |
| Opus | Thinks in layers, sees cross-file connections | The thinker: system design, hard debugging, reviewing others' work |
| Fable 5 | Reads situations, weighs trade-offs, handles ambiguity | The lead: decomposes work, synthesizes across sources, guards decision points |
One thing worth saying plainly before moving on: the difference between the top tier and the one below is a matter of degree, not kind. Fable does nothing Opus categorically cannot. It just misses less often on work that is hard to interpret. Which means the model question is really a question about the nature of your task.
Part 2: The one criterion that matters: ambiguity
At first we thought what everyone thinks: the criterion is difficulty. Hard work gets the expensive model, easy work gets the cheap one. After months of running a real fleet, the criterion that actually works turned out to be different: ambiguity, not difficulty.
Sonnet handles work that is hard but clear, say writing a parser against a fully specified grammar with tests to run, so well that you can barely tell it from the expensive tier. The judge of success lives outside the model: tests pass or they do not.
But work that looks easy yet is ambiguous, like "check whether this handoff note can be trusted," is where model tier shows up immediately. You have to read whether the writer knew or guessed, and spot which sentences are claims nobody verified. There is no test suite for that.
The gap between model tiers narrows as the work gets clearer. The tighter the spec, the more runnable the acceptance check, the smaller the slices, the closer the cheap model gets to the expensive one.
We pair it with one secondary criterion: is the action reversible? Work that can be undone can go to whatever tier its ambiguity suggests. But irreversible points (deploys, deletions, anything leaving the building) always get a human or the top tier standing at the gate, whatever tier does the surrounding work.
Part 3: The verdict table: which work goes where
This is the actual rulebook our fleet runs on. Where a row cost us something to learn, the lesson is included.
| Work | Use | Why |
|---|---|---|
| File search, pattern hunting, broad information gathering | Haiku, several in parallel | Read-heavy, think-light. Cheap enough per unit to fan out freely |
| Code to a clear spec, tests, boilerplate, pattern-shaped edits | Sonnet | The acceptance judge lives outside the model; paying for extra thinking buys nothing |
| System design, cross-file debugging, reviewing others' code | Opus | Real case from our fleet: Opus caught a transitive dependency bug that Sonnet read straight past. The higher rate pays for itself exactly here |
| Decomposing big work, synthesizing across sources, deciding under ambiguity | Fable 5 | Work where one mistake is expensive downstream: plan wrong at the top and everyone executing the plan is wrong with you |
| High-volume routine work accidentally handed to the expensive tier | Don't | That opening story. The architect laid bricks until the credit ran out |
Notice the middle column never says "best," only "best fit." "Never pay reasoning rates for work that doesn't reason" is written, in those words, in our fleet's actual rulebook.
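As a rough illustration, the table and the reversibility rule from Part 2 can be sketched as one small routing function. This is a minimal sketch under stated assumptions, not our fleet's actual code: the `Ambiguity` levels, the `Task` fields, and the `route` function are names invented for the example, and the tiers are plain strings rather than real model identifiers.

```python
from dataclasses import dataclass
from enum import Enum


class Ambiguity(Enum):
    """Hypothetical three-level scale; the post itself only ranks work loosely."""
    CLEAR = 1       # tight spec with a runnable acceptance check
    STRUCTURAL = 2  # needs design, cross-file debugging, or review
    UNSETTLED = 3   # needs a decision on something open to interpretation


@dataclass(frozen=True)
class Task:
    name: str
    ambiguity: Ambiguity
    reversible: bool
    read_heavy: bool = False  # search and sweep work with little reasoning


# Assumed mapping that mirrors the rows of the verdict table.
TIER_BY_AMBIGUITY = {
    Ambiguity.CLEAR: "Sonnet",
    Ambiguity.STRUCTURAL: "Opus",
    Ambiguity.UNSETTLED: "Fable 5",
}


def route(task: Task) -> tuple[str, str]:
    """Return (worker tier, gate) for one task.

    The worker tier follows ambiguity alone. Reversibility never changes
    the worker; it only decides whether a gate stands in front of the action.
    """
    if task.read_heavy and task.ambiguity is Ambiguity.CLEAR:
        worker = "Haiku"
    else:
        worker = TIER_BY_AMBIGUITY[task.ambiguity]
    gate = "none" if task.reversible else "human or top tier"
    return worker, gate
```

For example, `route(Task("ship release", Ambiguity.CLEAR, reversible=False))` returns `("Sonnet", "human or top tier")`: the cheap tier does the surrounding work, and the gate is staffed separately.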
Part 4: No top tier? Squeeze the middle one with process
The next question comes almost immediately: what if you do not want to pay for the top tier at all, and run Opus as the head instead? Part 2 gives the conditional answer: yes, if you reduce the ambiguity for it first. These are the six levers we use.
- Turn judgment into checklists. Things the top tier does unprompted, like checking a claim's source before believing it, get written out as explicit steps the middle tier follows mechanically.
- Spend the price gap on iterations. Trade one careful pass from the expensive model for a draft, self-critique, and revise cycle from the cheaper one; the bundle still costs less.
- Run two models from different vendors on the same problem, then let a third judge. Same-model blind spots correlate; vendor diversity is your immune system.
- Slice work smaller than feels necessary, with a runnable acceptance check per slice. Shorter, clearer briefs sharpen the middle tier more than they sharpen the top one.
- Turn extended thinking on only at the hard nodes. Design and debugging nodes get the full budget; everything else runs fast. You are hand-simulating the top tier's economics.
- Put a human at every irreversible gate. The top tier's real edge is situational judgment at decision points. Without it, stand there yourself and let the model run at full speed between the gates.
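Two of these levers, iterating with the cheaper model and giving every slice a runnable acceptance check, combine naturally into one loop. The sketch below is an assumption-laden illustration: `Model` stands in for any function that takes a prompt and returns text (no real vendor API is implied), `Check` is whatever acceptance test the slice has, and `run_slice` and the prompt wording are invented for the example.

```python
from typing import Callable

Model = Callable[[str], str]   # hypothetical: prompt in, text out
Check = Callable[[str], bool]  # runnable acceptance check for one slice


def run_slice(brief: str, model: Model, check: Check, max_rounds: int = 3) -> tuple[str, int]:
    """Draft, then critique and revise until the check passes.

    Returns (final_text, rounds_used), where rounds_used counts drafts produced.
    Raises RuntimeError if max_rounds drafts all fail the check, which is the
    cue to look for ambiguity in the brief before reaching for a higher tier.
    """
    draft = model(f"Write a draft for this brief:\n{brief}")
    rounds = 1
    while not check(draft):
        if rounds >= max_rounds:
            raise RuntimeError(f"slice still failing after {max_rounds} rounds")
        critique = model(f"List concrete problems in this draft:\n{draft}")
        draft = model(
            f"Revise the draft to fix these problems:\n{critique}\n\nDraft:\n{draft}"
        )
        rounds += 1
    return draft, rounds
```

The check is deliberately passed in from outside, so the judge of success stays outside the model. The human gate from the last lever is not modeled here; it would sit around calls like this one, not inside them.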
Honesty checkpoint: the claim that "a mid tier plus tight process beats a top tier working loose" is still a hypothesis, not a measurement. We believe it from daily use, but we have not run a controlled A/B. Our fleet's daily token reports are collecting the numbers now. If the data argues back, we will come back and edit this post with the numbers attached.
Part 5: Apply it to your own work
The one rule to remember
Before picking a model, ask "how ambiguous is this work?", not "how hard is it?" Clear work goes to the cheap model, and the savings buy verification rounds. Ambiguous work earns the expensive model, and only for the ambiguous part.
Where to start
- Look at everything you gave AI in the past week and sort it into three piles: clearly specified with a checkable result, needs design or debugging, needs a decision on something unsettled.
- Move the whole first pile to Sonnet. If quality drops, do not upgrade the model yet. Find where the brief was ambiguous first.
- Only the third pile earns the top tier, and if that work is irreversible, put yourself at the button.
One re-sort is usually enough to see the bill come down with no quality drop, because most of pile one never needed the thinking you were paying for.
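If you keep even a rough log of last week's tasks, the re-sort can be done mechanically. The following is a minimal sketch, assuming each log entry is a `(task, pile, tier_used)` tuple; the pile labels, the `PILE_CEILING` mapping, the `over_tiered` function, and the sample week are all made up for illustration.

```python
# Tiers in ascending order, as listed in Part 1.
TIER_RANK = {"Haiku": 0, "Sonnet": 1, "Opus": 2, "Fable 5": 3}

# Hypothetical pile labels and the highest tier each pile should need.
PILE_CEILING = {
    "clear": "Sonnet",            # clearly specified, checkable result
    "design_or_debug": "Opus",    # needs design or debugging
    "unsettled": "Fable 5",       # needs a decision on something unsettled
}


def over_tiered(log: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Return (task, tier_used, suggested_tier) for tasks run above their pile's ceiling."""
    findings = []
    for task, pile, tier_used in log:
        ceiling = PILE_CEILING[pile]
        if TIER_RANK[tier_used] > TIER_RANK[ceiling]:
            findings.append((task, tier_used, ceiling))
    return findings


# Made-up sample week, for illustration only.
last_week = [
    ("write boilerplate HTML", "clear", "Opus"),
    ("find all callers of a function", "clear", "Haiku"),
    ("debug cross-file import cycle", "design_or_debug", "Opus"),
    ("decide rollout order", "unsettled", "Fable 5"),
]

for task, used, suggested in over_tiered(last_week):
    print(f"{task}: ran on {used}, pile suggests {suggested}")
```

Only the first entry is flagged in this sample. The function reports where a cheaper tier looks sufficient; it says nothing about whether the brief itself was clear enough, which still needs a human read.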
Per-tier pricing and the subscription-versus-API question live in a separate post (linked below), because prices change faster than principles. This post stands on the principle: pick by the ambiguity of the work, and let process substitute for price.
- Claude Fable 5 / Mythos-class announcement: anthropic.com/news/claude-fable-5-mythos-5
- The mid-task credit-ceiling incident, the Opus-caught-what-Sonnet-missed case, and the tier rulebook come from our own agent-fleet work logs (July 2026)
Same series: Claude Code: subscription or API · Claude Fable 5 as the head, subagents as the hands · Which local LLM fits which job · Interviewing Claude Fable 5 about AI orchestration