productize.life
AI · Cost Control

The most expensive model isn't the answer.
The one that fits the work is.

The night our most expensive model burned through its credit mid-task taught us the real question. It is not "which model is smartest" but "how much thinking does this work actually need." Here is the picking rule we run our whole fleet on.

Yim · written with Dobby (AI Oracle) / Jul 5, 2026

In early July, we let the most expensive model in the house hand-write about two thousand lines of HTML and CSS. It did the job well, writing every line with care. Then, just before the work was done, its credit ran out mid-task, and a cheaper model had to take over from a long handoff note.

The model was not at fault. It faithfully did what it was told. The fault was in the staffing: we had put the chief architect on bricklaying duty and paid architect rates for a wall any builder could have laid.

That night we rewrote the model-picking rules for our whole agent fleet. This post is that rulebook: all four Claude tiers compared from real use, which work goes to which tier, and how to squeeze above-tier results out of a mid-tier model when the top one is not on the payroll.

Part 1: Meet the four tiers

The Claude family now has four tiers we use in real work. Three are familiar names. The fourth is new: Fable 5, the first model of the Claude 5 family, in a new Mythos-class tier that Anthropic positions above Opus (official announcement).

Tier | Character | Role on our team
Haiku | Fastest and cheapest, answers straight | Scout unit: file search, pattern sweeps, fan out ten at a time without guilt
Sonnet | Price-to-skill balance, follows instructions precisely | Main hands: code to spec, tests, pattern-shaped work
Opus | Thinks in layers, sees cross-file connections | The thinker: system design, hard debugging, reviewing others' work
Fable 5 | Reads situations, weighs trade-offs, handles ambiguity | The lead: decomposes work, synthesizes across sources, guards decision points

One thing worth saying plainly before moving on: the difference between the top tier and the one below it is a matter of degree, not kind. Fable does nothing Opus categorically cannot do; it just misses less often on work that is hard to interpret. That means the model question is really a question about the nature of your task.

Part 2: The one criterion that matters: ambiguity

At first we thought what everyone thinks: the criterion is difficulty. Hard work gets the expensive model, easy work gets the cheap one. After months of running a real fleet, the criterion that actually works turned out to be different: ambiguity, not difficulty.

Take work that is hard but clear, say writing a parser against a fully specified grammar with tests to run. Sonnet handles it so well you can barely tell it from the expensive tier, because the judge of success lives outside the model: tests pass or they do not.

But work that looks easy yet is ambiguous, like "check whether this handoff note can be trusted," is where the model tier shows immediately. You have to judge whether the writer knew or guessed, and spot which sentences are claims nobody verified. There is no test suite for that.

The gap between model tiers narrows as the work gets clearer. The tighter the spec, the more runnable the acceptance check, the smaller the slices, the closer the cheap model gets to the expensive one.

We pair it with one secondary criterion: is the action reversible? Work that can be undone can go to whatever tier its ambiguity suggests. But irreversible points (deploys, deletions, anything leaving the building) always get a human or the top tier standing at the gate, no matter which tier does the surrounding work.
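The two criteria fit in a few lines of code. Below is a minimal Python sketch, not our fleet's actual router: the three-level `Ambiguity` scale, `route`, and the gate label are hypothetical names, and Haiku's scout work is left out to keep the sketch short.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Ambiguity(IntEnum):
    """How much interpretation a task needs (hypothetical three-level scale)."""
    CLEAR = 0      # tight spec, runnable acceptance check
    DESIGN = 1     # design work or debugging across files
    UNSETTLED = 2  # a decision on something nobody has settled yet


# Primary criterion: the tier follows ambiguity, not difficulty.
TIER_BY_AMBIGUITY = {
    Ambiguity.CLEAR: "Sonnet",
    Ambiguity.DESIGN: "Opus",
    Ambiguity.UNSETTLED: "Fable 5",
}


@dataclass(frozen=True)
class Route:
    tier: str
    gate: Optional[str]  # who must approve before the action runs, if anyone


def route(ambiguity: Ambiguity, reversible: bool) -> Route:
    """Pick the working tier by ambiguity; gate anything irreversible."""
    tier = TIER_BY_AMBIGUITY[ambiguity]
    # Secondary criterion: an irreversible action (deploy, deletion, anything
    # leaving the building) gets a gatekeeper whatever tier does the work.
    gate = None if reversible else "human or Fable 5"
    return Route(tier=tier, gate=gate)


print(route(Ambiguity.CLEAR, reversible=True))   # Route(tier='Sonnet', gate=None)
print(route(Ambiguity.CLEAR, reversible=False))  # Route(tier='Sonnet', gate='human or Fable 5')
```

The second call is the point of the secondary criterion: clear work still goes to the cheap tier, but the action now waits at a gate.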

Part 3: The verdict table: which work goes where

This is the actual rulebook our fleet runs on. Where a row cost us something to learn, the lesson is included.

Work | Use | Why
File search, pattern hunting, broad information gathering | Haiku, several in parallel | Read-heavy, think-light. Cheap enough per unit to fan out freely
Code to a clear spec, tests, boilerplate, pattern-shaped edits | Sonnet | The acceptance judge lives outside the model; paying for extra thinking buys nothing
System design, cross-file debugging, reviewing others' code | Opus | Real case from our fleet: Opus caught a transitive dependency bug that Sonnet read straight past. The higher rate pays for itself exactly here
Decomposing big work, synthesizing across sources, deciding under ambiguity | Fable 5 | Work where one mistake is expensive downstream: plan wrong at the top and everyone executing the plan is wrong with you
High-volume routine work accidentally handed to the expensive tier | Don't | That opening story. The architect laid bricks until the credit ran out

Notice that the middle column names the best fit, never the best model. "Never pay reasoning rates for work that doesn't reason" is written, word for word, in our fleet's actual rulebook.
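The table also works as a check you can run before dispatching work. A minimal sketch, assuming hypothetical category keys for the first four rows and a simple cheapest-to-priciest ranking of the tiers; it flags the "Don't" row as over-staffing.

```python
# Hypothetical category keys, one per row of the verdict table above.
RULEBOOK = {
    "search": "Haiku",       # file search, pattern hunting, broad gathering
    "spec_code": "Sonnet",   # clear spec, tests, boilerplate, pattern edits
    "design_debug": "Opus",  # system design, cross-file debugging, review
    "decide": "Fable 5",     # decomposing, synthesizing, deciding under ambiguity
}

# Tiers ordered from cheapest to most expensive.
TIER_RANK = {"Haiku": 0, "Sonnet": 1, "Opus": 2, "Fable 5": 3}


def check_assignment(category: str, assigned: str) -> str:
    """Compare a proposed tier against the rulebook's best fit for the work."""
    fit = RULEBOOK[category]
    if TIER_RANK[assigned] > TIER_RANK[fit]:
        # The "Don't" row: reasoning rates for work that doesn't reason.
        return f"over-staffed: {category} fits {fit}, not {assigned}"
    if TIER_RANK[assigned] < TIER_RANK[fit]:
        return f"below fit: {category} calls for {fit}"
    return "ok"


# Something like the opening story, expressed as a check:
print(check_assignment("spec_code", "Fable 5"))
# over-staffed: spec_code fits Sonnet, not Fable 5
```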

Part 4: No top tier? Squeeze the middle one with process

The next question comes almost immediately: what if you do not want to pay for the top tier at all, and run Opus as the head instead? Part 2 gives the conditional answer: yes, if you reduce the ambiguity for it first. These are the six levers we use.

  1. Turn judgment into checklists. Things the top tier does unprompted, like checking a claim's source before believing it, get written out as explicit steps the middle tier follows mechanically.
  2. Spend the price gap on iterations. Trade one careful pass from the expensive model for a draft, a self-critique, and a revision from the cheaper one; the whole bundle still costs less.
  3. Run two models from different vendors on the same problem, then let a third judge. Same-model blind spots correlate; vendor diversity is your immune system.
  4. Slice work smaller than feels necessary, with a runnable acceptance check per slice. Shorter, clearer briefs sharpen the middle tier more than they sharpen the top one.
  5. Turn extended thinking on only at the hard nodes. Design and debugging nodes get the full budget; everything else runs fast. You are hand-simulating the top tier's economics.
  6. Put a human at every irreversible gate. The top tier's real edge is situational judgment at decision points. Without it, stand there yourself and let the model run at full speed between the gates.
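Levers 2, 4, 5, and 6 compose into one loop per slice. A minimal sketch, assuming the model call and the human approval are passed in as plain functions; `Slice`, `run_slice`, `thinking_budget`, and the 16,000-token budget are hypothetical placeholders rather than any vendor's API, and the revise step is a simplified stand-in for the full draft, self-critique, revise cycle.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Slice:
    """One small unit of work (lever 4). All fields are illustrative."""
    brief: str
    hard: bool                     # a design or debugging node? (lever 5)
    irreversible: bool             # deploy, deletion, leaves the building (lever 6)
    accept: Callable[[str], bool]  # runnable acceptance check for this slice


def thinking_budget(s: Slice, full: int = 16_000) -> int:
    """Lever 5: the full thinking budget only at hard nodes, none elsewhere.

    16,000 tokens is a placeholder, not a tuned value.
    """
    return full if s.hard else 0


def run_slice(
    s: Slice,
    model: Callable[[str, int], str],       # hypothetical: (brief, budget) -> output
    approve: Callable[[Slice, str], bool],  # hypothetical human gate
    max_rounds: int = 3,
) -> Optional[str]:
    """Revise until the slice's check passes (lever 2), then gate it (lever 6)."""
    brief = s.brief
    for _ in range(max_rounds):
        output = model(brief, thinking_budget(s))
        if s.accept(output):
            if s.irreversible and not approve(s, output):
                return None  # the human at the gate said no
            return output
        # Revise: hand the failed attempt back along with the original brief.
        brief = f"{s.brief}\n\nThe previous attempt failed its check:\n{output}"
    return None  # out of rounds: tighten the brief before upgrading the model


def fake_model(brief: str, budget: int) -> str:
    # Stand-in model: fails on the first try, passes once it sees feedback.
    return "ok" if "failed its check" in brief else "draft"


s = Slice("parse the config file", hard=False, irreversible=False,
          accept=lambda out: out == "ok")
print(run_slice(s, fake_model, approve=lambda _slice, _out: True))  # ok
```

Returning `None` when the rounds run out is deliberate: a slice that keeps failing its check points at the brief before it points at the model.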

Honesty checkpoint: the claim that "a mid tier plus tight process beats a top tier working loose" is still a hypothesis, not a measurement. We believe it from daily use, but we have not run a controlled A/B. Our fleet's daily token reports are collecting the numbers now. If the data argues back, we will come back and edit this post with the numbers attached.

Part 5: Apply it to your own work

The one rule to remember

Before picking a model, ask how ambiguous the work is, not how hard it is. Clear work goes to the cheap model, and the savings buy verification rounds. Ambiguous work earns the expensive model, and only for its ambiguous part.

Where to start

  1. Look at everything you gave AI in the past week and sort it into three piles: clearly specified with a checkable result, needs design or debugging, needs a decision on something unsettled.
  2. Move the whole first pile to Sonnet. If quality drops, do not upgrade the model yet. Find where the brief was ambiguous first.
  3. Only the third pile earns the top tier, and if that work is irreversible, put yourself at the button.
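The three steps can be run as a tally over last week's work. A minimal sketch, assuming each task is tagged by hand with two yes/no answers; `Task`, `pile`, and `assign` are hypothetical names, and sending pile two to Opus is our reading of the verdict table, since the steps above do not name a tier for it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One thing you handed to AI last week (fields are hypothetical)."""
    title: str
    checkable: bool        # clearly specified, with a checkable result
    needs_decision: bool   # needs a decision on something unsettled
    irreversible: bool = False


def pile(t: Task) -> int:
    """Step 1: pile 1 = spec + check, 2 = design or debugging, 3 = open decision."""
    if t.needs_decision:
        return 3  # an open decision outranks a checkable surface
    return 1 if t.checkable else 2


def assign(t: Task) -> str:
    """Steps 2 and 3: pile 1 moves to Sonnet; only pile 3 earns the top tier."""
    p = pile(t)
    if p == 1:
        return "Sonnet"
    if p == 2:
        return "Opus"  # from the verdict table's design/debugging row
    return "Fable 5, you at the button" if t.irreversible else "Fable 5"


week = [
    Task("write unit tests for the parser", checkable=True, needs_decision=False),
    Task("debug a cross-file state bug", checkable=False, needs_decision=False),
    Task("decide whether to drop the legacy API", checkable=False,
         needs_decision=True, irreversible=True),
]
print(Counter(pile(t) for t in week))
for t in week:
    print(f"{t.title}: {assign(t)}")
```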

One re-sort is usually enough to see the bill come down with no quality drop, because most of pile one never needed the thinking you were paying for.

Per-tier pricing and the subscription-versus-API question live in a separate post (linked below), because prices change faster than principles. This post stands on the principle: pick by the ambiguity of the work, and let process substitute for price.

Sources and references

Same series: Claude Code: subscription or API · Claude Fable 5 as the head, subagents as the hands · Which local LLM fits which job · Interviewing Claude Fable 5 about AI orchestration
