productize.life

Claude Fable 5 as the head, everything else as hands

We made the most expensive model an orchestrator and handed it a real migration assessment within the hour. Here is what it delivered, where it broke, and when the head should do work itself.

Yim · written with Dobby (AI Oracle) · Jul 3, 2026

One morning we switched our main Claude Code model to Claude Fable 5, the new family Anthropic positions above Opus. It is genuinely smarter, and it comes with a higher price per token and a quota that burns much faster. Using it the way we used Opus, letting it hunt for files, write boilerplate, and fix typos, would be hiring the most expensive brain in the house to walk paperwork between desks.

So we set a new rule before real use: Fable is the "head" only. It plans, decomposes, and synthesizes. All hands-on work belongs to cheaper "hands". Less than an hour later the rule got its first real test: assessing whether a tool our automation depends on every day should move to its new Rust port (the same tool rewritten in Rust).

This post covers how the team is set up, what the first task produced, and exactly where things broke. Every number comes from a real working session on July 3, 2026. None of them are invented.

Part 1: Why the most expensive model should only be the "head"

Picture one task you would hand to an AI coding agent, say "assess whether we should move to the new version of this tool". Inside it are several levels of work mixed together. Some of it needs real thinking, like weighing risk and making the call. Some of it just needs care, like checking which machine runs which commands. And some of it is pure labor, like reading a repo end to end and summarizing what is in there.

Run all of that on one model and you pay big-model prices for every level. You also lose something less visible: the lead model's context fills up with detail. Fifty files in, the brain you wanted for decisions is packed with the contents of files it skimmed on the way, with less and less room left to think.

The shape that fits better is two roles. The head thinks, the hands do. The head is the orchestrator: it takes the problem, plans, splits the work, routes each piece to the right hands, then synthesizes the results into an answer. The hands are subagents that do the work in their own separate context and send back only conclusions. The detail in between never flows back to bury the head.

Cost points the same way. A Fable-class model burns quota fast enough to force the question of which work deserves that brain (we broke down per-model pricing in our Claude Code pricing post). Once you are forced to choose, it turns out most of the work never needed the big model at all.

Part 2: Claude Code subagents: setting up head and hands

Claude Code ships with a subagents mechanism (official docs). You create short agent files under .claude/agents/ naming each agent, pinning its model, and describing the work it takes. Our team is currently three agents plus one peer from another vendor.

| Role | Model | Work it gets | Why |
| --- | --- | --- | --- |
| deep-reasoner | Claude Opus | Heavy thinking: architecture design, multi-file debugging, root-cause hunts | Deep reasoning without carrying the whole task |
| fast-worker | Claude Sonnet | Mechanical work: boilerplate, writing tests, edits that follow a settled pattern | Fast enough, far cheaper, and this work needs nothing more |
| fast-searcher | Claude Haiku | Search and fact-gathering: find files, find config, walk inventories | Cheapest, and fans out many in parallel |
| Codex (peer) | gpt-5.5 (OpenAI) | Long grinding coding work, and second opinions | Different vendor = not stuck in the same bias set as the Claude team |
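
For readers who want to see the shape of those agent files, here is a minimal sketch that generates the three Claude-side agents. It assumes the Markdown-plus-YAML-frontmatter layout with `name`, `description`, and `model` fields as described in the official subagents docs, so check the field names against the current docs before relying on them. The roles and model tiers come from the table. The prompt body and the helper names (`render_agent`, `write_agents`) are illustrative placeholders, not our actual files.

```python
import json
import tempfile
from pathlib import Path

# Role -> (model alias, description). Roles and tiers follow the table above;
# the exact description wording is an illustrative placeholder.
AGENTS = {
    "deep-reasoner": ("opus", "Heavy thinking: architecture design, multi-file debugging, root-cause hunts."),
    "fast-worker": ("sonnet", "Mechanical work: boilerplate, writing tests, edits that follow a settled pattern."),
    "fast-searcher": ("haiku", "Search and fact-gathering: find files, find config, walk inventories."),
}


def render_agent(name, model, description):
    """Render one agent file: YAML frontmatter followed by a short prompt body."""
    return (
        "---\n"
        f"name: {name}\n"
        # json.dumps yields a double-quoted string, which keeps the colon in the
        # description from being read as a YAML mapping.
        f"description: {json.dumps(description)}\n"
        f"model: {model}\n"
        "---\n\n"
        # Placeholder prompt body, not the authors' real instructions.
        f"You are {name}. Work in your own context and report back only conclusions.\n"
    )


def write_agents(project_root):
    """Write every agent file under <project_root>/.claude/agents/ and return the paths."""
    agents_dir = Path(project_root) / ".claude" / "agents"
    agents_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, (model, description) in AGENTS.items():
        path = agents_dir / f"{name}.md"
        path.write_text(render_agent(name, model, description), encoding="utf-8")
        paths.append(path)
    return paths


# Demo in a throwaway directory so nothing in a real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    for path in write_agents(tmp):
        print(path.name)
```

The cross-vendor peer is left out on purpose: it is not a Claude Code subagent, so it does not get a file here.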

The Claude Code community calls this shape the claude orchestrator, or the orchestrator pattern. What decides whether it works is not the number of agents but the rules written for the head. Ours are three lines.

  1. The head never does grunt work. Any search, read-through, or mechanical job gets delegated immediately, even when the head "could just do it". Every token the head burns on this work is quota taken away from thinking.
  2. Show the plan before acting. The head must lay out what goes to whom before dispatching, so a human can see it and object before money flows out.
  3. Never pin the head inside a daemon. Always-on automation runs fine on small or mid models. The expensive model is called per occasion, only when real thinking is needed.
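
Rules 1 and 2 can be sketched as plain data plus a gate. This is a hypothetical illustration, not the routing rules from our toolkit: the work kinds (`search`, `mechanical`, `reasoning`), the `Step` type, and the `send` callback are all assumptions made for the example. Rule 3 is a deployment decision and is not modeled here.

```python
from dataclasses import dataclass

# Work kind -> agent name. Agent names follow the team table; the kinds are assumed labels.
ROUTES = {
    "search": "fast-searcher",
    "mechanical": "fast-worker",
    "reasoning": "deep-reasoner",
}


@dataclass(frozen=True)
class Step:
    description: str
    kind: str  # must be one of the ROUTES keys


def build_plan(steps):
    """Rule 1: every hands-on step is routed to a hand, never kept by the head."""
    plan = []
    for step in steps:
        if step.kind not in ROUTES:
            raise ValueError(f"no route for kind {step.kind!r}")
        plan.append((ROUTES[step.kind], step.description))
    return plan


def format_plan(plan):
    """Rule 2: a readable plan a human can object to before anything is dispatched."""
    return "\n".join(f"{i}. {agent}: {desc}" for i, (agent, desc) in enumerate(plan, 1))


def dispatch(plan, approved, send):
    """Dispatch only after approval; `send` stands in for the real agent call."""
    if not approved:
        return []
    return [send(agent, desc) for agent, desc in plan]


plan = build_plan([
    Step("find every place the automation calls the CLI", "search"),
    Step("weigh the migration risk from the reports", "reasoning"),
])
print(format_plan(plan))
```

Keeping the plan as data is what makes the approval step cheap: nothing is spent until `dispatch` is called with `approved=True`.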

The actual agent files and the full routing rules are being polished into a toolkit you can pick up and use. The skeleton described here is enough to assemble your own.

One small lesson with a price tag before trusting the cross-vendor peer: we tested whether Codex was reachable by sending the word ping and asking for pong back. That single-word answer cost 26,800 tokens, because an agent of this class wakes up with its entire context, not just your question. Even "checking that the tool works" has a price, and it belongs in your cost math.
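
To put that number into cost math, a small sketch helps. The 26,800 figure is the one measured above. The schedule is a hypothetical example, and the sketch assumes every check costs about the same as the one we measured, which may not hold in practice.

```python
PING_TOKENS = 26_800  # tokens for one ping/pong reachability check, as measured in this session


def reachability_check_tokens(checks_per_day, days):
    """Total tokens spent on checks, assuming each costs the same as the measured one."""
    return PING_TOKENS * checks_per_day * days


# Hypothetical schedule: one check per hour for a week.
print(reachability_check_tokens(24, 7))  # 4502400
```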

Part 3: The first real task: a migration assessment with 4 agents

The job that came in: a CLI tool (a program driven from the command line) that our automation uses on two machines has a new Rust port. Worth moving? Questions like this are easy to answer badly, because the smart-sounding answer ("Rust is faster, migrate") and the correct answer live in different places.

Split into three views, fired in parallel

The head cut the survey into three pieces with no dependencies between them, then fanned them out to three hands running at once.
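
The fan-out itself is an ordinary concurrency pattern. Below is a minimal sketch using Python's `asyncio.gather`, where `run_hand` is a stand-in for dispatching a subagent and the job labels are placeholders. It is not how Claude Code schedules subagents internally. It only shows why independence matters: with no dependencies between the pieces, all three can run at once and the head receives only the conclusions.

```python
import asyncio


async def run_hand(agent, task):
    """Stand-in for a subagent: does its work in isolation and returns only a conclusion."""
    await asyncio.sleep(0.01)  # simulate the hand working in its own context
    return f"{agent}: conclusions for {task!r}"


async def fan_out(jobs):
    """Run independent jobs concurrently; results come back in the same order as jobs."""
    return await asyncio.gather(*(run_hand(agent, task) for agent, task in jobs))


# Placeholder labels: three survey pieces with no dependencies between them.
jobs = [
    ("hand-1", "survey piece 1"),
    ("hand-2", "survey piece 2"),
    ("hand-3", "survey piece 3"),
]
reports = asyncio.run(fan_out(jobs))
for report in reports:
    print(report)
```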

The results were more interesting than expected. The part of the tool we actually use is tiny: 5 integration points and roughly 9 commands, out of a much larger feature set. The port's repo brought both good news and worries. The good news: config works with the same files, and the protocol was proven compatible against a real test fixture, not just documentation. The worries: the port was 8 days old with 470 commits, most of them machine-generated code, and one flag our system leans on every day is gone from the port.

The head synthesizes, and refuses to trust reports alone

At this point three reports agreed on "probably migratable, with conditions". But all of it came from reading. Nobody had touched the real binary yet. So the head dispatched a fourth job: a smoke test against the real thing, built so that failure costs zero. Install the new port side by side with the old one under a different name, point it at the same config, and run read-only commands exclusively. The live system is never touched.
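
A zero-cost smoke test of this kind can be sketched in a few lines. Everything named here is hypothetical: the binary name, the `--config` flag, and the three read-only commands are placeholders, since the real tool and its commands are not named in this post. The `runner` parameter exists so the harness can be exercised without a real binary.

```python
import subprocess

# Hypothetical read-only commands; substitute the ones your automation actually uses.
READ_ONLY_COMMANDS = (("status",), ("list",), ("version",))


def run_probe(binary, config, args, runner=subprocess.run):
    """Run one read-only command against the side-by-side binary; return (args, passed)."""
    try:
        # "--config" is an assumed flag name for pointing the port at the existing config.
        result = runner([binary, "--config", config, *args],
                        capture_output=True, text=True, timeout=30)
        return args, result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or a hang counts as a failed probe, not a crash of the harness.
        return args, False


def smoke_test(binary, config, commands=READ_ONLY_COMMANDS, runner=subprocess.run):
    """Return (passed, total, per-probe results) for the given read-only commands."""
    results = [run_probe(binary, config, args, runner) for args in commands]
    passed = sum(1 for _, ok in results if ok)
    return passed, len(results), results


class FakeResult:
    def __init__(self, returncode):
        self.returncode = returncode


def fake_runner(cmd, **kwargs):
    """Demo runner with made-up behavior: only the 'version' probe succeeds."""
    return FakeResult(0 if cmd[-1] == "version" else 1)


passed, total, _ = smoke_test("tool-rs", "config.toml", runner=fake_runner)
print(f"{passed}/{total} probes passed")  # 1/3 probes passed
```

The safety comes from the setup, not the script: the port is installed under a different name, and the command list contains nothing that writes.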

The smoke test is where the whole exercise paid off, because it caught what all three read-based reports could not see.

So the verdict was neither "migrate" nor "don't". It was "right direction, wrong time": parked, with explicit conditions for when to come back and retest. The binary and the test procedure are kept ready. All of this ended with zero damage, and the head never read a single repo file itself.

Part 4: What broke, and when the head must act itself

A green test that did not mean pass

Before the smoke test we already had a test suite for the health-check system. Against the new port it ran 5/5, all green. Stopping there would have meant concluding "compatible". Then the smoke test against the real binary broke 2 of 3 probes. Why the contradiction? The suite mocks the layer that calls the binary, so it was testing its own logic without ever touching the real thing. Green across the board, proving nothing. We call this false-green.

The defense that actually works is a positive control: before trusting any checker's green, find a case you know must fail and confirm the checker turns red on it. If the case that should break still comes back green, what you are reading is not a test result. It is an illusion.
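
Here is the idea in miniature. This is a toy sketch with hypothetical names, not our health-check suite: `checker_with_mock` mimics a suite that mocks the layer calling the binary, `checker_real` actually asks the binary (a plain callable stands in for it), and `positive_control` feeds each one a case known to be broken.

```python
def checker_with_mock(binary):
    """False-green shape: ignores `binary` and checks its own logic against a canned reply."""
    canned_reply = "ok"
    return canned_reply == "ok"


def checker_real(binary):
    """Asks the real thing; here a callable stands in for invoking the binary."""
    return binary() == "ok"


def positive_control(checker, known_bad):
    """A checker earns trust only if it turns red on a case that must fail."""
    return checker(known_bad) is False


def broken_binary():
    # A known-bad case, for example a port that rejects a flag the system relies on.
    return "error: unknown flag"


print(positive_control(checker_with_mock, broken_binary))  # False: still green on a broken input
print(positive_control(checker_real, broken_binary))       # True: turns red as it should
```

A checker that fails the positive control is not necessarily useless, but its green results say nothing about the binary.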

An orchestrator is not someone banned from touching anything

The "head never does grunt work" rule has a flip side worth watching. While synthesizing the reports, the head found 3 gaps where the reports disagreed. The options: dispatch another round to the hands (wait again, pay again), or run a 30-second grep itself. It chose the grep, and that was the best-value decision of the day. The head's job is knowing what to delegate, and what is cheaper to do itself in half a minute. The line is not "never touch". It is "never trade thinking time for work the hands can do".

The trap we got caught in twice in one day

The last lesson is not technical. It is about the orchestrator's own behavior. Once "delegate" is in your hand, everything starts to look delegatable. That day we got caught twice. First, a bug fixable in two lines that we routed into someone else's queue instead. Second, a bar we invented ourselves, "this post needs 2-3 real worked examples first", when the evidence from one session was already enough. Self-set bars become excuses not to act, dressed up as prudence. If you are about to set up your own orchestrator, expect this trap to ship with the package.

A quiet agent is not a dead agent

A small note that saves real money: the agent reading the repo went quiet long enough that the instinct said kill it and restart. The truth was the work ran deep. A nudge asking for progress, instead of a kill, showed the job was moving, and the result that came back was deeper than the head could have produced itself. Killing a working agent means paying twice to get the answer later.

Part 5: Using this on your own work

Where to start

  1. Create your first three agents. A few lines each under .claude/agents/: a deep thinker, a fast worker, a searcher. Pin models per the table in Part 2.
  2. Write rules for the head. At minimum one line: on any search or mechanical work, delegate. Never do it yourself.
  3. Make the first task read-only. An assessment, a survey, an audit. Failure costs zero, which makes it the perfect practice field.
  4. Always require a smoke test against the real thing. Read-based reports are not the answer yet, and never trust a green test until you have seen it turn red.
  5. Log every time the head sneaks work in itself. The first week you will find it doing grunt work more often than you expect. Write it down, adjust the rules.
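
Step 5 needs nothing fancy. A minimal sketch, assuming a JSON-lines file and two hypothetical helpers (`log_head_work`, `summarize`), with made-up sample entries:

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def log_head_work(log_path, action, reason):
    """Append one JSON line each time the head does hands-on work itself."""
    entry = {"action": action, "reason": reason}
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def summarize(log_path):
    """Count logged actions so recurring grunt work stands out at review time."""
    with open(log_path, encoding="utf-8") as f:
        return Counter(json.loads(line)["action"] for line in f if line.strip())


# Demo with made-up entries in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "head-work.jsonl"
    log_head_work(log, "grep", "30-second check, cheaper than another dispatch")
    log_head_work(log, "read-file", "skimmed a config instead of delegating")
    log_head_work(log, "grep", "resolving a disagreement between reports")
    print(summarize(log).most_common())  # [('grep', 2), ('read-file', 1)]
```

Recording the reason matters as much as the count: some entries will turn out to be justified shortcuts, and others are the signal to tighten the rules.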

When to skip all this

Small tasks, single-file tasks, tasks where you already half-know the answer: one mid-tier model working directly is cheaper and faster. Orchestration has a fixed overhead of its own (planning, dispatching, waiting), and it only pays off when the task is big enough for the tiering to earn it back. A single pong at 26,800 tokens taught us that much.

If one principle sticks, let it be this: pay premium for thinking, pay budget for doing, and never let delegation become the excuse for not acting. Next week this team meets its second task. If new things break, we will come back and tell you.
