One day we wanted to cut the cost of the AI team that writes code for us. The model we were using was good but expensive, and we wanted to try swapping in a cheaper one just for the coding work. Normally that means tearing down half the system and rebuilding it. This time it was one config line. The agent was the same agent: same identity, same work queue, we only swapped the engine inside.
That wasn't luck. It worked because from the start we kept three things separate: the agent's identity, the engine that runs it, and the thing that routes the work. Most people see these as one lump. And once they're one lump, moving any one of them drags the others down with it.
Part 1The trap of marrying an agent to one model
When people say "I use this agent," they usually mean one model, plus the way it works, plus the jobs it takes, all fused into a single thing.
The problem shows up the moment you want to change something. Want to try a newer, cheaper model? Now you're re-tuning all of its behavior. Want it to handle several jobs at once? You're back poking at the model itself. This is the quiet form of vendor lock-in: you've tied yourself to one model vendor without noticing. The day they raise prices or a better one appears, you can't move.
Part 2Three layers to keep separate
The way out is to see that one agent is really three layers, each doing a different job.
The first is the soul: who this agent is, what principles it holds, how it talks, what it remembers. The soul is permanent. It isn't tied to any one model.
The second is the engine: the model that actually does the thinking. It can be Claude, Codex, GLM, or a local model. This layer is swappable; you change it to fit the job and the budget.
The third is orchestration: the thing that takes work in, breaks it into smaller tasks, and routes each one to the right engine. This layer is the glue that ties the soul to the engine without fusing the two together.
| Layer | What it is | How often it changes |
|---|---|---|
| Soul | Who the agent is: principles, character, memory | Almost never |
| Engine | The model that runs the thinking (Claude / Codex / GLM / local) | To fit the job and budget |
| Orchestration | Takes work, splits it, routes it to the right engine | As the workflow changes |
Part 3What separating buys you
Once the three layers are separate, the engine becomes something you pick per job. Work that needs strong reasoning in a particular language uses the model that's good at it; high-volume, repetitive coding gets swapped to a cheaper one. You don't pay premium prices for jobs that don't need them.
More important than the money: we're not married to any model vendor. The day a better or cheaper model ships, you just swap the engine; the soul and the work queue stay intact. That's what lets a system outlive a model generation without being rebuilt every time the field moves.
Part 4The exception that proves the rule
But not every layer swaps freely. Some jobs are pinned to a specific engine.
Here's one we hit ourselves. Putting an agent into a chat room as a live presence has to run on the one engine that supports that mode; you can't swap it for another. The heavy background reasoning, on the other hand, can swap engines freely. The lesson is to know which layer can swap and which is pinned to an engine, then order them right: bring up what's pinned first, let what's swappable follow.
This exception doesn't break the principle; it confirms it. Separating the layers is exactly what lets you see clearly what's pinned to what, which beats lumping everything together and then wondering why changing one thing broke another.
Part 5The structure is open, the assembly is ours
The three-layer idea is open; anyone can adopt it. What's our design is the rules that say which soul runs on which engine, which kind of work goes to whom, and how to swap an engine without losing the soul. The interface is fixed, the engine inside can change. It isn't holding anything back. It's saying where the boundary is.
Part 6Start by separating the three layers
If you're about to build an agent that runs on its own, don't start with "which model should I use." Start by separating the three layers: what is its soul, which engines can run it, and what routes the work. Once the boundaries are clear, changing models is just swapping an engine, not tearing down the house.
Written from real work, not theory, the actual fleet architecture we run with swappable engines.
- Now reading: separating an AI agent's soul, engine, and orchestrator
- Running many coders at once one AI writing code is easy, ten at once is the real job
- Let a different engine review the work using Codex to review code Claude wrote
- Where the soul lives AGENTS.md and SOUL.md, sharing knowledge across a fleet