It started once a single AI coder was working smoothly: hand it a small task and it returns code you can actually use. Seeing that, the natural next move is to give it a whole batch of work, several tasks at once, like having a small team inside your machine.
But the moment you try to run several agents the old way, opening a chat and typing instructions to one agent at a time, it falls apart fast. One agent finishes and sits idle waiting for its next instruction. Another edits the same file as a third, and they collide. You become the bottleneck, staring at ten windows and losing track of who did what.
That is where it becomes clear: the hard part is not "can AI write code." That already works. The hard part is getting several to run at once without colliding, without drifting, and in a way you can verify. Put differently, the demo is easy. The real product is the management system around the agent, and that is the part people skip.
The story runs in order: first why chatting one agent at a time does not scale, then the Kanban swarm pattern that changes how you run them, then the quality gate before code lands for real. All of it is a pattern we tried on our own work, not theory.
Part 1: Why chatting one agent at a time does not scale
With a single AI coder, opening a chat works well, because you hold all the context in your head. You know what you asked and how far it got. But add three or four and the thing that breaks is not the AI. It is how you are managing them.
Three problems show up at once. First, you have to feed every agent its next task yourself, so each one sits idle the moment it finishes, waiting for you to type. Second, when several agents edit the same code in the same working copy, the work overwrites itself: what one just wrote, another erases. Third, you become the bottleneck, because every decision routes through you.
Here is the key point: chatting is management by keeping everything in one person's head. That does not scale, because one head holds only so much. The fix is not typing faster or opening more windows. It is moving the context out of your head and into a system every agent can read together. That is where the Kanban board comes in.
Part 2: The Kanban swarm pattern
Instead of talking to agents one at a time, you have the whole team work around a single board. It is the same Kanban board human teams use, with columns for to-do, doing, and done. The only difference is that agents pick up the cards. We run this board with Hermes, an open-source project built for agents that work as a team around a shared board.
One goal, split into cards
It starts with a lead taking one big goal, say "add a login system." Its first job is not to start coding but to split the goal into smaller cards, each small enough to finish on its own. Those cards then sit in the to-do column on the board.
That board is the context you moved out of your head. Who is doing what, and how far along, is all on the board. You no longer hold it yourself.
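To make the board concrete, here is a minimal sketch in Python of a board as a plain data structure. It has three columns and a lead step that turns one goal into to-do cards. The `Board`, `Card`, and `Column` names and the card-ID scheme are hypothetical and chosen for illustration. This is not Hermes's data model. In practice the subtasks would come from the lead, not from a hard-coded list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Column(Enum):
    TODO = "to-do"
    DOING = "doing"
    DONE = "done"


@dataclass
class Card:
    card_id: str
    title: str
    goal: str
    column: Column = Column.TODO
    worker: Optional[str] = None  # who holds the card while it is in "doing"


@dataclass
class Board:
    """Shared state every agent can read: who is doing what, and how far along."""

    cards: list[Card] = field(default_factory=list)

    def split_goal(self, goal: str, subtasks: list[str]) -> list[Card]:
        """The lead's first job: turn one goal into small, independent to-do cards."""
        new_cards = [
            Card(card_id=f"card-{len(self.cards) + i}", title=title, goal=goal)
            for i, title in enumerate(subtasks, start=1)
        ]
        self.cards.extend(new_cards)
        return new_cards

    def titles_in(self, column: Column) -> list[str]:
        """List card titles in one column, e.g. everything still waiting in to-do."""
        return [c.title for c in self.cards if c.column == column]


board = Board()
board.split_goal(
    "add a login system",
    ["login form", "password hashing", "session handling"],
)
print(board.titles_in(Column.TODO))  # ['login form', 'password hashing', 'session handling']
```

Every agent reads from and writes to the same `board` object. Status lives there instead of in one person's head.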
Hand out cards, each worker in its own space
With cards in the to-do column, workers pick them up one at a time. What makes parallel work possible without collisions is that each worker runs in its own git worktree: a separate checkout of the same repository, in its own folder and on its own branch. Worker A can edit the same file as worker B, because they are editing different copies. They do not see each other and do not overwrite each other.
This is what makes "parallel" real. If every agent edited one folder, adding agents would only mean more collisions. But with separate worktrees, adding a worker just adds a copy, and the work runs in parallel for real.
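Here is a minimal sketch of per-card isolation. The helper below builds the `git worktree add` command that gives one card its own folder and its own branch. Three details are assumptions for illustration, not how Hermes names things: the `agent/<card-id>` branch naming, the sibling-folder layout, and the default base branch `main`. The command shape itself is standard git: `git -C <repo> worktree add -b <new-branch> <path> <commit-ish>`.

```python
import subprocess
from pathlib import Path


def worktree_command(repo: Path, card_id: str, base: str = "main") -> list[str]:
    """Build the git command that checks out a separate copy of the repo for one card.

    Each card gets its own branch and its own sibling folder, so two workers
    editing "the same file" are really editing two different copies.
    """
    branch = f"agent/{card_id}"                        # hypothetical naming scheme
    path = repo.parent / f"{repo.name}-wt-{card_id}"   # e.g. /src/app-wt-card-1
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo: Path, card_id: str, base: str = "main") -> Path:
    """Run the command for real; raises CalledProcessError if git refuses."""
    cmd = worktree_command(repo, card_id, base)
    subprocess.run(cmd, check=True)
    return Path(cmd[-2])  # the new worktree folder


# Print the commands for two cards without touching any repository.
for card in ["card-1", "card-2"]:
    print(" ".join(worktree_command(Path("/src/app"), card)))
```

When a card is done, `git worktree remove <path>` cleans up its folder. Git also refuses to create a branch with `-b` if that name already exists, which is one more guard against two workers ending up on the same branch.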
The one rule that keeps a swarm from turning into chaos is that each worker gets its own workspace, isolated from the others. Isolation is not the worker's job. It is the queue's job: the queue assigns the workspace when it hands out the card.
What we actually ran end to end: from a single card, the system spun up a worktree for a worker, real code came out in tens of seconds, and the card moved itself to done. No one had to feed the worker its next step.
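The loop we ran can be sketched as a small queue that owns isolation. It hands each card to exactly one worker, assigns the workspace at claim time, and moves the card to done when the worker reports back. `CardQueue`, `run_worker`, and the `/tmp/worktrees` layout are hypothetical names for illustration, not Hermes's API. The in-process lock stands in for whatever atomic claim a real queue would use.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Card:
    card_id: str
    title: str
    column: str = "to-do"
    worker: Optional[str] = None
    workspace: Optional[str] = None


class CardQueue:
    """Hands out cards and decides each card's workspace; workers never pick their own."""

    def __init__(self, cards: list[Card], workspace_root: str = "/tmp/worktrees"):
        self._lock = threading.Lock()  # makes claim/complete atomic across worker threads
        self._cards = list(cards)
        self._root = workspace_root

    def claim(self, worker: str) -> Optional[Card]:
        """Move the next to-do card to doing and give it a private workspace."""
        with self._lock:
            for card in self._cards:
                if card.column == "to-do":
                    card.column = "doing"
                    card.worker = worker
                    card.workspace = f"{self._root}/{card.card_id}"  # one folder per card
                    return card
            return None  # nothing left in to-do

    def complete(self, card_id: str, worker: str) -> None:
        """Move a card to done; only the worker holding it may do so."""
        with self._lock:
            card = next((c for c in self._cards if c.card_id == card_id), None)
            if card is None:
                raise KeyError(card_id)
            if card.column != "doing" or card.worker != worker:
                raise ValueError(f"{worker} does not hold {card_id}")
            card.column = "done"

    def snapshot(self) -> list[tuple[str, str, Optional[str]]]:
        """Read-only view of the board: (card, column, worker)."""
        with self._lock:
            return [(c.card_id, c.column, c.worker) for c in self._cards]


def run_worker(queue: CardQueue, worker: str, do_work: Callable[[Card], None]) -> int:
    """Pull cards until to-do is empty; nobody has to feed the worker its next task."""
    finished = 0
    while (card := queue.claim(worker)) is not None:
        do_work(card)  # in a real setup: run the agent inside card.workspace
        queue.complete(card.card_id, worker)
        finished += 1
    return finished


queue = CardQueue([Card(f"card-{i}", f"task {i}") for i in range(1, 7)])
threads = [
    threading.Thread(target=run_worker, args=(queue, f"worker-{n}", lambda card: None))
    for n in range(1, 4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(queue.snapshot())  # every card ends in "done"; which worker got which varies
```

The worker loop never asks a human for its next task. It stops when `claim` returns `None`.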
The engine swaps; the agent stays the same
One boundary makes the whole system flexible: an agent's identity is separate from the engine driving it. Right now the workers run on gpt-5.5 (via Codex), while we sit and drive from Claude Code. What ties the two sides together is a single skill set: Hermes workers and our Claude Code sessions use the same skills, loaded from one source. When a new model shows up that is cheaper or stronger, you swap it into the same role. Hermes gives you the board and the engine swap as a base. The glue we add on top is the part we are still building: the queueing, the worktree isolation, and the wiring to our one shared skill set.
The point is to design for a fixed interface with a swappable engine inside, so you are not locked to one vendor. A better model just slots into the existing role.
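One way to sketch "fixed interface, swappable engine" in Python: the role (its name plus skills) is the agent's identity, and the engine is just a field behind a small `Engine` protocol. `Engine`, `Role`, `FakeEngine`, and `SHARED_SKILLS` are hypothetical stand-ins. A real engine would wrap a model provider's client, which is not shown here.

```python
from dataclasses import dataclass
from typing import Protocol


class Engine(Protocol):
    """Anything that can turn a prompt into a reply: the swappable part."""

    name: str

    def complete(self, prompt: str) -> str: ...


# Hypothetical: one skill list, loaded from a single source by every role.
SHARED_SKILLS = ("write-code", "run-tests")


@dataclass
class Role:
    """The agent's identity: its role name and skills. The engine is just a field."""

    name: str
    skills: tuple[str, ...]
    engine: Engine

    def run(self, task: str) -> str:
        prompt = f"role={self.name} skills={','.join(self.skills)}\n{task}"
        return self.engine.complete(prompt)

    def swap_engine(self, engine: Engine) -> None:
        """Slot a cheaper or stronger model into the same role; identity is untouched."""
        self.engine = engine


@dataclass
class FakeEngine:
    """Stand-in for a real model client, so the sketch runs offline."""

    name: str

    def complete(self, prompt: str) -> str:
        task = prompt.splitlines()[-1]  # the task is the last line of the prompt
        return f"{self.name}: {task}"


worker = Role("worker", SHARED_SKILLS, FakeEngine("engine-a"))
print(worker.run("add a login form"))  # engine-a: add a login form
worker.swap_engine(FakeEngine("engine-b"))
print(worker.run("add a login form"))  # engine-b: add a login form
```

Swapping the engine changes only one field. The role's name and skills stay the same, which is what keeps you from being locked to one vendor.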
Part 3: The quality gate before a human merges
Letting agents write code in parallel quickly is good, but fast alone is not enough. If what comes out cannot be trusted, you have only sped up producing bugs. So the quality gate matters as much as the swarm itself.
The reviewer should be a different engine from the writer
The rule we hold is that the reviewer should not be the same engine as the writer. A model that writes a bug through one line of reasoning tends to miss that same bug when it reviews with the same reasoning. A different engine sees what the first one cannot. What we actually run today: a worker finishes and opens a PR (a request to merge the code). A review by a second engine then sits in front of the merge before the PR reaches a human. Making the writer and reviewer different engines on every single task is something we are still working toward.
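The rule can be expressed as a check on the PR before a human sees it. Pick a review engine other than the writer's, and treat the PR as ready only once a different engine has approved it. `PullRequest`, `Review`, `pick_reviewer`, and `ready_for_human` are hypothetical names. This sketches the rule, not our actual pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    engine: str
    approved: bool


@dataclass
class PullRequest:
    card_id: str
    writer_engine: str
    reviews: list[Review] = field(default_factory=list)


def pick_reviewer(writer_engine: str, available: list[str]) -> str:
    """Choose the first available engine that did not write the code."""
    for engine in available:
        if engine != writer_engine:
            return engine
    raise RuntimeError(f"no engine other than {writer_engine!r} is available to review")


def ready_for_human(pr: PullRequest) -> bool:
    """A self-review does not count: require an approval from a different engine."""
    return any(r.approved and r.engine != pr.writer_engine for r in pr.reviews)


pr = PullRequest("card-2", writer_engine="engine-a")
pr.reviews.append(Review("engine-a", approved=True))  # same engine: ignored
print(ready_for_human(pr))                             # False
reviewer = pick_reviewer(pr.writer_engine, ["engine-a", "engine-b"])
pr.reviews.append(Review(reviewer, approved=True))
print(reviewer, ready_for_human(pr))                   # engine-b True
```

`pick_reviewer` raises an error instead of falling back to the writer's engine. That makes it visible when a second engine is missing, so a self-review never slips through silently.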
A human always presses merge
The last gate before code lands for real is a person. The system can split, write, and review on its own, but pressing merge into the main code stays with the human. That is the step that is hard to undo: once the code is in, others pull from it immediately. This is the same boundary covered in the post on when to let AI act on its own and when a human decides. The system runs reversible work, a human gates hard-to-undo work, and merge sits on the human side.
A good system does not mean the human drops out of the loop. It means the human stands at the single most important point: right before something lands in a way you cannot take back.
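The boundary between reversible and hard-to-undo steps can be written down as a tiny permission table. The action names and the `HUMAN_ONLY` set below are hypothetical. This is a minimal sketch of the rule "the system runs reversible work; merge waits for a human", not a real authorization system.

```python
from enum import Enum


class Action(Enum):
    SPLIT_GOAL = "split goal into cards"
    WRITE_CODE = "write code in a worktree"
    OPEN_PR = "open a PR"
    REVIEW_PR = "review a PR"
    MERGE_TO_MAIN = "merge into main"


# Everything before the merge happens in a worktree or a PR and can be thrown away.
# Merging into main is hard to undo, because others pull from it immediately.
HUMAN_ONLY = {Action.MERGE_TO_MAIN}


def allowed(action: Action, actor: str) -> bool:
    """Agents may do reversible work; only a human may perform a human-only action."""
    if action in HUMAN_ONLY:
        return actor == "human"
    return True


for action in Action:
    print(f"{action.value:<26} agent={allowed(action, 'agent')} human={allowed(action, 'human')}")
```

Keeping the hard-to-undo set explicit and small makes the human's role easy to audit: it shows exactly where the human stands in the loop.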
Part 4: Putting it to work
Three principles you can take straight out
- Move context out of your head and onto a board. If you have to remember who did what, you have not scaled yet. Put work status where every agent can read it together.
- Give every worker its own workspace. Parallel works because each has its own copy, not because they agree not to collide. Isolation is the queue's job, not the worker's discipline.
- Reviewer on a different engine, human presses merge. Speed comes from the agents. Trust comes from a review that sees a different angle, and a human standing at the hard-to-undo step.
Where to start
You do not need to build the full system first. Start from what you already have.
- Take work you used to hand an AI one task at a time, and split it into smaller cards by hand. See which ones can truly run in parallel.
- Before running two agents at once, put each in its own git worktree. Check that they really stop overwriting each other.
- When work finishes, do not merge it into the main code yourself yet. Have another AI (a different engine if you can) read it over first, then you press merge.
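For the first step, one way to see which cards can truly run in parallel is to write down which cards depend on which, then group them into waves. Every card in a wave has all its dependencies finished in earlier waves. This sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The card names and dependencies are made up for illustration.

```python
from graphlib import TopologicalSorter


def parallel_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group cards into waves; cards in the same wave can run at the same time."""
    sorter = TopologicalSorter(depends_on)  # maps card -> cards it must wait for
    sorter.prepare()                        # raises CycleError if cards wait on each other
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every card whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical split of "add a login system" into cards, with dependencies.
cards = {
    "login form": set(),
    "password hashing": set(),
    "session handling": {"password hashing"},
    "login endpoint": {"login form", "session handling"},
}
for n, wave in enumerate(parallel_waves(cards), start=1):
    print(n, wave)
```

Here the first wave holds two independent cards that two workers can take at once. The later cards have to wait. If most of your cards land in a single wave, the work really does parallelize.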
One pass is enough to see it for yourself: the thing that changes the game is not "smarter AI." It is the board and the isolation that let several run together without you holding everything in your head.
We are saving the machinery that makes all of this run on its own for the next posts in this series: the queue that hands out cards and enforces time limits, swapping engines by role, and keeping worktrees from mixing.
- We learned the fleet architecture and the entire Hermes setup in this article from a workshop by Khun Nat, creator of the Oracle system. Find his community here: Khun Nat's Oracle group
- Hermes is an open-source project, nousresearch/hermes-agent
- This post: the Kanban swarm pattern (you are here)
- Next: an agent's identity is not its engine, swapping models without losing the agent
- Next: isolation is the orchestrator's job, the story from when worktrees mixed
- Read more on the human-AI boundary: when AI acts on its own and when a human decides
- Choosing the right model for the job: why the best model is often the wrong one