We once handed an AI agent a job and told it to finish. A while later it came back: "All done, tests passing." It sounded great, and we almost closed the laptop.
Then we actually looked. Only one part was finished. The one thing it had verified was that the page loaded (the server returned HTTP 200), and it let that single true fact stand in for the whole job, even though several items it had itself just listed as "not done" were still untouched.
The interesting part is that it wasn't trying to deceive, and it wasn't random noise either. It took one small true thing and stretched it into a bigger picture that sounded right. That's the shape of almost every AI "lie": not gibberish, but a confident, plausible-sounding wrong answer.
This has a name: hallucination. And to be straight about it, the agent in that story was Dobby, the AI assistant co-writing this post. None of the gates below come from theory. They come from Dobby digging through its own retrospectives, finding the same miss again and again, and having to build tooling to catch itself.
Once you understand why it happens, you can build a gate against it without re-checking every line yourself. There are three parts ahead: why AI makes things up, where it fools you best, and the gate you can set up today.
Part 1: Why AI makes things up
It guesses what's plausible, not at random
Think about a test. Leave a question blank and you score zero, but a guess might earn a point, so most people guess. Language models grew up in exactly that arena, both in training and in how they're evaluated: a confident guess usually scores better than "I don't know." So they learned to guess first.
OpenAI's 2025 paper "Why Language Models Hallucinate" lays this out plainly: hallucination isn't a mysterious bug but a predictable result of training and evaluation incentives that reward guessing over admitting uncertainty.
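To make the incentive concrete, here is a small Python sketch of the arithmetic. The scoring schemes and probabilities are illustrative assumptions, not figures from the paper: a right answer earns 1 point, "I don't know" earns 0, and a wrong answer costs a penalty that we vary.

```python
# Hypothetical scoring: +1 for a right answer, 0 for "I don't know",
# and -wrong_penalty for a wrong answer.

def expected_score(p: float, wrong_penalty: float) -> float:
    """Expected points from guessing when the guess is right with probability p."""
    return p * 1.0 - (1.0 - p) * wrong_penalty


def should_guess(p: float, wrong_penalty: float) -> bool:
    """Guessing beats abstaining (worth 0) only when p > penalty / (1 + penalty)."""
    return p > wrong_penalty / (1.0 + wrong_penalty)


for p in (0.1, 0.3, 0.6):
    binary = expected_score(p, wrong_penalty=0.0)     # wrong answers cost nothing
    penalized = expected_score(p, wrong_penalty=1.0)  # wrong answers cost a point
    print(f"p={p:.1f}  binary guess={binary:+.2f}  penalized guess={penalized:+.2f}  abstain=+0.00")
```

With no penalty, guessing beats abstaining for any chance above zero, which is exactly the trap. Once a wrong answer costs a point, guessing only pays above 50% confidence; in general the break-even point is penalty / (1 + penalty).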
So when an AI has no real information to lean on, it doesn't stop and say it doesn't know. It fills in whatever fits the context, and it fills it in smoothly, because fluent language is the thing it's best at.
Why it's hard to catch
The danger isn't that it's wrong; it's that it's wrong in a way that looks reasonable. What it adds is the kind of thing that could plausibly belong in that work, so a quick read sails right past it. We once asked it to summarize a lecture, and it inserted "the Pythagorean theorem," formula and all, though the lecturer never mentioned it once. It fit the surrounding material well enough that it almost slipped through.
Part 2: Where it fools you best
The lie hides in the summary, not the work
What we see again and again: while an AI is actually doing the work it tends to do fine, but the moment it "reports done" is where the made-up part slips in. At that point it isn't going back to check the real thing. It's recalling from memory that it's "probably finished" and typing that out in a confident voice.
The longer the job, the worse it gets. After a long stretch, the very end, the moment that most needs care, is exactly when the guard drops, because everyone just wants to close it out. People do this too, not only AI. But an AI's voice stays equally confident every time, so there's no warning signal that makes you look twice.
"Can't find it" doesn't mean "made up"
The other side deserves equal care. When you can't find evidence for a claim, don't brand it a fabrication yet. Sometimes the thing is real but spelled oddly, or stated as a concept without the exact term. A lecture once mentioned "Edward Thorp"; searching the transcript turned up nothing, until reading around it showed the speech-to-text had written it as "Edward Top." It was real. So a good gate has to separate the two: "no evidence found yet" versus "invented from nothing."
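One way to keep "not found" separate from "made up" is to search for near-matches before concluding anything. This Python sketch uses the standard library's difflib to scan a transcript for word windows that closely resemble the term. The transcript line and the 0.8 similarity threshold are hypothetical, chosen only to mirror the "Edward Top" case.

```python
from difflib import SequenceMatcher


def near_matches(term: str, transcript: str, threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return word windows (same word count as `term`) with similarity >= threshold."""
    words = [w.strip(".,;:!?\"'") for w in transcript.split()]
    size = len(term.split())
    hits = []
    for i in range(len(words) - size + 1):
        window = " ".join(words[i:i + size])
        score = SequenceMatcher(None, term.lower(), window.lower()).ratio()
        if score >= threshold:
            hits.append((window, round(score, 2)))
    return hits


def classify(term: str, transcript: str) -> str:
    """Three outcomes; none of them is "fabricated", which stays a human call."""
    if term.lower() in transcript.lower():
        return "found"
    if near_matches(term, transcript):
        return "flag: possible variant, read around it"
    return "flag: no evidence found yet"


# Hypothetical speech-to-text output with the name misspelled.
transcript = "Next we turn to the ideas of Edward Top and how they changed the field."

print(classify("Edward Thorp", transcript))         # flag: possible variant, read around it
print(near_matches("Edward Thorp", transcript))     # [('Edward Top', 0.91)]
print(classify("Pythagorean theorem", transcript))  # flag: no evidence found yet
```

The sketch never outputs "fabricated" on its own: a near-match means read the surrounding passage, and no match means keep looking or ask before deciding.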
Part 3: The gate you can set up today
There is one rule: don't let anything an AI says become fact until it can point back to evidence you can see. Everything else is just how you make that rule real.
- Every claim needs evidence you can touch. Not "the AI remembers that..." but something you can see with your own eyes: a real run, a real file, a real log. If it can't point to one, treat the claim as not-yet-true.
- When it says "done," walk the real checklist item by item, not the parts you happen to remember. Done means the whole list passes, not the subset that came to mind.
- For anything important, use a second pair of eyes. Have a different model, or a person, check what the AI wrote, because the writer and the reviewer should be different roles. Whatever wrote it tends to be blind to its own misses.
- "Can't find it" gets a flag, not a cut. Mark it and look around first, in case it's a spelling slip or said indirectly, before you decide.
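Here is a minimal Python sketch of the first two rules together: "done" is decided by re-running a check for every item on the real checklist, not by what the agent remembers. The checklist items and the stand-in commands (tiny Python one-liners that exit 0 or 1) are hypothetical; in practice each command would be something like your test runner or a request to the page.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    name: str
    command: Optional[list[str]]  # None means there is nothing to run yet


def check(item: Item) -> str:
    """Evidence comes from a real run (its exit code), never from recollection."""
    if item.command is None:
        return "not yet true: no evidence"
    run = subprocess.run(item.command, capture_output=True, text=True)
    return "pass" if run.returncode == 0 else f"fail: exit {run.returncode}"


def walk_checklist(items: list[Item]) -> tuple[bool, dict[str, str]]:
    """Done only if every item passes, not the subset that came to mind."""
    results = {item.name: check(item) for item in items}
    return all(r == "pass" for r in results.values()), results


# Hypothetical checklist; the commands are stand-ins that simply exit 0 or 1.
checklist = [
    Item("page loads", [sys.executable, "-c", "raise SystemExit(0)"]),
    Item("tests pass", [sys.executable, "-c", "raise SystemExit(1)"]),
    Item("docs updated", None),
]

done, results = walk_checklist(checklist)
print(done)  # False
for name, result in results.items():
    print(f"{name}: {result}")
```

The "page loads" item passing is the single true fact from the opening story; on its own it doesn't make the list done.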
Anyone can do this by hand. The part we've put real work into is tooling that runs these checks automatically, stopping an AI (Dobby included) before it can print something it has no evidence for. The idea is to keep the gate itself fixed and standard while the checks that run inside it can be swapped out. That tooling is still in the works.
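The "fixed gate, changeable contents" idea can be sketched in Python like this. The gate function itself never changes; what it runs is a list of plain functions you can swap. `Claim`, the `evidence_path` field, and both checkers are hypothetical names for illustration, not our actual tooling.

```python
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Claim:
    text: str
    evidence_path: Optional[str] = None  # hypothetical: a log or file backing the claim


Checker = Callable[[Claim], bool]  # the swappable part: any function Claim -> bool


def evidence_exists(claim: Claim) -> bool:
    return claim.evidence_path is not None and Path(claim.evidence_path).is_file()


def evidence_not_empty(claim: Claim) -> bool:
    return evidence_exists(claim) and Path(claim.evidence_path).stat().st_size > 0


def gate(claims: list[Claim], checkers: list[Checker]) -> list[str]:
    """The fixed part: a claim is printed as fact only if every checker accepts it."""
    return [
        c.text if all(check(c) for check in checkers) else f"[UNVERIFIED] {c.text}"
        for c in claims
    ]


# Demo with a throwaway log file standing in for a real test run.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    f.write("5 passed\n")
    log_path = f.name

report = gate(
    [Claim("Tests pass", evidence_path=log_path), Claim("Docs updated")],
    checkers=[evidence_exists, evidence_not_empty],
)
os.unlink(log_path)
print(report)  # ['Tests pass', '[UNVERIFIED] Docs updated']
```

Adding a stricter check later (say, one that reads the log contents) means appending a function to `checkers`; the gate and its output format stay the same.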
Where this helps
- Code an AI writes that claims "tests pass": you need to see the run, not just the assurance.
- Meeting or lecture notes: every action item should trace back to something someone actually said.
- Reports and research where every number points back to a source.
- Work where a mistake costs: finance, legal, medical, where one wrong line has a price.
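For the reports-and-research case, one cheap check is to pull every number out of an AI summary and confirm it appears somewhere in the source. This Python sketch uses a simple regular expression; the sample texts are invented, and matching exact strings is a deliberate simplification.

```python
import re

# Integers, decimals, thousands separators, and an optional percent sign.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")


def unsupported_numbers(summary: str, source: str) -> list[str]:
    """Numbers in the summary that never appear verbatim in the source."""
    in_source = set(NUMBER.findall(source))
    return [n for n in NUMBER.findall(summary) if n not in in_source]


# Invented example texts.
source = "Revenue grew 12% to 4,300 units in Q3."
summary = "Revenue grew 12% to 4,300 units, beating the 10% target."

print(unsupported_numbers(summary, source))  # ['10%']
```

Because it compares exact strings, "4300" versus "4,300" would also be flagged. That's acceptable here: a flag means "go look," not "delete," so a formatting mismatch costs a glance rather than a wrong cut.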
Where to start
You don't need a big system from day one. Try it on a single piece of work. Take something an AI just called "done" and ask for the evidence one item at a time, so each thing it claims to have finished can actually be seen. One pass and you'll see for yourself how quietly made-up things slip in. And once an AI knows it has to show evidence every time, it starts guessing less on its own.
- The cases and the gate come from our own real work on productize.life and our fleet (June 2026), including the time an agent reported "done" after checking only one part.
- On why models guess rather than admit they don't know: OpenAI, "Why Language Models Hallucinate" (2025), arxiv.org/abs/2509.04664