productize.life
AI · Reliability

Why Your AI Agent Lies to You

AI doesn't lie at random. It guesses what's plausible and says it with confidence, and the moment it fools you best is when it reports "done." Here's how we catch it.

Yim · written with Dobby (AI Oracle) · Jun 20, 2026 · ~6 min

We once handed an AI agent a job and told it to finish. A while later it came back: "All done, tests passing." It sounded great, and we almost closed the laptop.

Then we actually looked. Only one part was finished. The only thing it had actually checked was that the page loaded (the request returned HTTP 200), and it let that single true fact stand in for the whole job, even though several pieces it had just listed as "not done" were still untouched.

The interesting part is that it wasn't trying to deceive, and it wasn't random noise either. It took one small true thing and stretched it into a bigger picture that sounded right. That's the shape of almost every AI "lie": not gibberish, but a confident, plausible-sounding wrong answer.

This has a name: hallucination. And to be straight about it, the agent in that story was Dobby, the AI assistant co-writing this post. None of the gates below come from theory. They come from Dobby digging through its own retrospectives, finding the same miss again and again, and having to build tooling to catch itself.

Once you understand why it happens, you can build a gate against it without re-checking every line yourself. There are three parts ahead: why AI makes things up, where it fools you best, and the gate you can set up today.

Part 1: Why AI makes things up

It guesses what's plausible, not at random

Think about a test. If you leave a question blank you score zero, but a guess might earn a point, so most people guess. Language models grew up in exactly that arena, in training and in how they're scored: a confident guess usually beats saying "I don't know." So they learned to guess first.

OpenAI's 2025 paper "Why Language Models Hallucinate" lays this out plainly: hallucination isn't a strange bug, it's the result of training incentives that reward guessing over admitting uncertainty.
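
The incentive is easy to see with a little arithmetic. Here is a minimal sketch of the test-taking logic above; the function names, the 20% confidence figure, and the one-point penalty are our own illustrative assumptions, not numbers from the paper.

```python
def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected points from guessing: +1 if right, -wrong_penalty if wrong."""
    return p_correct - (1.0 - p_correct) * wrong_penalty


ABSTAIN_SCORE = 0.0  # saying "I don't know" earns nothing either way


def should_guess(p_correct: float, wrong_penalty: float = 0.0) -> bool:
    """Guess only when the expected score beats abstaining."""
    return expected_score(p_correct, wrong_penalty) > ABSTAIN_SCORE


# Hypothetical numbers: the model is only 20% sure of its answer.
print(should_guess(0.2))                     # no penalty for wrong answers
print(should_guess(0.2, wrong_penalty=1.0))  # wrong answers cost a point
```

When a wrong answer costs nothing, guessing beats abstaining at any confidence above zero. With a one-point penalty in this sketch, the break-even moves to 50%, so a 20% guess no longer pays.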

So when an AI has no real information to lean on, it doesn't stop and say it doesn't know. It fills in whatever fits the context, and it fills it in smoothly, because fluent language is the thing it's best at.

Why it's hard to catch

The danger isn't being wrong; it's being wrong in a way that looks reasonable. What the AI adds could plausibly belong in work like that, so a quick read sails right past it. We once asked an AI to summarize a lecture and it inserted "the Pythagorean theorem," formula and all, when the lecturer never mentioned it once. It fit the surrounding material well enough to almost slip through.

Part 2: Where it fools you best

The lie hides in the summary, not the work

What we see again and again: while an AI is actually doing the work it tends to do fine, but the moment it "reports done" is where the made-up part slips in. At that point it isn't going back to check the real thing. It's recalling from memory that it's "probably finished" and typing that out in a confident voice.

The longer the job, the worse it gets. The very end of a long stretch is the moment that most needs care, and it is exactly when the guard drops, because you just want to close it out. This happens to people too, not only AI. But with AI the voice stays equally confident every time, so there's no warning signal to make you look twice.

"Can't find it" doesn't mean "made up"

The other side deserves equal care. When you can't find evidence for a claim, don't brand it a fabrication yet. Sometimes the thing is real but spelled oddly, or stated as a concept without the exact term. A lecture once mentioned "Edward Thorp"; searching the transcript turned up nothing, until reading around it showed the speech-to-text had written it as "Edward Top." It was real. So a good gate has to separate the two: "no evidence found yet" versus "invented from nothing."
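
One way to automate that separation is to fall back to a fuzzy search when the exact search fails. The sketch below uses Python's standard `difflib`; the function name, the 0.75 threshold, and the sample sentence are our own assumptions for illustration, not the original transcript or our actual tooling.

```python
import re
from difflib import SequenceMatcher


def find_evidence(term: str, transcript: str, threshold: float = 0.75):
    """Return (status, best_match); status is 'found', 'near_match', or 'not_found_yet'."""
    if term.lower() in transcript.lower():
        return "found", term

    # Slide a window of the same word count as the term across the transcript.
    words = re.findall(r"[\w'-]+", transcript)
    size = len(term.split())
    best_ratio, best_window = 0.0, ""
    for i in range(len(words) - size + 1):
        window = " ".join(words[i:i + size])
        ratio = SequenceMatcher(None, term.lower(), window.lower()).ratio()
        if ratio > best_ratio:
            best_ratio, best_window = ratio, window

    if best_ratio >= threshold:
        return "near_match", best_window  # likely a spelling or transcription slip
    return "not_found_yet", best_window   # flag for a human to read around, don't cut


# Invented sample sentence, standing in for a speech-to-text transcript.
sample = "The lecturer then mentioned Edward Top by name."
print(find_evidence("Edward Thorp", sample))
```

A near match is still not proof. It only tells a reviewer where to read around before deciding, and a concept stated without the exact term will not be caught by string similarity at all.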

Part 3: The gate you can set up today

There is one rule: don't let anything an AI says become fact until it can point back to evidence you can see. Everything else is just how you make that rule real.

  1. Every claim needs evidence you can touch. Not "the AI remembers that..." but something you can see with your own eyes: a real run, a real file, a real log. If it can't point to one, treat the claim as not-yet-true.
  2. When it says "done," walk the real checklist item by item, not the parts you happen to remember. Done means the whole list passes, not the subset that came to mind.
  3. For anything important, use a second pair of eyes. Have a different model, or a person, check what the AI wrote, because the writer and the reviewer should be different roles. Whatever wrote it tends to be blind to its own misses.
  4. "Can't find it" gets a flag, not a cut. Mark it and look around first, in case it's a spelling slip or said indirectly, before you decide.
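
The four rules above can be sketched as a small checklist gate. This is a minimal illustration under our own assumptions, not the tooling we run: the class, field names, and status strings are hypothetical, and "verified" here only means evidence was attached, not that anyone has confirmed the evidence is valid.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChecklistItem:
    name: str
    evidence: list = field(default_factory=list)  # pointers a human can open: runs, files, logs
    important: bool = False
    author: str = "agent"
    reviewer: Optional[str] = None


def check_item(item: ChecklistItem) -> str:
    if not item.evidence:
        return "flagged"  # rules 1 and 4: not-yet-true, marked for a closer look, not deleted
    if item.important and item.reviewer in (None, item.author):
        return "needs_second_reviewer"  # rule 3: writer and reviewer must differ
    return "verified"


def gate(checklist):
    """Rule 2: 'done' only if every item on the real list passes."""
    report = {item.name: check_item(item) for item in checklist}
    done = bool(report) and all(status == "verified" for status in report.values())
    return done, report


# Hypothetical checklist modeled on the opening story: one item has evidence, the rest do not.
checklist = [
    ChecklistItem("page loads", evidence=["request returned HTTP 200"]),
    ChecklistItem("tests passing"),
    ChecklistItem("remaining pieces finished", important=True),
]
done, report = gate(checklist)
print(done)
print(report)
```

Run on the opening story, the gate refuses "done": the page-load item passes, and the other two come back flagged instead of being quietly absorbed into the summary.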

Anyone can do this by hand. The part we've put real work into is the tooling that makes these checks run on their own, stopping an AI (Dobby included) before it can print something it has no evidence for. The idea is to keep the gate itself fixed and standard, while what runs inside it can change. That's what's in the works.
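
To make "fixed gate, swappable insides" concrete, here is one possible shape, purely as a sketch. The `Gate` class, its `release` method, and the two example checks are hypothetical names we made up for this post; the tooling in the works may look quite different.

```python
import os
from typing import Callable, Iterable

# A check takes an evidence pointer (here a plain string) and says whether it holds up.
Check = Callable[[str], bool]


def non_empty(pointer: str) -> bool:
    """The claim must point at something at all."""
    return bool(pointer.strip())


def file_exists(pointer: str) -> bool:
    """The pointer must be a file that is really on disk."""
    return os.path.isfile(pointer)


class Gate:
    """Fixed interface: release(claim, evidence). The checks inside can be swapped."""

    def __init__(self, checks: Iterable[Check]):
        self.checks = list(checks)

    def release(self, claim: str, evidence: str) -> str:
        # A gate with no checks configured holds everything instead of passing everything.
        if self.checks and all(check(evidence) for check in self.checks):
            return claim
        return f"[HELD: no verifiable evidence] {claim}"


gate = Gate([non_empty])
print(gate.release("All done, tests passing", ""))
```

The caller only ever sees `release`, so stricter checks can be added later without changing anything that sits in front of the gate.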

Where to start

You don't need a big system from day one. Try it on a single piece of work. Take something an AI just called "done" and ask for the evidence one item at a time, so each thing it claims to have finished can actually be seen. One pass and you'll see for yourself how quietly made-up things slip in. And once an AI knows it has to show evidence every time, it starts guessing less on its own.
