How to Fact-Check AI Output (Without Spending All Day Doing It)

AI lies confidently. That's the part nobody warns you about until you've already pasted a fabricated statistic into a client deliverable.

I've shipped enough AI-assisted writing to have built a fact-checking habit. It's not glamorous, but it's the difference between "this is useful" and "this got me fired."

Assume the citations are wrong

When ChatGPT or Claude gives you a URL or a study reference, assume it's hallucinated. Not "might be": is, until proven otherwise. I'd estimate 30-40% of unprompted citations from any frontier model point at something that doesn't exist or doesn't say what the model claims.

The fix is mechanical: copy the URL and paste it into your browser. If it 404s or redirects somewhere unrelated, treat the citation as fake and verify the attached claim independently. If the URL works, search the page (Cmd+F, or Ctrl+F on Windows and Linux) for the specific quote or statistic. If you can't find the exact wording, the model paraphrased the source, sometimes accurately and sometimes not, so read the surrounding passage before you trust it.
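
If you have more than a handful of links, the "does this URL even exist" half of the check can be scripted. Here's a minimal sketch in Python using only the standard library. The helper names classify_fetch and check_url are made up for illustration, and "landed on a different host" is a crude assumption standing in for "redirected somewhere weird." It tells you whether a page is there, not whether it says what the model claims.

```python
from urllib.error import HTTPError
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def classify_fetch(requested_url: str, final_url: str, status: int) -> str:
    """Turn a fetch result into a verdict. Pure function, no network access."""
    if status >= 400:
        return "dead"
    # Assumption: ending up on a different host counts as a suspicious redirect.
    requested_host = urlparse(requested_url).netloc.lower().removeprefix("www.")
    final_host = urlparse(final_url).netloc.lower().removeprefix("www.")
    if requested_host != final_host:
        return "redirected"
    return "reachable"


def check_url(url: str, timeout: float = 10.0) -> str:
    """Fetch the URL and classify the outcome. Requires network access."""
    request = Request(url, headers={"User-Agent": "citation-check/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return classify_fetch(url, response.geturl(), response.status)
    except HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions.
        return classify_fetch(url, url, err.code)
    except OSError:
        # DNS failures, refused connections, timeouts.
        return "dead"
```

A "dead" verdict isn't proof of fabrication: some sites return 403 to scripts, and real pages do rot. Anything that isn't "reachable" gets opened by hand, and anything that is still needs the Cmd+F step.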

Use a second model as your skeptic

This is the trick that saves me the most time. After I get an answer from one model, I paste it into a different one with: "Read this and tell me which specific factual claims you'd want to verify before publishing. Be specific about what's likely wrong."

A second model won't catch everything, but it's strong at flagging the parts where the first model got too confident. Claude is good at catching ChatGPT's overreach. ChatGPT is good at catching Claude's vagueness. Gemini is decent at flagging Wikipedia-bait that sounds plausible but is actually outdated.

Ask the second model what it would Google. That list is your fact-check checklist.

Search the specific number, not the topic

When you have a stat like "47% of small businesses adopted AI in 2024," don't search "small business AI adoption." Search the literal phrase from the model's output. Wrap it in quotes. If nothing comes back, the number is fabricated or paraphrased badly enough that no source actually says it. If something comes back, read the original.
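
The quoting step is small enough to automate if you're checking a batch of claims. A rough sketch in Python: exact_phrase_query and search_url are hypothetical helper names, and the DuckDuckGo base URL is just an assumed default, so swap in whichever engine you use.

```python
from urllib.parse import quote_plus


def exact_phrase_query(claim: str) -> str:
    """Wrap the claim in double quotes so the engine matches the literal phrase."""
    # Collapse stray whitespace and drop any quotes already around the claim.
    cleaned = " ".join(claim.split()).strip('"“”')
    return f'"{cleaned}"'


def search_url(claim: str, base: str = "https://duckduckgo.com/?q=") -> str:
    """Build a search URL for the quoted claim. The base URL is an assumption."""
    return base + quote_plus(exact_phrase_query(claim))
```

Calling search_url("47% of small businesses adopted AI in 2024") gives you a link you can open directly. Reading what comes back is still a manual job.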

Most fake stats die at this step. The ones that survive are usually real, but you'll often find the number is from a 2019 survey of 200 people in one country, not a representative finding worth quoting in 2026.

Treat dates and version numbers as suspect

Models state things like "released in November 2024" or "version 3.2 introduced X" with the same confidence whether or not they actually know them. I check every specific date and version number against the product page or release notes.

This is especially bad for tooling. Models will tell you a CLI flag exists that hasn't shipped yet, or describe an API parameter from documentation they're remembering wrong. For anything code-related, I run it before I publish it.

Build a trust ladder by claim type

Not all claims need the same level of verification. My rough ladder:

Quick check (30 seconds): search for the specific phrase. Used for stats and dates.

Real check (5 minutes): find a primary source. Used for any claim that could embarrass me.

Deep check (15+ minutes): read the original study or documentation. Used for anything I'm putting my name on professionally.

Skip: general concepts and definitions. Models are usually fine here, and verification is overkill.

The mistake people make is checking everything at the deepest level, which is why fact-checking feels exhausting. Most claims need 30 seconds of triangulation, not a research project.
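
If it helps to see the ladder as a decision rule, here's one way to write it down in Python. This is a sketch of my reading of the ladder, not a formal scheme: the claim-type labels, the flag names, and the choice to let "general concept" win over everything else are all assumptions.

```python
from enum import Enum


class Check(Enum):
    SKIP = "skip"
    QUICK = "quick check, about 30 seconds"
    REAL = "real check, about 5 minutes"
    DEEP = "deep check, 15+ minutes"


# Hypothetical labels; use whatever vocabulary fits your drafts.
LOW_RISK = {"concept", "definition"}


def pick_check(claim_type: str, could_embarrass: bool = False, bylined: bool = False) -> Check:
    """Pick the cheapest rung of the ladder that still covers the risk."""
    if claim_type in LOW_RISK:
        return Check.SKIP   # general concepts and definitions
    if bylined:
        return Check.DEEP   # anything going out under my name professionally
    if could_embarrass:
        return Check.REAL   # find a primary source
    return Check.QUICK      # stats, dates, and anything unclassified
```

The point of writing it this way is the ordering: the expensive checks are opt-in, and the default is the 30-second one.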

Watch for the confident pivot

The clearest tell that a model is making things up: it pivots from a specific question to a confident generalization. You ask "what's the user count of X tool in 2025?" and the answer is a paragraph about the broader category, then a number that feels precise but isn't sourced.

When you spot this pattern, the number is almost always either guessed or pulled from training data that's two years stale. Treat it as "I don't know" wearing a suit.

The five-minute pre-publish pass

Before anything goes out, I do one final pass focused only on falsifiable claims. I literally highlight every number, name, date, and citation in the draft, then verify each one. If I can't verify it in under a minute, I either find a source, soften the claim, or cut it.

This pass takes five to ten minutes for a 1,000-word piece. It catches roughly one fabrication per draft, sometimes more. Worth it.
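
The highlighting step can be roughed out with a few regular expressions. A minimal sketch in Python, assuming English-language drafts: the patterns and the find_checkables name are illustrative, they will miss things, and they make no attempt to find names, which still need a human read.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Rough patterns, for illustration only. Order matters: earlier ones win.
PATTERNS = [
    ("citation", r"https?://\S+"),
    ("date", rf"\b(?:{MONTHS})\s+(?:\d{{1,2}},\s+)?\d{{4}}\b|\b(?:19|20)\d{{2}}\b"),
    ("version", r"\b[Vv](?:ersion)?\s?\d+(?:\.\d+)+\b"),
    ("number", r"\b\d+(?:,\d{3})*(?:\.\d+)?%?"),
]

CLAIM_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS)
)


def find_checkables(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched text) pairs for everything worth verifying."""
    results = []
    for match in CLAIM_RE.finditer(text):
        # Trim sentence punctuation that clings to the end of URLs.
        results.append((match.lastgroup, match.group().rstrip(".,;:)\"'")))
    return results
```

Run it over the draft and you get a checklist instead of a wall of text. It only finds candidates; verifying each one is still the actual work.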

The bigger shift, once you've been burned a few times, is to stop treating AI output as a draft you edit and start treating it as a research assistant you supervise. It produces leads. You verify them. The work is still yours.

Next post in this thread: how to use AI without leaking company data.