Smart people get fooled by AI first — because they can rationalize anything. (Self-Audit Framework)

Even the smartest people can fall into the AI overconfidence trap. And that's risky for us, our jobs, and our teams. Here's how to make sure you stay sane (and yes, there are prompts).

“Not suffering from psychosis.”

That’s what David Budden wrote on X, responding to the pile-on after he announced he’d essentially cracked one of mathematics’ seven hardest unsolved problems — using ChatGPT and formal verification tooling.

I admit I thought to myself, “If you’re in a situation where you have to tweet that to the world, you’re already in a pretty bad place.”

Budden has a PhD from Melbourne, postdocs at MIT and Harvard, six years at DeepMind rising to Director of Engineering, and a serious publication record. He is not a crank. And yet he’s put $45,000 on an outrageous claim: that he can resolve not one but two Clay Millennium Prize problems with AI assistance.

The money is split across three public bets: $10,000 against Marcus Hutter and $10,000 against Isaac King on Navier-Stokes, plus $25,000 against University of Toronto mathematician Daniel Litt on the Hodge Conjecture. He announced this as his new company PingYou came out of stealth — which has people debating whether this is sincere conviction, attention-seeking, or both.

The mathematical community’s response has been swift and brutal. The critique isn’t just that he’s wrong: it’s that his Lean code may formalize a trivial or different version of Navier-Stokes, not the actual Clay problem. Put simply, the problem is hard enough that he could claim he solved it while having solved the easy-mode version instead.

One explicit failure mode listed in the prediction markets: “the Lean proof could compile while proving a different or weaker statement than the Clay Millennium problem.” The public resolution criteria are measured in months, not vibes — one market resolves on whether he’s actually solved Navier-Stokes by end of January 2026. Current odds: single digits. And I think that’s high.

Budden’s story is one flashpoint in a debate spreading faster than the math debate itself: the conversation around the term “LLM psychosis.” There’s no settled definition, but online, people are using it as shorthand for a real workflow failure: an AI system that’s good at producing plausible explanations pushes users into overconfident acceptance of outputs they can’t audit. That’s a fancy way of saying: AI convinces you you’re right about something you cannot know.

I’m not using “psychosis” clinically here. I’m pointing at a collapse in some people’s ability to know the edges of their competence because AI fools them.

The real question isn’t whether Budden is right. It’s what happened to his ability to evaluate his own work. And since he’s a founder, that question now matters for his whole team. Increasingly, it’s a question we should be asking of ourselves, our leaders, and our teams.

Here’s what’s inside:

  • Why LLMs make automation bias worse than traditional systems — research shows AI explanations increase trust even when completely wrong, and the most vulnerable people aren’t novices

  • Three warning signs this pattern is developing — confirmatory prompting disguised as verification, operating beyond your evaluation capacity, and the “me and the AI versus everyone else” dynamic

  • What to do about it — the specific questions to ask before trusting any AI-assisted conclusion, plus a full adversarial prompt framework for ongoing work

  • What to watch for in your organization — behavioral signals that someone’s judgment is being compromised by AI validation loops

  • Prompts to check your work — these 10 prompts act as a detailed set of adversarial screens, grading your work the way a skeptical professional would so you don’t fool yourself (a rough sketch of the pattern follows the list below)

    • Adversarial Mini-Check — quick pre-ship attack and calibration checklist

    • Before You Start — sets adversarial rules and evidence discipline

    • Audit Boundary Check — identify needed verification skills and reviewers

    • Disconfirmation Pass — structured attempt to break the conclusion

    • Reality Check — anticipate expert objections without fake consensus

    • Confidence Calibration — score certainty; define tests to raise confidence

    • Final Gate — commit/no-commit decision with stop conditions

    • Project Ground Rules — enforce correctness-first behavior across sessions

    • 2-Minute Reality Check — fast grounding, falsifiability, next action

    • Full Assessment — deep anti-spiral review before high-stakes moves
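
To give a flavor of the pattern before you get to the full prompts: each one is a short instruction that forces the model to argue against your conclusion instead of with it. A rough, simplified sketch (the real prompts below are longer and more specific) looks like: “Act as a skeptical reviewer. Here is my conclusion and the evidence behind it: [paste]. List the strongest reasons it could be wrong, name the expertise needed to verify each claim, flag anything I cannot verify myself, and do not agree with me unless the evidence forces you to.”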

In 2026, a lot of smart people armed with LLMs and formal tools will ship convincing-looking but wrong work faster than many of our teams can audit it. That’s the risk. What I’m putting together here is your early warning radar so you see it coming!

Let’s dig in.

Subscribers get all posts like these!
