GPT-5 is finally here.
I’m writing this to sum up my complicated feelings after staring at dozens of documents, chats, and coding exercises, plus the reaction of basically the entire online population of the planet.
It’s a lot. But here’s my through line: what’s a no-hype, grounded take on how to actually use this model? How is it different from previous models? What surprised me? What’s critical to know about it, especially versus other models out there? How does it do at real work?
When I sit down to write about a model like GPT-5, I’m not chasing hype and I’m not hunting for a clever “gotcha.” I’m asking one thing: does this system make me better at my real work without lowering my standards? Can it take the kind of messy, high-stakes problems I actually face and return something I’d sign my name to — even if the person on the other end doesn’t like me and goes looking for holes?
That’s the core of this review. No theater. No hero demos. Just GPT-5 in the only environments that matter: hostile data, unforgiving constraints, outputs that must survive an audit.
And because this model is going to land in so many hands, so quickly, I’m doing something I almost never do: giving you three full articles in one. Three complete lenses on GPT-5, stitched together because I believe the timing matters and the stakes are high.
First — What actually shipped. I strip the marketing down to the bones: the router deciding when to answer fast and when to think; the new controls over reasoning effort and verbosity; the context window that swallows whole corpuses; the real gains in coding, health, factuality, and long-context retrieval. This is the factual baseline you need before you can even start to judge whether the thing works for you.
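To make those new controls concrete, here’s a minimal sketch of how they might be set from the OpenAI Python SDK. The parameter names and values shown (reasoning effort, text verbosity) are my reading of the launch material, so treat them as assumptions and check the current API reference before relying on them.

```python
# Minimal sketch, not official sample code. Assumes the OpenAI Python SDK's
# Responses API and that GPT-5 exposes reasoning effort and output verbosity
# as request options; verify names and accepted values against current docs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5",
    input="Summarize the attached variance table and flag anything that doesn't reconcile.",
    reasoning={"effort": "high"},   # assumed knob: how hard the model thinks
    text={"verbosity": "low"},      # assumed knob: how long the answer runs
)

print(response.output_text)
```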
Second — Launch-day stress tests. I built five deliberately brutal challenges to force GPT-5 to show its work:
a gnarly three-CSV reconciliation with duplicates, ghosts, cycles, mixed currencies, and even a SQL injection (see the sketch after this list for the flavor of checks involved)
a configurable Japan travel app built from scratch
an Apollo 13 Gantt chart exercise
a business writing test framed around the Amazon PRFAQ
a multimodal reading and critique test on real handwriting (two different hands in fact)
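To give a flavor of the checks the reconciliation challenge demands, here’s a rough Python sketch. The file names, column names, and logic are placeholders of my own, not the actual test data or harness.

```python
# Rough illustration only: placeholder files and columns, not the real test set.
# Shows the kind of checks involved: duplicate IDs, "ghost" rows with no
# counterpart in another file, and normalizing mixed currencies before totals.
import pandas as pd

orders = pd.read_csv("orders.csv")        # placeholder
payments = pd.read_csv("payments.csv")    # placeholder
fx = pd.read_csv("fx_rates.csv")          # placeholder: columns currency, rate_to_usd

# Duplicates: the same order_id appearing more than once
dupes = orders[orders.duplicated("order_id", keep=False)]

# Ghosts: payments referencing an order_id that no order row contains
ghosts = payments[~payments["order_id"].isin(orders["order_id"])]

# Mixed currencies: normalize everything to USD before comparing totals
payments = payments.merge(fx, on="currency", how="left")
payments["amount_usd"] = payments["amount"] * payments["rate_to_usd"]

print(f"{len(dupes)} duplicate order rows, {len(ghosts)} ghost payments")
print("Reconciled total (USD):", round(payments["amount_usd"].sum(), 2))
```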
Every one of these tasks has a right and wrong answer — and “sounds plausible” is not enough. I required assumptions, constraints, computed tables, and discrepancies on every run. That’s how you learn whether a model is actually reasoning or just narrating.
Third — Daily-driver reality. Once the feature list is clear and the torture tests are done, I drop GPT-5 into the day-to-day patterns that matter for most knowledge workers: writing, planning, light analysis, research synthesis, learning. Where it’s great, I’ll tell you why. Where it still needs scaffolding, I’ll show you exactly what to add. And for the first time, I’m publishing the prompting patterns I’ve been refining for years — contract-first structures, guardrails for hallucination pressure, quick recipes you can steal tomorrow — because a reasoning-class default model changes the baseline for what “good” prompting looks like.
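For a taste of what I mean by contract-first before we get to part three, here’s a stripped-down sketch of the shape. The scenario and wording are illustrative only, not the exact templates published later in the piece.

```python
# Illustrative only: a stripped-down "contract-first" prompt skeleton.
# The scenario (a bookings-vs-billings reconciliation) is made up for the
# example; the patterns published later in the piece are more complete.
CONTRACT_FIRST_PROMPT = """\
Role: analyst reconciling two finance extracts.

Contract (what the output must contain):
1. Every assumption, labeled ASSUMPTION.
2. A computed table where each figure traces back to an input row.
3. A discrepancies section (write "none found" rather than omitting it).
4. Uncertainty flagged inline, never smoothed over.

Constraints: do not invent inputs; if data is missing, say so and stop.

Task: reconcile the attached bookings and billings extracts and report
the variance by region, showing the path from input to answer.
"""

print(CONTRACT_FIRST_PROMPT)
```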
This is what I do around here: deep dives, grounded in real work data, unflinching about where the cracks are. The stakes are bigger than “is it cool.” A wrong FX conversion can sink a forecast. A missed duplicate can create phantom headcount. A confident summary built on sand can survive three meetings before anyone notices. If GPT-5 is going to live inside your planning, reporting, analysis, or build flows, you need a standard: prove it. Show the path from input to answer. Expose uncertainty. Make it easy to check the math.
That’s why this piece exists, and why it’s three-in-one. We are on the verge of millions of people getting a reasoning-class model by default for the first time. Most will assume that alone will fix their outcomes. It won’t. Capability matters, but so do prompts, constraints, and process. The gap is shifting from “which model” to “which workflow,” and the people who win will be the ones who can route well, demand proof, and ship.
If you want the reality — where GPT-5 is genuinely excellent, where it still trips, and exactly how to use it without getting burned — read on. No hype. No hedging. Just the work. And all three angles of it, in one place.