Here's How I Pick the Right AI for Jobs that Matter: My 5 Prompts + a ChatGPT 5.2 vs. Claude Opus 4.5 vs. Gemini 3 Deep Dive

Models have gotten 10x more detailed this year, and they are at least 10x harder to evaluate for real work. Here's my secret sauce for evaluating models for work that matters plus 5 prompts to help!

Video game graphics took decades to evolve from 8-bit to 4K. AI models just made a similar jump in about a year.

We’re still squinting at them like they’re 8-bit.

The differences between GPT-5.2, Claude Opus 4.5, and Gemini 3 are real and meaningful—but they don’t show up in benchmark charts or Twitter arguments about “which model is best.” Those tools are too low-resolution. They’re like evaluating 4K graphics on a CRT monitor.

Perhaps it was relevant to ask which model was best overall a year or two ago. But I don’t think it’s a useful question now. And I think we need to stop asking it.

In a world where LLMs are literally advancing science, we should be asking much more detailed questions to evaluate real work.

Because, after all, the real work we do IS detailed. That texture matters. Which model can read tally marks? Which can read a 10,000-row spreadsheet? Which can code for 10 hours in a token-efficient way? Which one is OK with contradictory and confusing data? Which one produces the PowerPoint most compatible with my style?

There’s an essay by John Salvatier called “Reality Has a Surprising Amount of Detail” that’s become my lens for this. He tells the story of building basement stairs—something that seems simple until you discover that lumber warps after it’s cut, screws pull brackets off-angle, and getting every step aligned requires techniques you didn’t know existed. The details aren’t incidental. They’re where all the real problems live.

I built stairs with my granddad, so this hits for me.

Model capabilities have become similarly detailed. And I think we’re busy trying to simplify them too much. Maybe because we’ve been missing the lens to see the detail.

Which is too bad, because the detail is what’s going to help us get real work done!

So this is what I’ve got for you: a detailed breakdown of ChatGPT 5.2 vs. Opus 4.5 vs. Gemini 3, plus some prompts I use to tease out what’s in my head and what I need to observe (the details) for really important work.

Plus there’s more:

  • Why the “best model” question dissolves: The discourse is too coarse to capture what’s actually different, and the details that matter for your work aren’t the same as the details that matter for mine.

  • Simple wins as a way of seeing: How to get granular enough that you notice what benchmarks miss—and why the filter is “work you care about being good.”

  • The five prompts that make detail visible:

    • The Filter — Identify which of your recurring tasks actually warrant careful model selection (most don’t).

    • The Preference Interview — Surface the standards you’ve been applying unconsciously, so you have something concrete to test against.

    • The Detail Map — See the hidden complexity in work that seems simple—where the real failure modes live.

    • The Comparison Setup — Design a clean head-to-head test that reveals real differences, not surface variation.

    • The Observation Log — Capture what you actually noticed before the details fade into vague impressions.

  • Field notes on GPT-5.2, Opus 4.5, and Gemini 3: Not rankings, but observations about where I’ve seen surprising detail in how these models actually behave.

  • The coding deep-dive: Where practitioners have gotten most specific about model differences—and what their observations reveal about how to see capability surfaces clearly.

  • Finding your own detail: The real question isn’t which model is best. It’s where you do work that you care about enough that the detail matters.

Remember, the goal isn’t to crown a winner. It’s to remember that models have a surprising amount of detail, to build the kind of intuition that only comes from direct contact with reality—and to give you a set of tools for seeing detail you might otherwise miss. Get out the magnifying glass!

Subscribers get all these newsletters!

Subscribe to Nate’s Substack to listen to this post and get 7 days of free access to the full post archives.