0:00
/
0:00
/
Preview

Opus 4.7 is smarter, more literal, and quietly more expensive. Those are three different problems.

The migration is harder than the version number suggests. Here's what I learned, what it costs, and how to fix it

Opus 4.7 does exactly what you ask it to do. Which turns out to be a problem if your workflow depended on the old model guessing what you actually meant.

I’ve spent four days testing this release — migration benchmarks against GPT-5.4, a real afternoon inside Claude Design, and a steady stream of production work — and what I found is that the backlash and the praise are both describing real things. The model is measurably stronger on the hardest work. It’s also more combative, more literal, and quietly more expensive per unit of output even though the sticker price didn’t change. Those aren’t side effects of the same decision. They’re separate engineering choices that shipped in the same release, and they have separate fixes. The people treating this as one story are going to make the wrong call on migration. Some of you will overpay for work that got cheaper. Others will downgrade away from the one model that actually got the hard stuff right.

Here’s what’s inside:

  • The capability gains are real — but they’re not uniform. Persistence, coding, vision, and a knowledge-work win that’s getting buried under the backlash, plus the web research and terminal regressions worth routing around.

  • The inference 4.6 was doing for free is gone. The model got more literal, and the fix is clearer prompts, not longer ones.

  • Your bill went up even though the price didn’t change. A tokenizer tax, adaptive thinking, and breaking API changes that compound in ways the headline pricing hides.

  • Claude Design and the $42 afternoon. A design tool that turns your brand into machine-readable agent instructions — and what the correction loop reveals about where Anthropic actually is versus where the valuation says they should be.

  • Three prompts to migrate without guessing. A pre-flight check that flags what breaks, a cost estimator that quantifies the tokenizer tax on your specific usage, and a peer review workflow builder for the reliability layer you should already have.

I’ll start with what actually shipped and why, then get into the parts that change how you work.

Subscribers get all posts like these!

Listen to this episode with a 7-day free trial

Subscribe to Nate’s Substack to listen to this post and get 7 days of free access to the full post archives.