Nate’s Substack

A Complete Guide to Claude 3.7 with Code Comparison Across 7 Major AI Models
Claude 3.7 came out yesterday. Check out how Claude compares to other LLM coding models like Grok 3, o3-mini-high, o1 Pro, DeepSeek, and Gemini 2.0. Coding samples included!

Nate
Feb 26, 2025

Here it is! A complete guide to Claude 3.7, including a code comparison across all the major coding models. I aim to make this article a complete introduction to Claude 3.7 focused on its core strength: coding. I know how it scores on the benchmarks, but I’m really interested in how its code quality actually compares to other major AI models on a real-world coding scenario. In my view, we have a real problem measuring real-world AI model quality, and I want to start showing a better way by directly comparing models on specific real-world tasks. So sit back and enjoy, and I hope this brings back fond memories of playing Monopoly!
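
If you want to run this kind of head-to-head yourself, the setup really is just “same prompt, different endpoints, compare the output.” Below is a minimal sketch in Python using the Anthropic and OpenAI SDKs; the model IDs and file handling are placeholders rather than my exact harness, and the point is simply that every model sees the identical prompt.

```python
# Minimal sketch of the head-to-head setup: one identical prompt, several
# model endpoints, raw responses saved for side-by-side review.
# Model IDs below are placeholders -- swap in whichever versions you have access to.
import anthropic
from openai import OpenAI

PROMPT = "..."  # the open-ended coding prompt, shared verbatim with every model


def ask_claude(prompt: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    message = client.messages.create(
        model="claude-3-7-sonnet-latest",  # placeholder model ID
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text


def ask_openai(prompt: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="o3-mini",  # placeholder model ID
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    # Save each model's answer to its own file for manual comparison.
    for name, ask in [("claude", ask_claude), ("openai", ask_openai)]:
        with open(f"{name}_response.md", "w") as out:
            out.write(ask(PROMPT))
```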

So what am I actually using in this article to assess code quality? A very simple prompt. I’m giving the AI coding models an open-ended prompt in a familiar domain, one with enough scope for them to put their creativity to work and really build something cool. The prompt is this:
