OpenAI o3-mini and o3-mini-high: a complete guide and practical benchmark
I grabbed these models and immediately put them to the test with what they are supposed to be best at: technical coding and planning. Then I scored them against other popular models to see how they perform!
I wrote this in two parts. Part 1 covers what o3-mini is, how it works, and how it scores on benchmarks. Part 2 covers how it actually held up when I tested it, with lots of examples! I wanted to give a really full picture of the model, and that meant level-setting before getting into the gritty details. If you want the big shocker reveal, head straight to Part 2, because I did NOT expect the test results I got.
Part 1: The Launch of o3-mini—and Why It Matters
Today, OpenAI announced two new reasoning models in their “mini” series: o3-mini and o3-mini-high, built to deliver robust STEM performance, faster response times, and a lower price tag than previous small models—particularly the outgoing o1-mini. The main goal? Push the boundaries of cost-effective reasoning while making advanced AI more accessible to developers and end users.
I’ve been playing with these two new models in the ChatGPT environment, comparing them to existing offerings like OpenAI o1, o1 Pro, and Claude 3.5 Sonnet. Below, you’ll see why these newcomers matter, how they stack up, and what the data says about their speed, accuracy, and safety training.