Nate’s Substack

The Day the Rocks Began to Think: Reflections on OpenAI’s o1

So much for AI winter. It's been 11 hours and I can already tell this is one of those days we're going to mark in the history books.

Nate
Dec 06, 2024
December 5, 2024, may go down as the day the rocks began to think. OpenAI launched o1, ushering in a new era for large language models. Yes, it’s a big deal. Eleven hours in, and o1 already feels indispensable to me. I can imagine dozens of uses for it that would not be well suited to 4o: code simplification, for starters.

This day has been a long time coming. OpenAI has been hinting guardedly for months that o1 was on the horizon, but no one knew when exactly we’d get the next big leap in intelligence. Well, it’s here.

And with OpenAI teasing 12 Days of OpenAI, there’s a lot more to come! But for today, o1 is plenty to digest. I’ve spent time today chatting with it, and already the difference is clear: if its predecessor, GPT-4o (“4o”), felt like an overly wordy intern, then o1 feels like a truly senior partner, someone who knows how to be thoughtful and concise, who can walk you through a complex problem with calm mastery.

It’s been only a few hours, and I’m still just scratching the surface with o1. While I’m still exploring, I have strong early opinions: o1 is faster at zeroing in on the heart of a problem and better at explaining each step. It’s like having a 10- or 15-year veteran sitting beside you, thinking through the whole problem and answering correctly the first time. It’s a world apart from what we understood about LLMs just yesterday.

From Intern to Partner

Thousands of hours of YouTube and millions of words have been spent debating what intelligence means in AI. For two years now, language models have mimicked knowledge and reasoning. They can write essays, answer questions, code, and summarize complex texts. But they always felt slightly junior: eager, bright, but prone to confusion when pushed too far. Complexity and nuance tripped them up. To keep them on track, you had to nudge, rephrase, and sometimes simplify your questions, much like guiding (or leaning on) a junior analyst still learning the ropes.

With o1, we’ll have to have that whole debate all over again. With full chain-of-thought reasoning, o1 approaches problems the way a thoughtful human might: it breaks puzzles down into manageable steps. It doesn’t just give an answer; it shows a careful internal process that leads to a more reliable, well-reasoned conclusion. And it does so faster than o1-preview! It’s as if we’ve finally given AI the ability to think to itself before answering. This capability underpins more accurate responses in coding, data science, and even specialized domains like math or case-law analysis.
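To make the usage model concrete (a minimal sketch of my own, not anything from OpenAI’s docs: the `"o1"` identifier and the request shape are assumptions based on the familiar chat-completions format), the interesting part is what you *don’t* have to do. Because the step-by-step thinking happens server-side, the request looks just like one you’d send to 4o, with no “think step by step” prompt engineering bolted on:

```python
# Sketch: building a chat-completions-style payload for a reasoning model.
# Assumption: "o1" as the model identifier, standard messages format.
# Note the absence of any chain-of-thought prompting -- o1 reasons
# internally before returning its final answer.
def build_request(question: str, model: str = "o1") -> dict:
    """Build a minimal chat-completions-style request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }

req = build_request("In how many ways can 5 people sit at a round table?")
print(req["model"])           # o1
print(req["messages"][0]["role"])  # user
```

The payload would then go to the API with your credentials; the point of the sketch is simply that, from the caller’s side, the upgrade is invisible — the reasoning is all on the server.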

In fact, o1’s reasoning skills are so developed that OpenAI tested it on benchmarks that the vast majority of humans couldn’t solve—competition-level math and code problems once considered out of reach. On the AIME (a notoriously difficult American math competition), o1 preview models soared to performance levels rivaling top students. On Codeforces problems (a platform for competitive programming), o1’s skill beats older models by a wide margin. The graph below tells the story.

© 2025 Nate