0:00
/
0:00
0:00
/
0:00
Preview

OpenAI ChatGPT Agent Mode Review: 5 Real-World Workflow Tests

Portfolio crunching, global attribution, ZIP-code comps, 0-100 growth hacks, and int'l incorporation: five tests reveal where Agent Mode works — and where you still babysit

Building groundbreaking technology is brutally hard. It takes courage, talent, and frankly, a willingness to publicly risk embarrassment. OpenAI has courageously stepped onto this risky stage once again with Agent Mode.

But in doing so, they’ve invited scrutiny—and that’s exactly why this review matters.

It’s 2025, and we’re sitting at the edge of what could become a watershed moment for artificial intelligence: the quest for a general-purpose AI agent. This isn’t merely a technical curiosity. It’s a vision that could reshape entire industries, transform daily workflows, and redefine productivity. But—and here’s where the stakes become crystal clear—vision is not reality. Yet.

I took on Agent Mode with high hopes. One of the core theses of 2025 is that adding LLM intelligence to tool use should result in really useful work being done. I think I’m on track to give that thesis about a C for the year. Coming true in spots (Claude Code is a bright spot), but not by any means universally true. I think we’re uncovering a lot more underlying complexity in workflows that current models aren’t yet capable of tackling independently.

This review isn’t really about critiquing features—it’s about asking OpenAI’s Agent to do some of that real work specifically in areas that the team itself highlighted as bright spots (research, Excel, PowerPoint, information synthesis, and multi-step tasks). As usual, I want to go through real tests, document carefully what I did, and then share the results verbatim with you.

Agent Mode represents a dream that is one of the biggest in the Valley: Imagine having an AI assistant that can flawlessly build spreadsheets, insightful presentations, and seamlessly navigate complex digital tasks. Imagine what that would mean for your workflow, your productivity, your competitive edge. It may or may not be your dream, but it is more or less the dream OpenAI’s team is building toward. And if this promise isn’t fully realized yet, it’s worth measuring so we can know precisely how far we are from the goal.

In the review you’re about to dive into, I took Agent Mode through five rigorous, real-world tests—no softballs here, only tasks that businesses actually wrestle with for real.

  • Financial portfolio analysis across a portfolio of three dozen equities

  • Marketing attribution modeling across multiple countries and hundreds of channels

  • Strategic real estate acquisition scenarios across US zip codes

  • Lean customer acquisition strategies for founders and vibe coders

  • Intricate cross-border incorporation planning for CEOs

The results were illuminating, to put it mildly.

Yes, Agent Mode has arms and legs—it can reach into Google Drive, manipulate spreadsheets, browse the web, and create presentations. This alone is remarkable. But the bigger question—can it use these tools intelligently, reliably, and autonomously to enhance human productivity?—is where the tests reveal critical insights. The stakes couldn’t be clearer. This isn’t about niche features or incremental improvements; it’s about whether Agent Mode truly brings us closer to a general-purpose intelligence that makes us better, faster, smarter.

Why does this matter to you? Because whether you’re a developer, a business leader, a product manager, or simply an AI enthusiast, the difference between promise and performance directly affects your decisions, investments, and future expectations. You deserve an unvarnished look at how Agent Mode actually performs under real world conditions.

Moreover, this isn’t just an evaluation—it’s an honest conversation about what’s achievable, what’s overhyped, and how to navigate expectations wisely. Agent Mode’s current state holds lessons not just for OpenAI, but for everyone building or relying on AI in critical workflows.

As you read the review, you’ll witness firsthand the ambition, limitations, and potential pitfalls of an agent navigating real-world complexities. We’re tracing an important step in one of the biggest projects humanity has ever attempted, after all. It’s worth diving deep!

And you know me, each test is deep on tangible detail, and I include actual assets produced by Agent Mode, so if you don’t have it yet you can see how it actually works.

With that—let’s dive into the five tests and what you can expect from OpenAI’s premiere Agent offering of the year…

Subscribers get all these newsletters!

Listen to this episode with a 7-day free trial

Subscribe to Nate’s Substack to listen to this post and get 7 days of free access to the full post archives.