A single AI agent, Cowork, triggered a quarter-trillion-dollar selloff in enterprise software stocks. It’s a research preview that stops working when your laptop goes to sleep.
The tools behind that bet are still figuring out how to stay awake while you grab coffee. But that hasn’t stopped the pitch from spreading: Lindy, Sauna, Google Opal, Obvious, and a wave of startups are all selling the same thing. Software that does the work instead of helping you do the work. Outcomes, not answers.
The pitch might even be right, eventually. But there’s a question underneath it that almost nobody in the demo videos is asking, and it determines whether any of this actually works: how does the agent know its own output is any good?
Code can be checked against a test suite; a strategy memo doesn’t compile. When an outcome agent drafts a report from your scattered notes, nothing tells it whether it captured the right insights or left out the thing that mattered most. That distinction, between environments that give automated feedback and environments where you are the only feedback mechanism, separates the agents that work from the ones that quietly waste your time. This entire review is built on it.
I tested four of the most prominent outcome agents against a framework built on this insight. Here’s what I found.
Here’s what’s inside:
Why code worked first. The structural reason AI agents nailed software before anything else, and what it tells us about knowledge work.
The three questions that separate real from fake. A framework for evaluating any outcome agent, starting with Cowork itself.
Four tools reviewed. Lindy, Sauna, Google Opal, and Obvious, each tested against the framework.
The principles that outlast the tools. Memory architecture, inspectable surfaces, and compounding context, whether you build or buy.
The evaluation prompt. A two-phase prompt that scores any agent tool against the framework, then builds a delegation spec calibrated to its actual weaknesses, so you write the tests before the agent does the work.
Let’s start with why this category exists at all.