The enterprise agent market is drowning in vendor promises that violate basic engineering constraints. Gartner reports a 750% surge in AI-agent-related inquiries in 2024, but the organizations actually deploying agents at scale aren’t asking “which model should we use?”—they’re asking “why do our agents keep failing on tasks that take more than one session?”
The answer has been hiding in plain sight: agents don’t fail because models are too dumb. They fail because every session starts with no grounded sense of where the work stands. This is a memory problem, not an intelligence problem—and Anthropic’s recently published engineering documentation confirms what serious builders have known for years. The fantasy of drop-in universal agents is dead.
What’s emerging in its place is a clear, reusable pattern that works. The organizations that understand it will build agents that actually finish work. The ones that don’t will keep writing checks for sophisticated amnesiacs.
This briefing covers:
The generalized agent fantasy and why it’s dead: What actually happens when you wrap a frontier model in an agent framework—and why the major platforms all still require you to solve the memory problem yourself.
The research that explains the failure mode: Why million-token context windows make things worse, not better, and what the academic literature reveals about the fundamental constraint.
Domain memory as infrastructure: The specific components—goals, progress tracking, and operating procedures—that transform unreliable agents into systems that maintain coherent progress across sessions.
The two-agent pattern that works: Anthropic’s proposed architecture, why it’s elegant, and how it acknowledges rather than fights the amnesiac nature of AI (a minimal sketch of the memory artifacts and this loop follows the list below).
Beyond code—domain memory for any workflow: How to adapt the pattern for research, operations, content production, and other structured workflows.
Vendor claim triage: Which enterprise agent promises you can immediately dismiss, and what questions to ask instead.
Design principles for serious agents: The five disciplines that separate agents that finish work from agents that thrash.
The strategic moat and team implications: Why your competitive advantage lies in memory design, not model selection—and how to organize your teams around this reality.
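To make that concrete before we get into the details, here is one minimal way the pieces could fit together. The DomainMemory class, the GOALS/PROGRESS/PROCEDURES file names, and the initializer-then-worker loop are illustrative assumptions for this sketch, not Anthropic’s published interface or a production design.

```python
# Illustrative sketch only: class names, file names, and the two-agent split
# are assumptions for demonstration, not a vendor's published API.
from pathlib import Path
from datetime import datetime, timezone


class DomainMemory:
    """Persistent, on-disk memory an agent reads at session start and updates at session end."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.goals = self.root / "GOALS.md"            # what "done" looks like
        self.progress = self.root / "PROGRESS.md"      # append-only log of completed steps
        self.procedures = self.root / "PROCEDURES.md"  # how to validate and hand off work

    def initialize(self, goals: str, procedures: str) -> None:
        """Run once by an initializer agent before any work begins."""
        self.goals.write_text(goals)
        self.procedures.write_text(procedures)
        self.progress.write_text("# Progress log\n")

    def load_context(self) -> str:
        """Everything a worker agent needs to resume, regardless of prior sessions."""
        return "\n\n".join(p.read_text() for p in (self.goals, self.progress, self.procedures))

    def record_progress(self, entry: str) -> None:
        """Append a timestamped entry so the next session starts grounded."""
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        with self.progress.open("a") as f:
            f.write(f"- [{stamp}] {entry}\n")


# Two-agent loop in miniature: an initializer writes the memory once,
# then each worker session reads it, does one unit of work, and records the result.
if __name__ == "__main__":
    memory = DomainMemory(".agent_memory")
    memory.initialize(
        goals="# Goal\nMigrate the billing service to the new API.",
        procedures="# Procedures\nRun the test suite before marking any step complete.",
    )
    context = memory.load_context()        # passed into the model prompt each session
    memory.record_progress("Mapped legacy endpoints to new API routes.")
```

The specific files matter less than the discipline: every session begins by reading state that outlives the session, and ends by writing back what changed.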
And of course, five prompts to help you make this real:
Domain Memory Designer — General-purpose framework for any workflow. Helps you define goal artifacts, progress tracking, and validation criteria from scratch.
Research Workflow Memory — For investigations, evidence gathering, and synthesis work. Covers hypothesis backlogs, evidence logs, and decision journals.
Operations Agent Memory — For recurring processes, incident response, and ticket queues. Covers runbooks, incident timelines, and SLA tracking.
Content Production Memory — For editorial workflows and publishing pipelines. Covers editorial calendars, draft registries, and source logs.
Agent Workflow Audit — Helps you diagnose your current agent deployments. Identifies which workflows are breaking, why, and where domain memory would fix them.
What follows is a technical argument with direct strategic implications for how you allocate AI infrastructure resources over the next 18 months.