
The $300 Overnight Loop That's About To Eat Your Competitive Advantage

What Karpathy, Shopify, and a small YC startup just figured out — and what it's going to cost the teams who don't.

On March 8, one of the best ML researchers alive pointed an AI agent at his own training code, gave it a single metric, and went to sleep. Two days later the agent had run 700 experiments, found 20 genuine improvements, cut training time by 11%, and surfaced a bug in his attention implementation that he’d missed. Not because the agent was smarter. Because it tried more things, faster, without getting bored after the fifteenth failed attempt.

The researcher was Andrej Karpathy. The script was 630 lines. Ten days later, when SkyPilot scaled the same pattern to 910 experiments on a 16-GPU cluster, the compute bill came in under $300.

On April 2 a small YC startup called ThirdLayer took the same loop and pointed it at something more consequential than training code: the prompts, tools, and orchestration logic that determine how agents behave. A meta-agent rewrote the task agent’s entire scaffolding overnight. Every other entry on the leaderboards it targeted was hand-engineered by humans.

What’s happening is not an intelligence explosion. It’s something quieter and more immediate: optimization loops closing on specific business systems and compounding improvements faster than the organizations around them can track. A local hard takeoff, bounded to a domain, a metric, a sandbox. And the teams that can define “better” clearly enough to hand it to a machine are about to pull away from the teams that can’t.

Here’s what’s inside:

  • The Karpathy Loop. Why three constraints — one file, one metric, one time budget — make agent-driven research actually work, and why the minimalism is the whole point.

  • From training code to agent harnesses. How the pattern just escalated from optimizing ML code to optimizing the scaffolding around every agent you’ll ever deploy.

  • The local hard takeoff, and who misses it. Why small teams can run this for a few hundred dollars tonight, and why most enterprises will fail the prerequisites.

  • The safety problem hiding in plain sight. An agent gaming your metrics looks identical to an agent actually improving your business, right up until the moment it doesn’t.

  • Three prompts to run this weekend. A diagnostic that tells you whether you’re ready to run the Karpathy Loop on your own system, a pre-mortem that finds every way your metric could be gamed, and a trace audit that flags what your logging is missing before you hand anything to a meta-agent.
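To make the three constraints concrete, here is a minimal sketch of the loop's shape in Python. It is not Karpathy's script. Every name in it (`run_loop`, `propose`, `toy_loss`, the `lr` parameter) is hypothetical, the "agent" is a random perturbation standing in for an LLM editing a file, and the time budget is treated as a wall-clock cap on the whole run, which is an assumption (it could just as well be a per-experiment cap).

```python
import random
import time
from typing import Callable

Candidate = dict[str, float]  # stand-in for "the one file" the agent may edit


def toy_loss(candidate: Candidate) -> float:
    """Hypothetical single metric (lower is better); a real run would train and evaluate."""
    return (candidate["lr"] - 0.3) ** 2


def propose(candidate: Candidate, rng: random.Random) -> Candidate:
    """Hypothetical 'agent': nudges one value. A real agent would rewrite code."""
    trial = dict(candidate)
    trial["lr"] += rng.uniform(-0.1, 0.1)
    return trial


def run_loop(
    start: Candidate,
    evaluate: Callable[[Candidate], float],
    max_experiments: int,
    time_budget_s: float,
    seed: int = 0,
) -> tuple[Candidate, float, list[tuple[int, float, bool]]]:
    rng = random.Random(seed)
    best, best_score = dict(start), evaluate(start)
    log: list[tuple[int, float, bool]] = []
    deadline = time.monotonic() + time_budget_s  # the one time budget

    for i in range(max_experiments):
        if time.monotonic() >= deadline:
            break
        trial = propose(best, rng)
        score = evaluate(trial)          # the one metric
        kept = score < best_score        # keep only genuine improvements
        if kept:
            best, best_score = trial, score
        log.append((i, score, kept))     # failed attempts are logged, then discarded
    return best, best_score, log


if __name__ == "__main__":
    best, score, log = run_loop({"lr": 1.0}, toy_loss, max_experiments=200, time_budget_s=5.0)
    kept = sum(1 for _, _, k in log if k)
    print(f"{len(log)} experiments, {kept} kept, best loss {score:.4f}")
```

The point of the sketch is how little there is to it: the loop never needs to be clever, only to have an unambiguous definition of "better" and permission to throw away everything that isn't.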

If you read the taxonomy piece I wrote in March about the four types of agents, this is the sequel. That piece defined auto research as one of four distinct architectures. This one is about what happens when auto research grows up.

Subscribe to Nate’s Substack to listen to this post and get 7 days of free access to the full post archives.