Codex Plugins: Why the AI Bottleneck Moved to Workflow

OpenAI made Codex smart enough that the bottleneck moved. Most people haven't noticed where it went.

Codex plugins matter because the bottleneck moved.

Nate

May 09, 2026

Open a new Codex thread and you are the operating system.

You explain the repo. You paste the standard. You point at the docs. You list the tools. You list the failure modes you do not want the agent to repeat. By the time it is ready to act, you have done a real chunk of the work yourself, just to get the work started.

That part did not get better with GPT-5.5. The model did. Codex is now at 82.7% on Terminal-Bench 2.0, up from 75.1%, and the lift you actually feel is bigger than the number reads, because the model can now stay inside long, multi-tool tasks without losing the thread. It reviews pull requests against your standards, builds screens from Figma comps, runs tests in a browser, pulls context across Slack, Drive, GitHub, and Linear, and drafts release notes from the diff. The model is good now.

The work around it is not.

That is the bottleneck. The workflow lives in your head, and you reload it every thread. From here, the work has to meet the model halfway.

That is what plugins are for. A skill says how the work should be done. A plugin packages that skill with tool access, live integrations, deterministic checks, the team’s failure modes, and the parts of the standard nobody has written down. Once installed, the agent stops needing you to be the OS.
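One way to picture the split between a skill and a plugin is as two small data structures. This is a rough sketch, not the Codex plugin format: the `Skill` and `Plugin` classes, their field names, and the `no_engineering_jargon` check are all hypothetical names made up for illustration. The point is only that the skill carries the instructions, while the plugin wraps it with tools, integrations, known failure modes, and checks that run the same way every time.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types for illustration only; this is NOT the Codex plugin format.
# A check takes a draft and returns a list of problems (empty means it passed).
Check = Callable[[str], list[str]]


@dataclass
class Skill:
    name: str
    instructions: str  # how the work should be done


@dataclass
class Plugin:
    skill: Skill
    tools: list[str] = field(default_factory=list)
    integrations: list[str] = field(default_factory=list)
    failure_modes: list[str] = field(default_factory=list)
    checks: list[Check] = field(default_factory=list)

    def review(self, draft: str) -> list[str]:
        """Run every deterministic check and collect the problems found."""
        problems: list[str] = []
        for check in self.checks:
            problems.extend(check(draft))
        return problems


def no_engineering_jargon(draft: str) -> list[str]:
    """Flag terms a customer-facing release note should avoid (assumed list)."""
    banned = ("refactor", "null pointer", "migration script")
    lowered = draft.lower()
    return [
        f"engineering term in release notes: {term!r}"
        for term in banned
        if term in lowered
    ]


release_notes = Plugin(
    skill=Skill("release-notes", "Summarize the diff for customers in plain language."),
    tools=["git"],
    integrations=["GitHub", "Linear"],
    failure_modes=["drifts into engineering language"],
    checks=[no_engineering_jargon],
)

print(release_notes.review("Refactor of the export job."))
```

In this sketch the failure mode is written down once and enforced by a check, instead of being re-explained at the top of every thread.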

The stakes are not subtle. A stronger model with a vague environment does not give you more help. It gives you faster, more confident wrongness. Reviews that miss the team’s review standard. Release notes that drift into engineering language. Customer summaries that mix admin material into the team-facing recap. Each looks fine alone. Together they make a company that runs faster and means less.

The career version is sharper. The next competitive skill is not writing the longest prompt. It is knowing which parts of your work should become reusable infrastructure. Two years from now, the people who learned to package will be compounding. Everyone else will be explaining the workflow on Tuesday morning.

Here’s what’s inside:

The bottleneck that GPT-5.5 made visible. Why a stronger model with a vague environment gives you faster wrongness, not more help.
The decision ladder. When to stay with a prompt, when to build a skill, when to package a plugin, and when not to bother.
Which workflows to package first. Five categories worth the investment, and a test for whether yours qualifies.
Grab The Ultimate Codex Plugin Guide + prompts. The full step-by-step build guide from skill file through plugin manifest and debugging checklist, plus seven prompts that take you from workflow audit to installed, tested plugin.

Let me show you how the bottleneck moved, and what to do about it.
