The Amnesiac Intern
Why AI Agents Need External Memory
In the television series *Severance*, employees undergo a procedure that surgically divides their memories. When they are at work (the “Innie”), they have no memory of their outside life. When they leave (the “Outie”), they have no memory of what they did at work.
It makes for a chilling workplace drama. Unfortunately, it’s also the default reality for our AI Agents.
As we move from simple chatbots to long-running agents, we’re hitting a wall. It isn’t capability—our models are smarter than ever—it’s memory. We want reliable digital employees, but we’re designing them like interns who restart every single day with zero memory of what they did yesterday.
The Problem with Context Windows
Right now, an AI “remembers” using its context window—a short-term working memory where we dump every file, instruction, and chat log needed for a task.
While context windows are getting larger (2M+ tokens for Gemini, 200k for Claude), they aren’t infinite, and they aren’t permanent.
Imagine onboarding a new engineer by sitting them down and reading them the entire Slack history of the company, every Jira ticket ever filed, and the entire codebase in one sitting. Then, you ask them to fix a bug. They might succeed.
But the next day, you wipe their memory and do it all over again.
This is inefficient, expensive (in terms of tokens), and prone to errors. When the context window fills up, the agent starts to “forget” earlier instructions, or we have to use summarization techniques that result in a loss of fidelity.
Anthropic’s Harness: A Better Way
I recently reviewed Anthropic’s engineering post on “Effective Harnesses for Long-Running Agents,” and their findings resonate deeply with me.
They discovered that simply making the model smarter wasn’t the answer. The answer was changing how the model works between sessions. They propose a two-agent architecture that acts less like a lone genius and more like a well-structured team.
The Two-Agent Model
1. The Initializer Agent (The Architect):
This agent’s job isn’t to write code; its job is to set up the room. It analyzes user requests and creates “external memory” artifacts: a feature list, a progress log, and a test plan. It draws the map before the journey begins.
2. The Coding Agent (The Builder):
This agent comes in to do the work. Unlike our “amnesiac intern,” it doesn’t need to read the project’s entire history. It just reads the map left by the Architect (and previous Builders).
Before this agent finishes its session, it has one critical job: update the artifacts. It checks off the item in the feature list or logs its progress.
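To make this concrete, here is a minimal Python sketch of what such a harness loop could look like. Everything in it is my own illustration, not Anthropic’s actual implementation: the `call_model` stub stands in for whatever LLM SDK you use, and the artifact names (`feature_list.json`, `progress_log.txt`) simply mirror the ones described above.

```python
import json
from pathlib import Path

ARTIFACTS = Path("artifacts")


def call_model(prompt: str):
    """Placeholder for an LLM call; swap in your provider's SDK here."""
    raise NotImplementedError


def initialize(request: str) -> None:
    """Architect phase: set up external memory before any code is written."""
    ARTIFACTS.mkdir(exist_ok=True)
    # Assumes the model returns a list of {"name": ..., "done": False} items.
    features = call_model(f"Break this request into a feature checklist: {request}")
    (ARTIFACTS / "feature_list.json").write_text(json.dumps(features, indent=2))
    (ARTIFACTS / "progress_log.txt").write_text("")


def coding_session() -> bool:
    """Builder phase: read the map, do one unit of work, update the map."""
    features = json.loads((ARTIFACTS / "feature_list.json").read_text())
    todo = [f for f in features if not f["done"]]
    if not todo:
        return False  # nothing left to build
    result = call_model(f"Implement this feature: {todo[0]['name']}")
    todo[0]["done"] = True  # check the item off before this session "dies"
    (ARTIFACTS / "feature_list.json").write_text(json.dumps(features, indent=2))
    with (ARTIFACTS / "progress_log.txt").open("a") as log:
        log.write(f"{todo[0]['name']}: {result}\n")
    return True


if __name__ == "__main__":
    initialize("Build a CLI todo app")
    while coding_session():  # each iteration is a fresh, memoryless session
        pass
```

Notice that nothing in the loop depends on the model remembering the previous iteration; the files on disk carry all of the continuity.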
The “Feature List” as a Contract
The core innovation here isn’t the AI — it’s the State Management.
By maintaining a `feature_list.json` or a `progress_log.txt` outside of the AI’s context window, we create a persistent state that survives the “death” of the individual agent instance. It is the baton passed from one runner to the next.
This is part of what I have been exploring with Specification Driven Development (SDD). The “spec” — whether it’s a Gherkin test, a UML diagram, or a detailed requirement doc — is the external brain. It anchors the agent. It tells the agent:
“Here is what works.”
“Here is what is broken.”
“Here is what success looks like.”
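For illustration only, here is roughly what that contract could look like when written out from Python. The field names are my own invention rather than a standard schema, but they map directly onto the three statements above.

```python
import json

# Illustrative only: these field names are made up, not a standard schema.
feature_list = {
    "features": [
        {"name": "User login", "status": "done"},                  # what works
        {"name": "Password reset email", "status": "failing",
         "notes": "SMTP mock returns 500"},                        # what is broken
        {"name": "Rate limiting", "status": "todo",
         "acceptance": "returns 429 after 100 requests/minute"},   # what success looks like
    ],
}

with open("feature_list.json", "w") as f:
    json.dump(feature_list, f, indent=2)
```

Because the file lives on disk (or in the repo) rather than in the prompt, any future agent, or human, can open it and know exactly where the project stands.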
Beyond Coding: Security and Operations
While Anthropic focused on coding agents, the implications for Security and Operations are massive.
Security Audits
Imagine a “Red Team” agent tasked with finding vulnerabilities in a new release.
Without External Memory: Agent A spends 4 hours scanning and finds a SQL injection. The session ends. Agent B spins up, sees a generic “scan complete” summary, and either wastes time re-scanning the same endpoint or misses the nuance of the exploit.
With External Memory: Agent A logs the specific reproduction steps and the raw payload into a `vulnerability_context.json`. Agent B spins up, reads that file, and immediately starts verifying the patch.
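A sketch of that handover, again with made-up field names and helper functions, might look like this:

```python
import json
from pathlib import Path

STATE = Path("vulnerability_context.json")  # hypothetical artifact from the scenario above


def log_finding(endpoint: str, payload: str, repro_steps: list[str]) -> None:
    """Agent A: persist the exact exploit details before the session ends."""
    findings = json.loads(STATE.read_text()) if STATE.exists() else []
    findings.append({
        "endpoint": endpoint,
        "payload": payload,
        "repro_steps": repro_steps,
        "status": "unverified",
    })
    STATE.write_text(json.dumps(findings, indent=2))


def next_unverified() -> dict | None:
    """Agent B: skip the re-scan and go straight to verifying the patch."""
    findings = json.loads(STATE.read_text()) if STATE.exists() else []
    return next((f for f in findings if f["status"] == "unverified"), None)
```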
Operations (SRE)
During a complex outage, you might have agents working in rotating “shifts.”
Shift 1 Agent: Identifies high latency in the database.
Shift 2 Agent: Needs to know *exactly* which queries were analyzed and ruled out.
If the handover is just a text summary, details get lost. If the handover is a structured state file (e.g., `incident_state.json`), the new agent picks up the investigation instantly.
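Here is one illustrative example of what that state file might contain; the schema is purely my own sketch, not a standard incident format.

```python
import json

# Illustrative schema: what the Shift 1 agent hands to the Shift 2 agent.
incident_state = {
    "incident_id": "INC-1234",  # hypothetical identifier
    "symptom": "p99 latency > 2s on the checkout service",
    "leading_hypothesis": "slow queries on the orders database",
    "ruled_out": [
        {"hypothesis": "network partition", "evidence": "ping/traceroute clean"},
        {"hypothesis": "cache eviction storm", "evidence": "hit rate steady at 97%"},
    ],
    "queries_analyzed": ["SELECT ... FROM orders ...", "UPDATE inventory ..."],
    "next_steps": ["EXPLAIN ANALYZE the top 3 slow queries", "check index bloat"],
}

with open("incident_state.json", "w") as f:
    json.dump(incident_state, f, indent=2)
```

The Shift 2 agent reads `ruled_out` and `next_steps` and continues the investigation instead of re-deriving it.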
Conclusion
We need to stop waiting for a “Super-Context” model that remembers everything forever. That’s a brute-force solution to a structural problem.
The future of helpful AI agents lies in the harness, the management layer that surrounds the model. If we want our digital interns to grow into senior engineers, we need to give them a way to take notes that survive the night.
We need to build them an external brain.
***
How are you handling state for your agents? Drop a note below or pop over to my LinkedIn to discuss.
Thanks, and see you in the comments!



