Identity, Memory & Context Systems
Memory Systems: Persistent Context & RAG
An agent without memory is like a coworker with amnesia: every morning they show up with no memory of yesterday's decisions, last week's project updates, or the preferences you have told them a dozen times. The user file and soul file give the agent a fixed identity, but memory systems give it the ability to learn and retain information over time.
The Context Window Problem
Every LLM has a context window — the maximum amount of text it can process in a single interaction. This includes the system prompt, the conversation history, any loaded files, and the model's own response. Once the context window fills up, the model cannot see anything beyond it.
This means that in a long conversation, the model may lose track of what was said at the beginning. In a new session, it has no access to previous sessions at all. The context window is the model's entire "working memory," and it resets with every new conversation.
Memory systems extend the agent beyond this limit by saving important information to persistent storage and loading relevant pieces back into context when needed.
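Before looking at persistent storage, it helps to see how an agent manages the window itself. A minimal sketch of budget-based history trimming is below; it approximates token counts by word count, which is an assumption for illustration, since a real system would use the model's own tokenizer:

```python
def trim_history(messages, max_tokens=1000):
    """Keep the most recent messages that fit within a token budget.

    Token cost is approximated by word count here; a real agent would
    count tokens with the model's tokenizer instead.
    """
    kept = []
    total = 0
    for msg in reversed(messages):  # walk newest-first
        cost = len(msg.split())
        if total + cost > max_tokens:
            break  # the budget is full; older messages fall out of context
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order
```

Anything trimmed this way is gone unless it was saved to persistent storage, which is exactly the gap memory systems fill.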
The Memory File
The simplest form of agent memory is a memory.md file — a structured document where the agent records important facts, preferences, and logs.
```markdown
# memory.md

## Permanent Facts
- User prefers dark mode in all applications
- Production database is on port 5432, staging on 5433
- Weekly standup is every Tuesday at 10:00 AM Pacific
- The design team uses Figma, engineering uses Linear

## Project Notes
- Atlas migration: Auth service completed, Payment service next
- Deployment pipeline: Jenkins is being replaced with GitHub Actions
- Technical debt: The notification service needs a full rewrite in Q3

## Daily Log
- 2025-03-15: Reviewed PR #342, found 3 issues, user approved after fixes
- 2025-03-14: Set up staging environment for Atlas Phase 2
- 2025-03-13: User decided to postpone the database migration to next sprint
```
The agent reads this file at the start of each session and updates it when it learns something new. This approach is straightforward and works well for agents with moderate memory needs.
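The read-at-start, append-on-learn cycle can be sketched in a few lines. The `memory.md` path and the append-only logging are assumptions for illustration; a fuller implementation would insert entries under the correct heading rather than at the end of the file:

```python
from datetime import date
from pathlib import Path

MEMORY_FILE = Path("memory.md")  # assumed location of the memory file

def load_memory():
    """Read the memory file at session start; empty string if none exists yet."""
    return MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""

def append_daily_log(entry):
    """Append a dated entry in the Daily Log style shown above.

    Appends at the end of the file for simplicity; a fuller version
    would locate the '## Daily Log' heading and insert beneath it.
    """
    line = f"- {date.today().isoformat()}: {entry}\n"
    with MEMORY_FILE.open("a") as f:
        f.write(line)
```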
Beyond Simple Files: Vector Memory Search
As the amount of stored information grows, a flat file becomes impractical. Searching through hundreds of entries for the relevant one takes time and wastes context window space by loading unnecessary information.
Vector memory search solves this by converting text into numerical representations (called embeddings) and finding semantically similar entries. Instead of keyword matching, it understands meaning.
| Method | Search Query: "database performance" | Result |
|---|---|---|
| Keyword matching | Finds entries containing "database" or "performance" literally | May miss "PostgreSQL query optimization" |
| Vector search | Finds entries with similar meaning | Matches "PostgreSQL query optimization," "slow queries on staging," "indexing strategy discussion" |
Vector search finds relevant memories even when the exact words differ. This is critical because you rarely search for information using the same words you used to store it.
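The ranking machinery is the same regardless of where the vectors come from: embed the query, embed each entry, and sort by cosine similarity. The sketch below uses a toy word-count "embedding" as a stand-in for a real embedding model, so unlike true vector search it only matches shared words; swapping in real embeddings is what adds the semantic matching described above:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model.

    Real embeddings place semantically similar texts near each other
    even with no shared words; this toy version cannot do that.
    """
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search_memory(query, entries, top_k=2):
    """Return the top_k entries most similar to the query."""
    q = embed(query)
    scored = [(cosine_similarity(q, embed(e)), e) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:top_k]]
```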
RAG: Retrieval-Augmented Generation
RAG (Retrieval-Augmented Generation) is the pattern of pulling relevant documents into the model's context before it generates a response. Instead of relying only on what the model was trained on, RAG feeds it specific, up-to-date information at query time.
Here is how it works in an agent system:
- You ask a question — "What was the decision on the payment service architecture?"
- The agent searches memory — Using vector search or keyword matching, it finds relevant entries from past conversations, meeting notes, or project documents.
- Relevant context is loaded — The retrieved documents are inserted into the model's context alongside your question.
- The model responds — With the retrieved context available, the model gives an informed answer grounded in your actual project history.
Without RAG, the model would either say "I do not have that information" or, worse, guess. With RAG, it has access to the specific documents it needs.
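The retrieval step can use any search mechanism; what defines RAG is assembling the retrieved documents into the prompt before generation. A minimal sketch of that assembly step follows, with the template wording being an illustrative assumption rather than a fixed standard:

```python
def build_rag_prompt(question, retrieved_docs):
    """Assemble a prompt that grounds the model's answer in retrieved context.

    Numbering the documents lets the model (and the user) trace which
    source supported which part of the answer.
    """
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The explicit instruction to admit when the context is insufficient is what steers the model away from guessing.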
Practical Memory Strategies
Organizing agent memory effectively requires a few practical strategies:
Separate short-term from long-term memory. Short-term memory is the current conversation context. Long-term memory is facts, preferences, and decisions that persist across sessions. Not everything belongs in long-term memory — transient details like "the user wants bullet points in this response" do not need to be saved permanently.
Use structured categories. Group memories by type: permanent facts, project notes, personal preferences, decision logs. This makes retrieval more efficient and prevents the agent from loading irrelevant memories.
Prune regularly. Outdated information degrades memory quality. If a project is completed, archive its notes. If a preference has changed, update the record. Stale memories can cause the agent to make decisions based on information that is no longer true.
Let the agent manage its own memory. Rather than manually updating memory files, configure the agent to detect when something is worth remembering and save it automatically. A well-configured agent can identify new facts, changed preferences, and important decisions without explicit instruction.
```python
# Example: simple memory-management logic
def should_save_to_memory(message: str) -> bool:
    """Return True when a message contains a memory-worthy statement."""
    memory_triggers = [
        "remember that",
        "from now on",
        "my preference is",
        "we decided to",
        "the new plan is",
    ]
    # Check whether the message contains any trigger phrase
    lowered = message.lower()
    return any(trigger in lowered for trigger in memory_triggers)
```

Trigger phrases are a starting point; a more capable agent would also ask the model itself to judge whether a message states a durable fact or decision.
How Memory Fits the Bigger Picture
Memory is the bridge between the static configuration files (user, identity, soul) and the dynamic, evolving relationship between you and your agent. The configuration files define who you are and how the agent should behave. Memory captures what happens over time — decisions made, preferences discovered, projects completed, lessons learned.
Together, they create an agent that is not just configured but experienced — one that gets better at serving you the longer it operates.
Key takeaway: Memory systems extend the agent beyond the context window by saving and retrieving information across sessions. Simple memory files work for small needs. Vector search and RAG handle larger knowledge bases by finding semantically relevant information. The key is structuring memory so the agent loads only what it needs, when it needs it.
Next: Operational rules and the heartbeat system — turning your agent from reactive to proactive.