Architecture · November 2025

Three-Tier Memory: How Human-Like Consolidation Changes Everything

Working memory, episodic memory, semantic memory — how a biologically inspired architecture creates AI that actually remembers.

Human memory isn't one system — it's at least three, working together. Cognitive scientists have mapped these systems extensively: working memory for immediate processing, episodic memory for personal experiences, semantic memory for general knowledge.

Most AI memory implementations ignore this entirely. They treat memory as a single vector database, maybe with some RAG on top. It works, sort of. But it misses the architectural insights that make biological memory so effective.

Here's how we built a three-tier system inspired by human cognition.

Tier 1: Working Memory

Working memory is your mental scratchpad. It holds information you're actively using right now — the sentence you're reading, the numbers you're adding, the context of the current conversation.

Working Memory Properties
Capacity: Limited (4-7 items in humans; the context window in AI)
Duration: Seconds to minutes
Access: Immediate, always available
Update: Continuous, automatic

In AI terms, this maps to the context window — but with an important addition. We maintain a structured working memory layered on top of the raw context.

This isn't just raw tokens. It's organized information that the system can reason about.
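
A minimal sketch of what that structured layer might look like (the field names below are illustrative, not the exact schema):

// Hypothetical working memory structure (field names are illustrative)
interface WorkingMemory {
  activeTopics: string[];         // topics currently under discussion
  openQuestions: string[];        // questions raised but not yet answered
  retrievedEpisodes: string[];    // ids of episodic memories pulled into context
  activeSemanticNodes: string[];  // ids of semantic nodes currently primed
  tokenBudget: number;            // context window capacity still available
}

const current: WorkingMemory = {
  activeTopics: ["python", "async"],
  openQuestions: ["which event loop does the user target?"],
  retrievedEpisodes: ["ep_20251129_143022"],
  activeSemanticNodes: ["sem_python_expertise"],
  tokenBudget: 6000,
};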

Tier 2: Episodic Memory

Episodic memory stores specific experiences — what happened, when, where. Your memory of yesterday's lunch. That conversation you had last week. The time you debugged that weird race condition.

Episodic Memory Properties
Capacity: Large but bounded
Duration: Days to years (with decay)
Access: Retrieval-based, associative
Update: Append-mostly, consolidation during sleep

We implement episodic memory as a time-indexed store with rich metadata:

// Episodic memory entry structure
{
  id: "ep_20251129_143022",
  timestamp: "2025-11-29T14:30:22Z",
  content: "User asked about Python async patterns...",
  embedding: [0.123, -0.456, ...],
  emotional_weight: 0.3,
  topics: ["python", "async", "concurrency"],
  linked_semantic: ["sem_python_expertise"],
  access_count: 2,
  last_accessed: "2025-11-29T16:45:00Z",
  decay_factor: 0.95
}

Retrieval uses hybrid search: semantic similarity (embeddings), temporal proximity, and explicit topic matching. The `emotional_weight` field influences consolidation priority — more significant interactions are more likely to be preserved.
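
As a rough sketch, a hybrid score could combine those signals as a weighted sum; the weights, the 30-day temporal scale, and the helper function below are illustrative assumptions rather than the production values:

// Hypothetical hybrid retrieval score for one episodic entry (illustrative weights)
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const normA = Math.sqrt(a.reduce((s, x) => s + x * x, 0));
  const normB = Math.sqrt(b.reduce((s, x) => s + x * x, 0));
  return dot / (normA * normB);
}

function retrievalScore(
  queryEmbedding: number[],
  queryTopics: string[],
  entry: { embedding: number[]; topics: string[]; timestamp: string; decay_factor: number },
  now: Date = new Date()
): number {
  // Semantic similarity between the query and the stored content
  const semantic = cosineSimilarity(queryEmbedding, entry.embedding);

  // Temporal proximity: decays exponentially with age (30-day scale assumed)
  const ageDays = (now.getTime() - new Date(entry.timestamp).getTime()) / 86_400_000;
  const temporal = Math.exp(-ageDays / 30);

  // Explicit topic matching via Jaccard overlap
  const overlap = entry.topics.filter((t) => queryTopics.includes(t)).length;
  const union = new Set([...entry.topics, ...queryTopics]).size;
  const topical = union === 0 ? 0 : overlap / union;

  // Illustrative weighting; decay_factor down-weights already-consolidated episodes
  return (0.5 * semantic + 0.2 * temporal + 0.3 * topical) * entry.decay_factor;
}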

Tier 3: Semantic Memory

Semantic memory holds general knowledge — facts, concepts, relationships. Not "I ate pizza yesterday" but "pizza is food, food is eaten, eating satisfies hunger." In user terms: not "user asked about Python on November 29th" but "user is experienced with Python, prefers practical examples."

Semantic Memory Properties
Capacity: Very large, grows with experience
Duration: Long-term, highly stable
Access: Automatic activation, priming
Update: Gradual, through consolidation

We implement semantic memory as a knowledge graph:

// Semantic memory node
{
  id: "sem_python_expertise",
  type: "user_trait",
  content: "User has strong Python expertise",
  confidence: 0.87,
  evidence_count: 14,
  first_observed: "2025-10-15",
  last_reinforced: "2025-11-29",
  connections: [
    { to: "sem_prefers_concise", weight: 0.6 },
    { to: "sem_async_interest", weight: 0.8 },
    { to: "sem_backend_focus", weight: 0.7 }
  ]
}

Semantic nodes have confidence scores that increase with evidence and decay slowly without reinforcement. Connections between nodes enable spreading activation — when "Python" is activated, related concepts like "async" and "backend" get primed automatically.
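
A simplified sketch of one-hop spreading activation over such a graph (the threshold and data shapes are illustrative):

// Hypothetical one-hop spreading activation over semantic nodes (illustrative)
type SemanticNode = {
  id: string;
  confidence: number;
  connections: { to: string; weight: number }[];
};

function spreadActivation(
  graph: Map<string, SemanticNode>,
  seedIds: string[],
  threshold = 0.3
): Map<string, number> {
  const activation = new Map<string, number>();
  for (const id of seedIds) {
    const node = graph.get(id);
    if (!node) continue;
    // Seed nodes start fully activated, scaled by their confidence
    activation.set(id, Math.max(activation.get(id) ?? 0, node.confidence));
    // Propagate activation to neighbours along weighted connections
    for (const edge of node.connections) {
      const spread = node.confidence * edge.weight;
      if (spread >= threshold) {
        activation.set(edge.to, Math.max(activation.get(edge.to) ?? 0, spread));
      }
    }
  }
  return activation;
}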

The Flow Between Tiers

The magic isn't in any single tier — it's in how they interact:

Working → Episodic (End of Session)
Current conversation is chunked, embedded, and stored as episodic memories. Metadata extracted automatically.
Episodic → Semantic (During Sleep)
Patterns extracted from episodic memories. Repeated themes become semantic knowledge. Details fade, abstractions strengthen.
Semantic → Working (Start of Session)
Relevant semantic knowledge pre-loaded into context. User preferences, known expertise, established patterns.
Episodic → Working (During Session)
Specific past interactions retrieved when relevant. "Remember when we discussed X" becomes possible.
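
One way these flows could hang together over a session's lifecycle (a sketch; the interfaces and method names are placeholders, not the actual API):

// Hypothetical session lifecycle tying the tiers together (placeholder interfaces)
interface SemanticTier { activate(userId: string): Promise<string[]> }
interface EpisodicTier { search(query: string): Promise<string[]>; store(chunks: string[]): Promise<void> }
interface WorkingTier { preload(items: string[]): void; add(items: string[]): void; dump(): string[] }

async function startSession(semantic: SemanticTier, working: WorkingTier, userId: string) {
  // Semantic → Working: pre-load user traits, preferences, established patterns
  working.preload(await semantic.activate(userId));
}

async function onMessage(episodic: EpisodicTier, working: WorkingTier, message: string) {
  // Episodic → Working: retrieve specific past interactions relevant to this message
  working.add(await episodic.search(message));
}

async function endSession(episodic: EpisodicTier, working: WorkingTier) {
  // Working → Episodic: chunk and persist the conversation.
  // Episodic → Semantic happens later, during the sleep/consolidation cycle.
  await episodic.store(working.dump());
}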

Retrieval Strategy

When a new message arrives, the retrieval pipeline runs four steps:

  1. Activate semantic memory — User traits and preferences loaded into context
  2. Query episodic memory — Relevant past interactions retrieved via hybrid search
  3. Rank and filter — Results scored by relevance, recency, and importance
  4. Populate working memory — Best results added to context, respecting token limits

The key insight: semantic memory provides the lens through which episodic memories are interpreted. If we know the user is a Python expert, we retrieve and present technical details differently than for a beginner.
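
A compressed sketch of the rank-and-pack half of that pipeline, assuming each candidate already carries a combined score and a token count (names are illustrative):

// Hypothetical ranking step: score, sort, and pack results into the context budget
type Candidate = { id: string; text: string; score: number; tokens: number };

function populateWorkingMemory(candidates: Candidate[], tokenBudget: number): Candidate[] {
  const selected: Candidate[] = [];
  let used = 0;
  // Rank by combined relevance/recency/importance score, highest first
  for (const c of [...candidates].sort((a, b) => b.score - a.score)) {
    if (used + c.tokens > tokenBudget) continue; // respect the context window limit
    selected.push(c);
    used += c.tokens;
  }
  return selected;
}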

Consolidation

The Episodic → Semantic transition is handled by NeuralSleep. The process:

  1. Cluster similar episodes — Group related interactions
  2. Extract patterns — What's common across the cluster?
  3. Update or create semantic nodes — Strengthen existing knowledge or add new
  4. Decay episodic details — Reduce specificity of consolidated episodes

After consolidation, the system might not remember the exact conversation where you asked about Python debugging — but it knows you're experienced with Python and tend to encounter debugging scenarios. The specific episode has become general knowledge.
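
A sketch of what a single consolidation pass might look like; the naive topic-based clustering and the 0.8 decay multiplier are placeholders for the actual NeuralSleep logic:

// Hypothetical consolidation pass: episodes become semantic updates (placeholder logic)
type Episode = { id: string; topics: string[]; decay_factor: number };
type SemanticUpdate = { nodeId: string; evidence: number };

function consolidate(
  episodes: Episode[],
  minClusterSize = 3
): { updates: SemanticUpdate[]; decayed: Episode[] } {
  // 1. Cluster similar episodes: here, naively by shared primary topic
  const clusters = new Map<string, Episode[]>();
  for (const ep of episodes) {
    const key = ep.topics[0] ?? "misc";
    clusters.set(key, [...(clusters.get(key) ?? []), ep]);
  }

  const updates: SemanticUpdate[] = [];
  const decayed: Episode[] = [];
  for (const [topic, cluster] of clusters) {
    if (cluster.length < minClusterSize) continue;
    // 2-3. Extract the repeated theme and strengthen or create its semantic node
    updates.push({ nodeId: `sem_${topic}`, evidence: cluster.length });
    // 4. Decay episodic detail now that the pattern has been captured
    for (const ep of cluster) decayed.push({ ...ep, decay_factor: ep.decay_factor * 0.8 });
  }
  return { updates, decayed };
}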

Implementation

The three-tier system is implemented in MemoryCore.

The architecture is modular — you can swap storage backends, adjust decay rates, tune retrieval parameters. But the three-tier structure and inter-tier flows are the core insight.
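
For instance, the tunable pieces might surface as a configuration object along these lines (keys, backends, and defaults are illustrative assumptions, not shipped settings):

// Hypothetical MemoryCore configuration (illustrative keys and defaults)
const memoryConfig = {
  episodic: {
    backend: "postgres",          // swappable storage backend
    decayFactorPerDay: 0.98,      // how quickly unconsolidated episodes fade
  },
  semantic: {
    backend: "neo4j",             // swappable graph store
    confidenceDecayPerWeek: 0.99, // slow decay without reinforcement
  },
  retrieval: {
    maxEpisodes: 5,               // episodic results injected per message
    semanticWeight: 0.5,          // weighting in the hybrid score
    temporalWeight: 0.2,
    topicalWeight: 0.3,
  },
  consolidation: {
    minClusterSize: 3,            // episodes needed before a pattern is promoted
    schedule: "nightly",
  },
};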

Results

Compared to single-tier vector store approaches, the most important difference is qualitative: the system develops a genuine understanding of users over time, not just a log of interactions.

See this architecture in action: Building Luna walks through a complete implementation from scratch.