Human memory isn't one system — it's at least three, working together. Cognitive scientists have mapped these systems extensively: working memory for immediate processing, episodic memory for personal experiences, semantic memory for general knowledge.
Most AI memory implementations ignore this entirely. They treat memory as a single vector database, maybe with some RAG on top. It works, sort of. But it misses the architectural insights that make biological memory so effective.
Here's how we built a three-tier system inspired by human cognition.
Tier 1: Working Memory
Working memory is your mental scratchpad. It holds information you're actively using right now — the sentence you're reading, the numbers you're adding, the context of the current conversation.
In AI terms, this maps to the context window — but with an important addition. We maintain a structured working memory that tracks:
- Current conversation state
- Active user intent
- Relevant retrieved context (from other tiers)
- Pending actions or follow-ups
This isn't just raw tokens. It's organized information that the system can reason about.
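As a rough sketch, a structured working-memory record might look like the following; the field names are illustrative, not MemoryCore's actual schema:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class WorkingMemory:
    """Structured scratchpad for the current session (illustrative fields)."""
    conversation_state: dict[str, Any] = field(default_factory=dict)  # e.g. current phase, last topic
    active_intent: str | None = None                                  # what the user is trying to do right now
    retrieved_context: list[dict] = field(default_factory=list)       # items pulled in from the other tiers
    pending_actions: list[str] = field(default_factory=list)          # follow-ups the system still owes the user

    def to_prompt_fragment(self) -> str:
        """Render the structured state as text that can sit in the context window."""
        lines = [f"Intent: {self.active_intent or 'unknown'}"]
        lines += [f"Pending: {action}" for action in self.pending_actions]
        lines += [f"Context: {item.get('summary', '')}" for item in self.retrieved_context]
        return "\n".join(lines)
```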
Tier 2: Episodic Memory
Episodic memory stores specific experiences — what happened, when, where. Your memory of yesterday's lunch. That conversation you had last week. The time you debugged that weird race condition.
We implement episodic memory as a time-indexed store with rich metadata.
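A minimal sketch of an episode record, assuming the fields the retrieval and consolidation steps below rely on (names are illustrative, not the production schema):

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Episode:
    """One stored interaction in the time-indexed episodic store."""
    id: str
    timestamp: datetime                                 # time index for temporal queries
    summary: str                                        # compressed description of what happened
    embedding: list[float]                              # vector used for semantic similarity search
    topics: list[str] = field(default_factory=list)     # explicit tags for topic matching
    emotional_weight: float = 0.0                       # significance score; drives consolidation priority
    consolidated: bool = False                          # set once NeuralSleep folds this into semantic memory
```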
Retrieval uses hybrid search: semantic similarity (embeddings), temporal proximity, and explicit topic matching. The `emotional_weight` field influences consolidation priority — more significant interactions are more likely to be preserved.
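One way to blend those three signals into a single ranking is a weighted score; the weights and half-life below are placeholder defaults rather than tuned values:

```python
import math
from datetime import datetime


def hybrid_score(query_embedding, query_topics, episode, now=None,
                 half_life_days=30.0, w_semantic=0.6, w_temporal=0.25, w_topic=0.15):
    """Blend semantic similarity, temporal proximity, and topic overlap into one score."""
    now = now or datetime.utcnow()

    # Semantic similarity: cosine between the query and episode embeddings.
    dot = sum(a * b for a, b in zip(query_embedding, episode.embedding))
    norms = math.sqrt(sum(a * a for a in query_embedding)) * math.sqrt(sum(b * b for b in episode.embedding))
    semantic = dot / norms if norms else 0.0

    # Temporal proximity: exponential decay with a configurable half-life.
    age_days = (now - episode.timestamp).total_seconds() / 86400
    temporal = 0.5 ** (age_days / half_life_days)

    # Topic matching: fraction of the query's topics the episode is tagged with.
    topic = len(set(query_topics) & set(episode.topics)) / len(query_topics) if query_topics else 0.0

    return w_semantic * semantic + w_temporal * temporal + w_topic * topic
```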
Tier 3: Semantic Memory
Semantic memory holds general knowledge — facts, concepts, relationships. Not "I ate pizza yesterday" but "pizza is food, food is eaten, eating satisfies hunger." In user terms: not "user asked about Python on November 29th" but "user is experienced with Python, prefers practical examples."
We implement semantic memory as a knowledge graph.
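A minimal sketch of a node in that graph, again with illustrative fields rather than the production schema:

```python
from dataclasses import dataclass, field


@dataclass
class SemanticNode:
    """A concept or user trait in the knowledge graph."""
    concept: str                                               # e.g. "python", "prefers_practical_examples"
    confidence: float = 0.1                                    # grows with evidence, decays without reinforcement
    evidence_count: int = 0                                    # how many episodes have supported this node
    neighbors: dict[str, float] = field(default_factory=dict)  # related concept -> connection strength
```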
Semantic nodes have confidence scores that increase with evidence and decay slowly without reinforcement. Connections between nodes enable spreading activation — when "Python" is activated, related concepts like "async" and "backend" get primed automatically.
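None of this needs heavy machinery. A sketch of the reinforcement, decay, and priming rules, with illustrative constants:

```python
def reinforce(node, amount=0.1, cap=1.0):
    """Strengthen a node when new evidence supports it."""
    node.evidence_count += 1
    node.confidence = min(cap, node.confidence + amount * (1.0 - node.confidence))


def decay_all(nodes, rate=0.01):
    """Slow, periodic decay for nodes that receive no reinforcement."""
    for node in nodes.values():
        node.confidence = max(0.0, node.confidence - rate * node.confidence)


def spread_activation(nodes, start_concept, depth=2, damping=0.5):
    """Prime concepts connected to the activated one; activation fades with each hop."""
    activation = {start_concept: 1.0}
    frontier = [start_concept]
    for _ in range(depth):
        next_frontier = []
        for concept in frontier:
            node = nodes.get(concept)
            if node is None:
                continue
            for neighbor, strength in node.neighbors.items():
                boost = activation[concept] * strength * damping
                if boost > activation.get(neighbor, 0.0):
                    activation[neighbor] = boost
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return activation  # concept -> priming level, used to bias episodic retrieval
```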
The Flow Between Tiers
The magic isn't in any single tier — it's in how they interact:
Retrieval Strategy
When a new message arrives, the retrieval pipeline runs four steps:
- Activate semantic memory — User traits and preferences loaded into context
- Query episodic memory — Relevant past interactions retrieved via hybrid search
- Rank and filter — Results scored by relevance, recency, and importance
- Populate working memory — Best results added to context, respecting token limits
The key insight: semantic memory provides the lens through which episodic memories are interpreted. If we know the user is a Python expert, we retrieve and present technical details differently than for a beginner.
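Putting the four steps together, and reusing the helpers sketched above, the pipeline might look roughly like this; it is an illustration, not MemoryCore's actual API:

```python
def build_context(message_embedding, message_topics, user_nodes, episodes,
                  token_budget=2000, importance_weight=0.1,
                  estimate_tokens=lambda text: len(text) // 4):
    """Assemble working memory for a new message (a sketch of the four steps above)."""
    # 1. Activate semantic memory: load confident user traits and prime related concepts.
    traits = [node for node in user_nodes.values() if node.confidence > 0.5]
    primed = set()
    for topic in message_topics:
        primed |= set(spread_activation(user_nodes, topic))

    # 2. Query episodic memory via hybrid search, with primed concepts widening the topic match.
    query_topics = list(set(message_topics) | primed)
    scored = [(hybrid_score(message_embedding, query_topics, ep), ep) for ep in episodes]

    # 3. Rank and filter: blend relevance and recency with importance (emotional_weight).
    ranked = sorted(((score + importance_weight * ep.emotional_weight, ep) for score, ep in scored),
                    key=lambda pair: pair[0], reverse=True)

    # 4. Populate working memory until the token budget is spent.
    selected, used = [], 0
    for _, ep in ranked:
        cost = estimate_tokens(ep.summary)
        if used + cost > token_budget:
            break
        selected.append(ep)
        used += cost
    return {"traits": traits, "episodes": selected}
```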
Consolidation
The Episodic → Semantic transition is handled by NeuralSleep. The process:
- Cluster similar episodes — Group related interactions
- Extract patterns — What's common across the cluster?
- Update or create semantic nodes — Strengthen existing knowledge or add new
- Decay episodic details — Reduce specificity of consolidated episodes
After consolidation, the system might not remember the exact conversation where you asked about Python debugging — but it knows you're experienced with Python and tend to encounter debugging scenarios. The specific episode has become general knowledge.
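A stripped-down sketch of one consolidation pass, substituting a naive topic-based grouping for the real clustering step and reusing the types and helpers above:

```python
def consolidate(episodes, nodes, min_cluster_size=3, detail_decay=0.5):
    """One NeuralSleep-style pass: cluster, extract, update, decay."""
    # 1. Cluster similar episodes: naively group unconsolidated episodes by shared topic.
    clusters = {}
    for ep in episodes:
        if ep.consolidated:
            continue
        for topic in ep.topics:
            clusters.setdefault(topic, []).append(ep)

    for topic, group in clusters.items():
        if len(group) < min_cluster_size:
            continue

        # 2. Extract the pattern: a recurring topic and its average significance.
        avg_weight = sum(ep.emotional_weight for ep in group) / len(group)

        # 3. Update or create the semantic node for this pattern.
        node = nodes.setdefault(topic, SemanticNode(concept=topic))
        reinforce(node, amount=0.1 + 0.1 * avg_weight)

        # 4. Decay episodic detail: mark the episodes consolidated and shrink their weight.
        for ep in group:
            ep.consolidated = True
            ep.emotional_weight *= detail_decay
```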
Implementation
The three-tier system is implemented in MemoryCore:
- Working Memory: In-memory store, refreshed per session
- Episodic Memory: PostgreSQL + pgvector for hybrid search
- Semantic Memory: Neo4j knowledge graph (or PostgreSQL with JSON)
- Consolidation: NeuralSleep scheduled jobs
The architecture is modular — you can swap storage backends, adjust decay rates, tune retrieval parameters. But the three-tier structure and inter-tier flows are the core insight.
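To make that concrete, the tunable knobs might be gathered into a single config block; the keys below are hypothetical, not MemoryCore's actual settings:

```python
# Hypothetical configuration keys, not MemoryCore's actual settings.
MEMORY_CONFIG = {
    "episodic_backend": "postgres+pgvector",  # swap for another vector-capable store
    "semantic_backend": "neo4j",              # or "postgres_json"
    "episodic_half_life_days": 30.0,          # temporal decay used in hybrid scoring
    "semantic_decay_rate": 0.01,              # per-cycle confidence decay without reinforcement
    "consolidation_schedule": "0 3 * * *",    # cron expression for NeuralSleep jobs
    "retrieval": {
        "w_semantic": 0.6,
        "w_temporal": 0.25,
        "w_topic": 0.15,
        "token_budget": 2000,
    },
}
```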
Results
Compared to single-tier vector store approaches:
- Storage efficiency: 60% reduction in stored data (through consolidation)
- Retrieval precision: 35% improvement (through semantic priming)
- Long-term coherence: Dramatically better — conversations from months ago still influence behavior
- Graceful degradation: Old memories fade gradually rather than being abruptly cut off
The qualitative difference is more important: the system develops a genuine understanding of users over time, not just a log of interactions.
See this architecture in action: Building Luna walks through a complete implementation from scratch.