Human memory isn't one system — it's at least three, working together. Cognitive scientists have mapped these systems extensively: working memory for immediate processing, episodic memory for personal experiences, semantic memory for general knowledge.
Most AI memory implementations ignore this entirely. They treat memory as a single vector database, maybe with some RAG on top. It works, sort of. But it misses the architectural insights that make biological memory so effective.
Here's how we built a three-tier system inspired by human cognition.
Tier 1: Working Memory
Working memory is your mental scratchpad. It holds information you're actively using right now — the sentence you're reading, the numbers you're adding, the context of the current conversation.
In AI terms, this maps to the context window — but with an important addition. We maintain a structured working memory that tracks:
- Current conversation state
- Active user intent
- Relevant retrieved context (from other tiers)
- Pending actions or follow-ups
This isn't just raw tokens. It's organized information that the system can reason about.
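As a rough sketch, a structured working-memory record might look like the following; the field names are illustrative, not MemoryCore's actual schema:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class WorkingMemory:
    """Structured scratchpad for the current session (illustrative fields)."""
    conversation_state: dict[str, Any] = field(default_factory=dict)  # e.g. current phase, last topic
    active_intent: str | None = None                                  # what the user is trying to do right now
    retrieved_context: list[dict] = field(default_factory=list)       # items pulled in from the other tiers
    pending_actions: list[str] = field(default_factory=list)          # follow-ups the system still owes the user

    def to_prompt_fragment(self) -> str:
        """Render the structured state as text that can sit in the context window."""
        lines = [f"Intent: {self.active_intent or 'unknown'}"]
        lines += [f"Pending: {action}" for action in self.pending_actions]
        lines += [f"Context: {item.get('summary', '')}" for item in self.retrieved_context]
        return "\n".join(lines)
```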
Tier 2: Episodic Memory
Episodic memory stores specific experiences — what happened, when, where. Your memory of yesterday's lunch. That conversation you had last week. The time you debugged that weird race condition.
We implement episodic memory as a time-indexed store with rich metadata.
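A minimal sketch of an episode record, assuming the fields the retrieval and consolidation steps below rely on (names are illustrative, not the production schema):

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Episode:
    """One stored interaction in the time-indexed episodic store."""
    id: str
    timestamp: datetime                                 # time index for temporal queries
    summary: str                                        # compressed description of what happened
    embedding: list[float]                              # vector used for semantic similarity search
    topics: list[str] = field(default_factory=list)     # explicit tags for topic matching
    emotional_weight: float = 0.0                       # significance score; drives consolidation priority
    consolidated: bool = False                          # set once NeuralSleep folds this into semantic memory
```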
Retrieval uses hybrid search: semantic similarity (embeddings), temporal proximity, and explicit topic matching. The `emotional_weight` field influences consolidation priority — more significant interactions are more likely to be preserved.
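One way to blend those three signals into a single ranking is a weighted score; the weights and half-life below are placeholder defaults rather than tuned values:

```python
import math
from datetime import datetime


def hybrid_score(query_embedding, query_topics, episode, now=None,
                 half_life_days=30.0, w_semantic=0.6, w_temporal=0.25, w_topic=0.15):
    """Blend semantic similarity, temporal proximity, and topic overlap into one score."""
    now = now or datetime.utcnow()

    # Semantic similarity: cosine between the query and episode embeddings.
    dot = sum(a * b for a, b in zip(query_embedding, episode.embedding))
    norms = math.sqrt(sum(a * a for a in query_embedding)) * math.sqrt(sum(b * b for b in episode.embedding))
    semantic = dot / norms if norms else 0.0

    # Temporal proximity: exponential decay with a configurable half-life.
    age_days = (now - episode.timestamp).total_seconds() / 86400
    temporal = 0.5 ** (age_days / half_life_days)

    # Topic matching: fraction of the query's topics the episode is tagged with.
    topic = len(set(query_topics) & set(episode.topics)) / len(query_topics) if query_topics else 0.0

    return w_semantic * semantic + w_temporal * temporal + w_topic * topic
```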
Tier 3: Semantic Memory
Semantic memory holds general knowledge — facts, concepts, relationships. Not "I ate pizza yesterday" but "pizza is food, food is eaten, eating satisfies hunger." In user terms: not "user asked about Python on November 29th" but "user is experienced with Python, prefers practical examples."
We implement semantic memory as a knowledge graph.
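A minimal sketch of a node in that graph, again with illustrative fields rather than the production schema:

```python
from dataclasses import dataclass, field


@dataclass
class SemanticNode:
    """A concept or user trait in the knowledge graph."""
    concept: str                                               # e.g. "python", "prefers_practical_examples"
    confidence: float = 0.1                                    # grows with evidence, decays without reinforcement
    evidence_count: int = 0                                    # how many episodes have supported this node
    neighbors: dict[str, float] = field(default_factory=dict)  # related concept -> connection strength
```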
Semantic nodes have confidence scores that increase with evidence and decay slowly without reinforcement. Connections between nodes enable spreading activation — when "Python" is activated, related concepts like "async" and "backend" get primed automatically.
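None of this needs heavy machinery. A sketch of the reinforcement, decay, and priming rules, with illustrative constants:

```python
def reinforce(node, amount=0.1, cap=1.0):
    """Strengthen a node when new evidence supports it."""
    node.evidence_count += 1
    node.confidence = min(cap, node.confidence + amount * (1.0 - node.confidence))


def decay_all(nodes, rate=0.01):
    """Slow, periodic decay for nodes that receive no reinforcement."""
    for node in nodes.values():
        node.confidence = max(0.0, node.confidence - rate * node.confidence)


def spread_activation(nodes, start_concept, depth=2, damping=0.5):
    """Prime concepts connected to the activated one; activation fades with each hop."""
    activation = {start_concept: 1.0}
    frontier = [start_concept]
    for _ in range(depth):
        next_frontier = []
        for concept in frontier:
            node = nodes.get(concept)
            if node is None:
                continue
            for neighbor, strength in node.neighbors.items():
                boost = activation[concept] * strength * damping
                if boost > activation.get(neighbor, 0.0):
                    activation[neighbor] = boost
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return activation  # concept -> priming level, used to bias episodic retrieval
```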
The Flow Between Tiers
The magic isn't in any single tier — it's in how they interact:
Retrieval Strategy
When a new message arrives, the retrieval pipeline runs four steps:
- Activate semantic memory — User traits and preferences loaded into context
- Query episodic memory — Relevant past interactions retrieved via hybrid search
- Rank and filter — Results scored by relevance, recency, and importance
- Populate working memory — Best results added to context, respecting token limits
The key insight: semantic memory provides the lens through which episodic memories are interpreted. If we know the user is a Python expert, we retrieve and present technical details differently than for a beginner.
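Putting the four steps together, and reusing the helpers sketched above, the pipeline might look roughly like this; it is an illustration, not MemoryCore's actual API:

```python
def build_context(message_embedding, message_topics, user_nodes, episodes,
                  token_budget=2000, importance_weight=0.1,
                  estimate_tokens=lambda text: len(text) // 4):
    """Assemble working memory for a new message (a sketch of the four steps above)."""
    # 1. Activate semantic memory: load confident user traits and prime related concepts.
    traits = [node for node in user_nodes.values() if node.confidence > 0.5]
    primed = set()
    for topic in message_topics:
        primed |= set(spread_activation(user_nodes, topic))

    # 2. Query episodic memory via hybrid search, with primed concepts widening the topic match.
    query_topics = list(set(message_topics) | primed)
    scored = [(hybrid_score(message_embedding, query_topics, ep), ep) for ep in episodes]

    # 3. Rank and filter: blend relevance and recency with importance (emotional_weight).
    ranked = sorted(((score + importance_weight * ep.emotional_weight, ep) for score, ep in scored),
                    key=lambda pair: pair[0], reverse=True)

    # 4. Populate working memory until the token budget is spent.
    selected, used = [], 0
    for _, ep in ranked:
        cost = estimate_tokens(ep.summary)
        if used + cost > token_budget:
            break
        selected.append(ep)
        used += cost
    return {"traits": traits, "episodes": selected}
```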
Consolidation
The Episodic → Semantic transition is handled by NeuralSleep. The process:
- Cluster similar episodes — Group related interactions
- Extract patterns — What's common across the cluster?
- Update or create semantic nodes — Strengthen existing knowledge or add new
- Decay episodic details — Reduce specificity of consolidated episodes
After consolidation, the system might not remember the exact conversation where you asked about Python debugging — but it knows you're experienced with Python and tend to encounter debugging scenarios. The specific episode has become general knowledge.
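A stripped-down sketch of one consolidation pass, substituting a naive topic-based grouping for the real clustering step and reusing the types and helpers above:

```python
def consolidate(episodes, nodes, min_cluster_size=3, detail_decay=0.5):
    """One NeuralSleep-style pass: cluster, extract, update, decay."""
    # 1. Cluster similar episodes: naively group unconsolidated episodes by shared topic.
    clusters = {}
    for ep in episodes:
        if ep.consolidated:
            continue
        for topic in ep.topics:
            clusters.setdefault(topic, []).append(ep)

    for topic, group in clusters.items():
        if len(group) < min_cluster_size:
            continue

        # 2. Extract the pattern: a recurring topic and its average significance.
        avg_weight = sum(ep.emotional_weight for ep in group) / len(group)

        # 3. Update or create the semantic node for this pattern.
        node = nodes.setdefault(topic, SemanticNode(concept=topic))
        reinforce(node, amount=0.1 + 0.1 * avg_weight)

        # 4. Decay episodic detail: mark the episodes consolidated and shrink their weight.
        for ep in group:
            ep.consolidated = True
            ep.emotional_weight *= detail_decay
```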
Implementation
The three-tier system is implemented in MemoryCore:
- Working Memory: In-memory store, refreshed per session
- Episodic Memory: PostgreSQL + pgvector for hybrid search
- Semantic Memory: Neo4j knowledge graph (or PostgreSQL with JSON)
- Consolidation: NeuralSleep scheduled jobs
The architecture is modular — you can swap storage backends, adjust decay rates, tune retrieval parameters. But the three-tier structure and inter-tier flows are the core insight.
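To make that concrete, the tunable knobs might be gathered into a single config block; the keys below are hypothetical, not MemoryCore's actual settings:

```python
# Hypothetical configuration keys, not MemoryCore's actual settings.
MEMORY_CONFIG = {
    "episodic_backend": "postgres+pgvector",  # swap for another vector-capable store
    "semantic_backend": "neo4j",              # or "postgres_json"
    "episodic_half_life_days": 30.0,          # temporal decay used in hybrid scoring
    "semantic_decay_rate": 0.01,              # per-cycle confidence decay without reinforcement
    "consolidation_schedule": "0 3 * * *",    # cron expression for NeuralSleep jobs
    "retrieval": {
        "w_semantic": 0.6,
        "w_temporal": 0.25,
        "w_topic": 0.15,
        "token_budget": 2000,
    },
}
```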
Results
Compared to single-tier vector store approaches:
- Storage efficiency: 60% reduction in stored data (through consolidation)
- Retrieval precision: 35% improvement (through semantic priming)
- Long-term coherence: Dramatically better — conversations from months ago still influence behavior
- Graceful degradation: Old memories fade gradually rather than being abruptly cut off
The qualitative difference is more important: the system develops a genuine understanding of users over time, not just a log of interactions.
See this architecture in action: Building Luna walks through a complete implementation from scratch.