I started Luna because I was frustrated. Every AI assistant I used felt like it had amnesia. No matter how many conversations we had, every session started fresh. My preferences, our history, the context we'd built — gone.
The big labs were focused on bigger models, longer context windows, more capabilities. Nobody seemed interested in the basic question: what if AI could actually remember?
So I built it myself.
The Starting Point
Luna began as an experiment in late 2024. The hypothesis was simple: if human memory works through consolidation — encoding, storage, retrieval, and active processing during sleep — maybe AI memory should too.
The first version was embarrassingly simple.
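In spirit it was one growing text file plus crude keyword lookup at prompt time. A minimal sketch of the idea (a reconstruction under those assumptions, not the original code; the file name and character budget are made up):

```python
# v0.1 in spirit: one append-only log file, plus naive keyword lookup at prompt time.
from pathlib import Path

MEMORY_FILE = Path("luna_memory.txt")   # illustrative name
MAX_CHARS = 8_000                       # crude stand-in for a token budget

def remember(user_msg: str, assistant_msg: str) -> None:
    """Append the exchange to the log. Pure accumulation, no consolidation."""
    with MEMORY_FILE.open("a", encoding="utf-8") as f:
        f.write(f"USER: {user_msg}\nASSISTANT: {assistant_msg}\n")

def recall(query: str) -> str:
    """Return any logged line sharing a word with the query, truncated to fit."""
    if not MEMORY_FILE.exists():
        return ""
    words = set(query.lower().split())
    hits = [line for line in MEMORY_FILE.read_text(encoding="utf-8").splitlines()
            if words & set(line.lower().split())]
    return "\n".join(hits)[-MAX_CHARS:]  # the log outgrows the prompt fast
```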
It worked, barely. The system "remembered" things, but retrieval was random. Important interactions got lost in noise. Context windows filled up fast. And there was no consolidation — just accumulation.
The core insight came from neuroscience papers on the hippocampus-neocortex loop. Human memory isn't a database. It's a multi-stage system where information is transformed as it moves from short-term to long-term storage. Sleep plays a crucial role in this transformation.
What if we built that?
Architecture Evolution
Luna went through several architectural iterations:
v0.2: Vector Store
Added embeddings and vector search. Conversations stored as vectors, retrieved by semantic similarity. Better than text files, but still missed the point. Vector similarity isn't the same as relevance.
v0.3: Two-Tier Memory
Split memory into "recent" and "historical." Recent memories in context, historical retrieved as needed. Closer, but the boundary was arbitrary. When does "recent" become "historical"?
v0.4: Three-Tier with Basic Decay
Introduced working memory, episodic memory, and semantic memory as distinct systems. Added time-based decay. Now we had structure, but no active consolidation.
v1.0: Full Consolidation
Added NeuralSleep — offline processing that moves information between tiers, extracts patterns, and prunes noise. This was the breakthrough.
The Technical Stack
Current Luna architecture:
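At its core: PostgreSQL with pgvector for episodic and semantic storage, the LLM itself for metadata extraction, a small reranker model for retrieval scoring, and NeuralSleep running as a scheduled consolidation job. The storage layer looks roughly like this (a sketch only; table and column names are illustrative and the real schema may differ):

```python
# Illustrative storage layout for the episodic and semantic tiers (pgvector extension).
import psycopg  # psycopg 3

EPISODIC_TABLE = """
CREATE TABLE IF NOT EXISTS episodic_memories (
    id               BIGSERIAL PRIMARY KEY,
    content          TEXT NOT NULL,
    embedding        vector(1536),          -- dimension depends on the embedding model
    topics           TEXT[] DEFAULT '{}',
    emotional_weight REAL DEFAULT 0.0,
    strength         REAL DEFAULT 1.0,      -- decays over time, refreshed on retrieval
    created_at       TIMESTAMPTZ DEFAULT now(),
    last_accessed    TIMESTAMPTZ DEFAULT now()
)
"""

SEMANTIC_TABLE = """
CREATE TABLE IF NOT EXISTS semantic_memory (
    key         TEXT PRIMARY KEY,           -- e.g. a user trait or preference
    value       JSONB NOT NULL,
    updated_at  TIMESTAMPTZ DEFAULT now()
)
"""

def init_schema(dsn: str) -> None:
    """Create the pgvector extension and the two memory tables."""
    with psycopg.connect(dsn) as conn:
        conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
        conn.execute(EPISODIC_TABLE)
        conn.execute(SEMANTIC_TABLE)
```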
Key Implementation Details
Memory Encoding
When a conversation ends, it's processed into episodic memories:
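The encoding pass chunks the transcript, runs an LLM pass over each chunk for topics, emotional weight, and relevance markers, embeds it, and writes the result to the episodic store. A simplified sketch (names are illustrative; `extract_metadata` and `embed` stand in for whatever model calls you use):

```python
# Sketch of conversation -> episodic memories. The two callables wrap model calls:
# `extract_metadata` is an LLM pass, `embed` is the embedding model.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

@dataclass
class EpisodicMemory:
    content: str
    embedding: list[float]
    topics: list[str]
    emotional_weight: float          # 0.0 = neutral, 1.0 = highly charged
    created_at: datetime

def encode_conversation(
    transcript: str,
    extract_metadata: Callable[[str], dict],
    embed: Callable[[str], list[float]],
    chunk_size: int = 1_000,
) -> list[EpisodicMemory]:
    chunks = [transcript[i:i + chunk_size] for i in range(0, len(transcript), chunk_size)]
    memories = []
    for chunk in chunks:
        meta = extract_metadata(chunk)   # topics, emotional weight, relevance markers
        memories.append(EpisodicMemory(
            content=chunk,
            embedding=embed(chunk),
            topics=meta.get("topics", []),
            emotional_weight=float(meta.get("emotional_weight", 0.0)),
            created_at=datetime.now(timezone.utc),
        ))
    return memories
```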
The metadata extraction is crucial. We use the LLM itself to identify topics, emotional weight, and relevance markers. This takes time but pays off in retrieval quality.
Retrieval Pipeline
On each new message, the retrieval pipeline runs four steps (sketched after the list):
- Load semantic context — User traits, preferences, known expertise
- Hybrid episodic search — Vector similarity + recency + topic matching
- Rerank results — Score by contextual relevance
- Build context — Assemble working memory within token budget
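A compressed sketch of those four stages (the `store` and `reranker` objects, the scoring weights, and the tokens-per-character estimate are all illustrative):

```python
# Sketch of the retrieval pipeline: semantic context, hybrid search, rerank, assemble.
def hybrid_score(similarity: float, age_days: float, topic_overlap: float) -> float:
    """Blend vector similarity with recency and topic matching."""
    recency = 1.0 / (1.0 + age_days)                 # newer memories score higher
    return 0.6 * similarity + 0.2 * recency + 0.2 * topic_overlap

def build_context(query: str, store, reranker, token_budget: int = 2_000) -> str:
    # 1. Load semantic context: stable user traits, preferences, known expertise.
    profile = store.load_semantic_profile()          # returns a short text block

    # 2. Hybrid episodic search: vector similarity + recency + topic matching.
    candidates = store.search_episodic(query, limit=50)
    candidates.sort(
        key=lambda m: hybrid_score(m.similarity, m.age_days, m.topic_overlap),
        reverse=True,
    )

    # 3. Rerank the best candidates against the current query.
    top = reranker.rerank(query, candidates[:20])

    # 4. Assemble working memory within the token budget.
    context, used = [profile], len(profile) // 4     # rough tokens ~= chars / 4
    for memory in top:
        cost = len(memory.content) // 4
        if used + cost > token_budget:
            break
        context.append(memory.content)
        used += cost
    return "\n\n".join(context)
```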
The reranking step matters. Raw vector similarity often returns semantically similar but contextually irrelevant results. We use a small reranker model to score results against the current query.
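One way to wire up that reranker (sentence-transformers and the ms-marco cross-encoder shown here are an example choice, not necessarily the model Luna ships with):

```python
# Rerank candidates with a small cross-encoder: score (query, text) pairs directly.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_k: int = 10) -> list[str]:
    scores = reranker.predict([(query, text) for text in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:top_k]]
```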
Sleep Cycles
NeuralSleep runs every 6 hours (configurable). The process:
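In outline: replay the episodic memories accumulated since the last cycle, promote recurring patterns into semantic memory, apply per-tier decay, and prune whatever falls below threshold. A simplified sketch of one cycle (the store methods, the pattern extractor, and the thresholds are illustrative):

```python
# One NeuralSleep cycle: extract patterns, promote to semantic memory, decay, prune.
def sleep_cycle(store, extract_patterns, promote_threshold: int = 3,
                prune_threshold: float = 0.1) -> None:
    # 1. Replay what accumulated since the last cycle and look for recurring patterns.
    recent = store.episodic_since_last_cycle()
    patterns = extract_patterns(recent)              # LLM pass over grouped memories

    # 2. Promote patterns seen often enough into the semantic tier.
    for pattern in patterns:
        if pattern.occurrences >= promote_threshold:
            store.upsert_semantic(pattern.key, pattern.value)

    # 3. Decay episodic strength and prune what falls below the floor.
    store.decay_episodic()                           # different rates per memory type
    store.prune_episodic(min_strength=prune_threshold)
```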
Lessons Learned
Building Luna taught me several things:
1. Simple beats clever
My early architectures were over-engineered. The final system is simpler than v0.3 in some ways. PostgreSQL with pgvector handles 90% of use cases. You don't need a graph database until you really need a graph database.
2. Consolidation is the key insight
The difference between "AI with memory" and "AI that remembers" is consolidation. Without active processing, you just have a log. With consolidation, you have knowledge.
3. Decay is a feature
Early versions tried to remember everything. The system drowned in noise. Implementing proper decay — with different rates for different memory types — made retrieval dramatically better.
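One way to express that is an exponential half-life per memory type, with retrieval resetting the clock (the rates below are assumed values, not Luna's tuned ones):

```python
# Per-type decay: each tier gets its own half-life; retrieval refreshes last access.
import math

HALF_LIFE_DAYS = {
    "working": 0.25,     # about six hours
    "episodic": 14.0,
    "semantic": 365.0,
}

def decayed_strength(strength: float, memory_type: str, days_since_access: float) -> float:
    half_life = HALF_LIFE_DAYS[memory_type]
    return strength * math.pow(0.5, days_since_access / half_life)
```

Because the clock runs from last access rather than creation, memories that keep getting retrieved stay strong while the rest fade out.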
4. Metadata matters more than embeddings
Vector search is necessary but not sufficient. Rich metadata (topics, emotional weight, context markers) enables sophisticated retrieval that pure semantic similarity can't achieve.
5. Solo dev is possible
You don't need a team to build meaningful AI systems. The constraints of solo development force simplicity and focus. Luna is built by one person, running on one server, serving real users.
Current State
Luna is live and in beta. Key metrics:
- Memory persistence: Conversations from months ago still influence responses
- Consolidation efficiency: ~40% reduction in stored data while maintaining recall
- User experience: Conversations feel continuous, not episodic
The system isn't perfect. Consolidation sometimes over-generalizes. Retrieval occasionally misses relevant context. Edge cases abound. But it works — and it proves the architecture is viable.
What's Next
Current development focus:
- Multi-user memory: Shared knowledge vs. personal memories
- Richer knowledge graphs: Better relationship modeling in semantic memory
- Adaptive consolidation: Learning optimal decay rates per user
- Open source components: MemoryCore and NeuralSleep are available now
Try It Yourself
If you want to build something similar:
- Start with the three-tier architecture
- Implement basic episodic storage with embeddings
- Add semantic memory as a simple key-value store initially (a starter sketch follows this list)
- Build consolidation gradually — start with decay, add pattern extraction later
- Test with real conversations, iterate based on retrieval quality
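For step 3, the semantic tier really can start as a trivial key-value table. A starter sketch (SQLite used here only to keep it dependency-free; swap in Postgres once the rest of the stack needs it):

```python
# Minimal semantic memory: a key-value store for stable facts about the user.
import json
import sqlite3

class SemanticMemory:
    def __init__(self, path: str = "semantic.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS facts (key TEXT PRIMARY KEY, value TEXT)"
        )

    def set(self, key: str, value) -> None:
        self.conn.execute(
            "INSERT INTO facts (key, value) VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            (key, json.dumps(value)),
        )
        self.conn.commit()

    def get(self, key: str, default=None):
        row = self.conn.execute(
            "SELECT value FROM facts WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else default
```

Something like `memory.set("preferred_language", "Python")` covers most early needs; richer relationship modeling can come later.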
The code is open source. The architecture is documented. The hard part is already done.
AI that remembers isn't science fiction. It's a few hundred lines of code and the willingness to think differently about memory.