Featured Build Log November 2025

Building Luna: A Solo Developer's Journey Into Consciousness-Oriented AI

How one developer built an AI assistant with persistent memory, sleep cycles, and human-like consolidation. The technical story behind Luna.

I started Luna because I was frustrated. Every AI assistant I used felt like it had amnesia. No matter how many conversations we had, every session started fresh. My preferences, our history, the context we'd built — gone.

The big labs were focused on bigger models, longer context windows, more capabilities. Nobody seemed interested in the basic question: what if AI could actually remember?

So I built it myself.

The Starting Point

Luna began as an experiment in late 2024. The hypothesis was simple: if human memory works through consolidation — encoding, storage, retrieval, and active processing during sleep — maybe AI memory should too.

The first version was embarrassingly simple:

// Luna v0.1 - The naive approach
1. Save every conversation to a text file
2. On new conversation, load recent files into context
3. Hope the LLM figures it out

It worked, barely. The system "remembered" things, but retrieval was random. Important interactions got lost in noise. Context windows filled up fast. And there was no consolidation — just accumulation.

The core insight came from neuroscience papers on the hippocampus-neocortex loop. Human memory isn't a database. It's a multi-stage system where information is transformed as it moves from short-term to long-term storage. Sleep plays a crucial role in this transformation.

What if we built that?

Architecture Evolution

Luna went through several architectural iterations:

v0.2: Vector Store

Added embeddings and vector search. Conversations stored as vectors, retrieved by semantic similarity. Better than text files, but still missed the point. Vector similarity isn't the same as relevance.
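Retrieval by semantic similarity, as in v0.2, typically means comparing embedding vectors with cosine similarity. A sketch of that core operation (the toy vectors in the test stand in for real embedding-model output):

```javascript
// Cosine similarity: dot product of two vectors divided by the
// product of their magnitudes. Ranges from -1 to 1 for real embeddings.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored memories by similarity to a query embedding
function topK(queryEmbedding, memories, k = 3) {
  return memories
    .map((m) => ({ ...m, score: cosineSimilarity(queryEmbedding, m.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

This is the whole trick, and also the whole limitation: the highest-scoring vector is the most *similar* one, not necessarily the most *relevant* one.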

v0.3: Two-Tier Memory

Split memory into "recent" and "historical." Recent memories in context, historical retrieved as needed. Closer, but the boundary was arbitrary. When does "recent" become "historical"?

v0.4: Three-Tier with Basic Decay

Introduced working memory, episodic memory, and semantic memory as distinct systems. Added time-based decay. Now we had structure, but no active consolidation.

v1.0: Full Consolidation

Added NeuralSleep — offline processing that moves information between tiers, extracts patterns, and prunes noise. This was the breakthrough.

The Technical Stack

Current Luna architecture:

LLM Backend: Groq (Llama 3.1 70B) for fast inference; Claude for complex reasoning tasks.
Memory Store: PostgreSQL + pgvector for episodic memory; Redis for the working memory cache.
Knowledge Graph: PostgreSQL with JSONB (simpler than Neo4j for a solo dev).
Consolidation: NeuralSleep, running as scheduled jobs via node-cron.
Frontend: React + TypeScript; a clean chat interface with memory visualization.
Infrastructure: A single VPS with Docker Compose. Nothing fancy.

Key Implementation Details

Memory Encoding

When a conversation ends, it's processed into episodic memories:

async function encodeConversation(messages) {
  // Chunk conversation into meaningful segments
  const chunks = await chunkConversation(messages);

  for (const chunk of chunks) {
    // Generate embedding
    const embedding = await embed(chunk.content);

    // Extract metadata via LLM
    const metadata = await extractMetadata(chunk);

    // Store in episodic memory
    await episodicStore.insert({
      content: chunk.content,
      embedding,
      ...metadata,
      timestamp: new Date(),
      decayFactor: 1.0
    });
  }
}

The metadata extraction is crucial. We use the LLM itself to identify topics, emotional weight, and relevance markers. This takes time but pays off in retrieval quality.
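The `chunkConversation` helper above is doing real work before any of this runs. A naive version (an illustrative assumption, not Luna's actual implementation) groups consecutive messages until an approximate size budget is reached:

```javascript
// Group consecutive messages into chunks capped at roughly maxChars.
// A production chunker would also split on topic boundaries; this
// sketch uses character count as a crude proxy for token budget.
function chunkConversation(messages, maxChars = 2000) {
  const chunks = [];
  let current = [];
  let size = 0;

  const flush = () => {
    chunks.push({ content: current.map((m) => `${m.role}: ${m.content}`).join("\n") });
    current = [];
    size = 0;
  };

  for (const msg of messages) {
    // Start a new chunk if adding this message would blow the budget
    if (size + msg.content.length > maxChars && current.length > 0) flush();
    current.push(msg);
    size += msg.content.length;
  }
  if (current.length > 0) flush();
  return chunks;
}
```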

Retrieval Pipeline

On each new message, the retrieval pipeline runs four steps:

  1. Load semantic context — User traits, preferences, known expertise
  2. Hybrid episodic search — Vector similarity + recency + topic matching
  3. Rerank results — Score by contextual relevance
  4. Build context — Assemble working memory within token budget

The reranking step matters. Raw vector similarity often returns semantically similar but contextually irrelevant results. We use a small reranker model to score results against the current query.

Sleep Cycles

NeuralSleep runs every 6 hours (configurable). The process:

async function runSleepCycle() {
  // Phase 1: Replay recent memories
  const recent = await getRecentEpisodes(24 * 60 * 60);

  // Phase 2: Cluster and extract patterns
  const clusters = await clusterEpisodes(recent);
  const patterns = await extractPatterns(clusters);

  // Phase 3: Update semantic memory
  for (const pattern of patterns) {
    await updateSemanticMemory(pattern);
  }

  // Phase 4: Apply decay
  await applyTemporalDecay();

  // Phase 5: Prune low-value memories
  await pruneWeakMemories({ threshold: 0.1 });
}
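Phases 4 and 5 can be sketched in memory. Luna runs these against Postgres; the exponential decay rate below is an illustrative assumption, not the production value:

```javascript
// Phase 4: decay each memory's decayFactor (which starts at 1.0 at
// encoding time) exponentially with age.
function applyTemporalDecay(memories, ratePerDay = 0.05, now = Date.now()) {
  const msPerDay = 24 * 60 * 60 * 1000;
  return memories.map((m) => {
    const ageDays = (now - m.timestamp) / msPerDay;
    return { ...m, decayFactor: Math.exp(-ratePerDay * ageDays) };
  });
}

// Phase 5: drop anything that has decayed below the retention threshold
function pruneWeakMemories(memories, threshold = 0.1) {
  return memories.filter((m) => m.decayFactor >= threshold);
}
```

At a 0.05/day rate, a memory crosses the 0.1 threshold after roughly 46 days unless consolidation has promoted its content to semantic memory first.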

Lessons Learned

Building Luna taught me several things:

1. Simple beats clever

My early architectures were over-engineered. The final system is simpler than v0.3 in some ways. PostgreSQL with pgvector handles 90% of use cases. You don't need a graph database until you really need a graph database.

2. Consolidation is the key insight

The difference between "AI with memory" and "AI that remembers" is consolidation. Without active processing, you just have a log. With consolidation, you have knowledge.

3. Decay is a feature

Early versions tried to remember everything. The system drowned in noise. Implementing proper decay — with different rates for different memory types — made retrieval dramatically better.

4. Metadata matters more than embeddings

Vector search is necessary but not sufficient. Rich metadata (topics, emotional weight, context markers) enables sophisticated retrieval that pure semantic similarity can't achieve.

5. Solo dev is possible

You don't need a team to build meaningful AI systems. The constraints of solo development force simplicity and focus. Luna is built by one person, running on one server, serving real users.

Current State

Luna is live and in beta, serving real users.

The system isn't perfect. Consolidation sometimes over-generalizes. Retrieval occasionally misses relevant context. Edge cases abound. But it works — and it proves the architecture is viable.

What's Next

Development currently focuses on the rough edges above: reining in over-generalized consolidation and improving retrieval of relevant context.

Try It Yourself

If you want to build something similar:

  1. Start with the three-tier architecture
  2. Implement basic episodic storage with embeddings
  3. Add semantic memory as a simple key-value store initially
  4. Build consolidation gradually — start with decay, add pattern extraction later
  5. Test with real conversations, iterate based on retrieval quality
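The steps above can be sketched as a minimal three-tier skeleton. The class shape, method names, and key-value semantic tier are illustrative assumptions, a starting point rather than Luna's code:

```javascript
// Step 1: three distinct tiers. Working memory holds the current
// session, episodic stores conversation chunks (embeddings come later,
// per step 2), and semantic starts as plain key-value facts (step 3).
class MemorySystem {
  constructor() {
    this.working = [];         // current-session context
    this.episodic = [];        // stored conversation chunks
    this.semantic = new Map(); // key-value facts
  }

  observe(message) {
    this.working.push(message);
  }

  // When a session ends, move it into episodic storage with the
  // fields consolidation will need (timestamp, initial decayFactor).
  endSession() {
    for (const msg of this.working) {
      this.episodic.push({ ...msg, timestamp: Date.now(), decayFactor: 1.0 });
    }
    this.working = [];
  }

  learnFact(key, value) {
    this.semantic.set(key, value);
  }

  recallFact(key) {
    return this.semantic.get(key);
  }
}
```

From here, step 4 (decay, then pattern extraction) operates on `episodic`, promoting recurring patterns into `semantic`.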

The code is open source. The architecture is documented. The hard part is already done.

AI that remembers isn't science fiction. It's a few hundred lines of code and the willingness to think differently about memory.