I started Luna because I was frustrated. Every AI assistant I used felt like it had amnesia. No matter how many conversations we had, every session started fresh. My preferences, our history, the context we'd built — gone.
The big labs were focused on bigger models, longer context windows, more capabilities. Nobody seemed interested in the basic question: what if AI could actually remember?
So I built it myself.
The Starting Point
Luna began as an experiment in late 2024. The hypothesis was simple: if human memory works through consolidation — encoding, storage, retrieval, and active processing during sleep — maybe AI memory should too.
The first version was embarrassingly simple.
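In spirit it was one growing text file plus crude keyword lookup at prompt time. A minimal sketch of the idea (a reconstruction under those assumptions, not the original code; the file name and character budget are made up):

```python
# v0.1 in spirit: one append-only log file, plus naive keyword lookup at prompt time.
from pathlib import Path

MEMORY_FILE = Path("luna_memory.txt")   # illustrative name
MAX_CHARS = 8_000                       # crude stand-in for a token budget

def remember(user_msg: str, assistant_msg: str) -> None:
    """Append the exchange to the log. Pure accumulation, no consolidation."""
    with MEMORY_FILE.open("a", encoding="utf-8") as f:
        f.write(f"USER: {user_msg}\nASSISTANT: {assistant_msg}\n")

def recall(query: str) -> str:
    """Return any logged line sharing a word with the query, truncated to fit."""
    if not MEMORY_FILE.exists():
        return ""
    words = set(query.lower().split())
    hits = [line for line in MEMORY_FILE.read_text(encoding="utf-8").splitlines()
            if words & set(line.lower().split())]
    return "\n".join(hits)[-MAX_CHARS:]  # the log outgrows the prompt fast
```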
It worked, barely. The system "remembered" things, but retrieval was random. Important interactions got lost in noise. Context windows filled up fast. And there was no consolidation — just accumulation.
The core insight came from neuroscience papers on the hippocampus-neocortex loop. Human memory isn't a database. It's a multi-stage system where information is transformed as it moves from short-term to long-term storage. Sleep plays a crucial role in this transformation.
What if we built that?
Architecture Evolution
Luna went through several architectural iterations:
v0.2: Vector Store
Added embeddings and vector search. Conversations stored as vectors, retrieved by semantic similarity. Better than text files, but still missed the point. Vector similarity isn't the same as relevance.
v0.3: Two-Tier Memory
Split memory into "recent" and "historical." Recent memories in context, historical retrieved as needed. Closer, but the boundary was arbitrary. When does "recent" become "historical"?
v0.4: Three-Tier with Basic Decay
Introduced working memory, episodic memory, and semantic memory as distinct systems. Added time-based decay. Now we had structure, but no active consolidation.
v1.0: Full Consolidation
Added NeuralSleep — offline processing that moves information between tiers, extracts patterns, and prunes noise. This was the breakthrough.
The Technical Stack
Current Luna architecture:
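At its core: PostgreSQL with pgvector for episodic and semantic storage, the LLM itself for metadata extraction, a small reranker model for retrieval scoring, and NeuralSleep running as a scheduled consolidation job. The storage layer looks roughly like this (a sketch only; table and column names are illustrative and the real schema may differ):

```python
# Illustrative storage layout for the episodic and semantic tiers (pgvector extension).
import psycopg  # psycopg 3

EPISODIC_TABLE = """
CREATE TABLE IF NOT EXISTS episodic_memories (
    id               BIGSERIAL PRIMARY KEY,
    content          TEXT NOT NULL,
    embedding        vector(1536),          -- dimension depends on the embedding model
    topics           TEXT[] DEFAULT '{}',
    emotional_weight REAL DEFAULT 0.0,
    strength         REAL DEFAULT 1.0,      -- decays over time, refreshed on retrieval
    created_at       TIMESTAMPTZ DEFAULT now(),
    last_accessed    TIMESTAMPTZ DEFAULT now()
)
"""

SEMANTIC_TABLE = """
CREATE TABLE IF NOT EXISTS semantic_memory (
    key         TEXT PRIMARY KEY,           -- e.g. a user trait or preference
    value       JSONB NOT NULL,
    updated_at  TIMESTAMPTZ DEFAULT now()
)
"""

def init_schema(dsn: str) -> None:
    """Create the pgvector extension and the two memory tables."""
    with psycopg.connect(dsn) as conn:
        conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
        conn.execute(EPISODIC_TABLE)
        conn.execute(SEMANTIC_TABLE)
```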
Key Implementation Details
Memory Encoding
When a conversation ends, it's processed into episodic memories:
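The encoding pass chunks the transcript, runs an LLM pass over each chunk for topics, emotional weight, and relevance markers, embeds it, and writes the result to the episodic store. A simplified sketch (names are illustrative; `extract_metadata` and `embed` stand in for whatever model calls you use):

```python
# Sketch of conversation -> episodic memories. The two callables wrap model calls:
# `extract_metadata` is an LLM pass, `embed` is the embedding model.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

@dataclass
class EpisodicMemory:
    content: str
    embedding: list[float]
    topics: list[str]
    emotional_weight: float          # 0.0 = neutral, 1.0 = highly charged
    created_at: datetime

def encode_conversation(
    transcript: str,
    extract_metadata: Callable[[str], dict],
    embed: Callable[[str], list[float]],
    chunk_size: int = 1_000,
) -> list[EpisodicMemory]:
    chunks = [transcript[i:i + chunk_size] for i in range(0, len(transcript), chunk_size)]
    memories = []
    for chunk in chunks:
        meta = extract_metadata(chunk)   # topics, emotional weight, relevance markers
        memories.append(EpisodicMemory(
            content=chunk,
            embedding=embed(chunk),
            topics=meta.get("topics", []),
            emotional_weight=float(meta.get("emotional_weight", 0.0)),
            created_at=datetime.now(timezone.utc),
        ))
    return memories
```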
The metadata extraction is crucial. We use the LLM itself to identify topics, emotional weight, and relevance markers. This takes time but pays off in retrieval quality.
Retrieval Pipeline
On each new message, the retrieval pipeline runs four steps (sketched after the list):
- Load semantic context — User traits, preferences, known expertise
- Hybrid episodic search — Vector similarity + recency + topic matching
- Rerank results — Score by contextual relevance
- Build context — Assemble working memory within token budget
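A compressed sketch of those four stages (the `store` and `reranker` objects, the scoring weights, and the tokens-per-character estimate are all illustrative):

```python
# Sketch of the retrieval pipeline: semantic context, hybrid search, rerank, assemble.
def hybrid_score(similarity: float, age_days: float, topic_overlap: float) -> float:
    """Blend vector similarity with recency and topic matching."""
    recency = 1.0 / (1.0 + age_days)                 # newer memories score higher
    return 0.6 * similarity + 0.2 * recency + 0.2 * topic_overlap

def build_context(query: str, store, reranker, token_budget: int = 2_000) -> str:
    # 1. Load semantic context: stable user traits, preferences, known expertise.
    profile = store.load_semantic_profile()          # returns a short text block

    # 2. Hybrid episodic search: vector similarity + recency + topic matching.
    candidates = store.search_episodic(query, limit=50)
    candidates.sort(
        key=lambda m: hybrid_score(m.similarity, m.age_days, m.topic_overlap),
        reverse=True,
    )

    # 3. Rerank the best candidates against the current query.
    top = reranker.rerank(query, candidates[:20])

    # 4. Assemble working memory within the token budget.
    context, used = [profile], len(profile) // 4     # rough tokens ~= chars / 4
    for memory in top:
        cost = len(memory.content) // 4
        if used + cost > token_budget:
            break
        context.append(memory.content)
        used += cost
    return "\n\n".join(context)
```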
The reranking step matters. Raw vector similarity often returns semantically similar but contextually irrelevant results. We use a small reranker model to score results against the current query.
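One way to wire up that reranker (sentence-transformers and the ms-marco cross-encoder shown here are an example choice, not necessarily the model Luna ships with):

```python
# Rerank candidates with a small cross-encoder: score (query, text) pairs directly.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_k: int = 10) -> list[str]:
    scores = reranker.predict([(query, text) for text in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:top_k]]
```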
Sleep Cycles
NeuralSleep runs every 6 hours (configurable). The process:
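In outline: replay the episodic memories accumulated since the last cycle, promote recurring patterns into semantic memory, apply per-tier decay, and prune whatever falls below threshold. A simplified sketch of one cycle (the store methods, the pattern extractor, and the thresholds are illustrative):

```python
# One NeuralSleep cycle: extract patterns, promote to semantic memory, decay, prune.
def sleep_cycle(store, extract_patterns, promote_threshold: int = 3,
                prune_threshold: float = 0.1) -> None:
    # 1. Replay what accumulated since the last cycle and look for recurring patterns.
    recent = store.episodic_since_last_cycle()
    patterns = extract_patterns(recent)              # LLM pass over grouped memories

    # 2. Promote patterns seen often enough into the semantic tier.
    for pattern in patterns:
        if pattern.occurrences >= promote_threshold:
            store.upsert_semantic(pattern.key, pattern.value)

    # 3. Decay episodic strength and prune what falls below the floor.
    store.decay_episodic()                           # different rates per memory type
    store.prune_episodic(min_strength=prune_threshold)
```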
Lessons Learned
Building Luna taught me several things:
1. Simple beats clever
My early architectures were over-engineered. The final system is simpler than v0.3 in some ways. PostgreSQL with pgvector handles 90% of use cases. You don't need a graph database until you really need a graph database.
2. Consolidation is the key insight
The difference between "AI with memory" and "AI that remembers" is consolidation. Without active processing, you just have a log. With consolidation, you have knowledge.
3. Decay is a feature
Early versions tried to remember everything. The system drowned in noise. Implementing proper decay — with different rates for different memory types — made retrieval dramatically better.
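One way to express that is an exponential half-life per memory type, with retrieval resetting the clock (the rates below are assumed values, not Luna's tuned ones):

```python
# Per-type decay: each tier gets its own half-life; retrieval refreshes last access.
import math

HALF_LIFE_DAYS = {
    "working": 0.25,     # about six hours
    "episodic": 14.0,
    "semantic": 365.0,
}

def decayed_strength(strength: float, memory_type: str, days_since_access: float) -> float:
    half_life = HALF_LIFE_DAYS[memory_type]
    return strength * math.pow(0.5, days_since_access / half_life)
```

Because the clock runs from last access rather than creation, memories that keep getting retrieved stay strong while the rest fade out.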
4. Metadata matters more than embeddings
Vector search is necessary but not sufficient. Rich metadata (topics, emotional weight, context markers) enables sophisticated retrieval that pure semantic similarity can't achieve.
5. Solo dev is possible
You don't need a team to build meaningful AI systems. The constraints of solo development force simplicity and focus. Luna is built by one person, running on one server, serving real users.
Current State
Luna is live and in beta. Key metrics:
- Memory persistence: Conversations from months ago still influence responses
- Consolidation efficiency: ~40% reduction in stored data while maintaining recall
- User experience: Conversations feel continuous, not episodic
The system isn't perfect. Consolidation sometimes over-generalizes. Retrieval occasionally misses relevant context. Edge cases abound. But it works — and it proves the architecture is viable.
What's Next
Current development focus:
- Multi-user memory: Shared knowledge vs. personal memories
- Richer knowledge graphs: Better relationship modeling in semantic memory
- Adaptive consolidation: Learning optimal decay rates per user
- Open source components: MemoryCore and NeuralSleep are available now
Try It Yourself
If you want to build something similar:
- Start with the three-tier architecture
- Implement basic episodic storage with embeddings
- Add semantic memory as a simple key-value store initially (a starter sketch follows this list)
- Build consolidation gradually — start with decay, add pattern extraction later
- Test with real conversations, iterate based on retrieval quality
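For step 3, the semantic tier really can start as a trivial key-value table. A starter sketch (SQLite used here only to keep it dependency-free; swap in Postgres once the rest of the stack needs it):

```python
# Minimal semantic memory: a key-value store for stable facts about the user.
import json
import sqlite3

class SemanticMemory:
    def __init__(self, path: str = "semantic.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS facts (key TEXT PRIMARY KEY, value TEXT)"
        )

    def set(self, key: str, value) -> None:
        self.conn.execute(
            "INSERT INTO facts (key, value) VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            (key, json.dumps(value)),
        )
        self.conn.commit()

    def get(self, key: str, default=None):
        row = self.conn.execute(
            "SELECT value FROM facts WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else default
```

Something like `memory.set("preferred_language", "Python")` covers most early needs; richer relationship modeling can come later.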
The code is open source. The architecture is documented. The hard part is already done.
AI that remembers isn't science fiction. It's a few hundred lines of code and the willingness to think differently about memory.