The Problem Everyone Hits
If you're running a local AI model on your own hardware — Llama, Qwen, Gemma, anything under 30 billion parameters — you've probably hit this wall: you give it a long system prompt full of facts about yourself, and it still can't remember your name.
I've been building a local AI assistant called Jarvis. It runs entirely on my PC, controls my smart home, opens apps, plays music. But every time I asked it something personal — "What's my wife's name?" or "What projects am I working on?" — it would say "I don't have that information."
I tried four different models. I restructured the system prompt. I added explicit "NOT" statements. Nothing worked reliably. Small models simply don't attend to long system prompts the way larger cloud models do.
The Fix: Go Around the Model
Instead of making the model smarter, I went around it. I built a knowledge layer that sits between the user and the AI model. It intercepts questions it can answer directly — without ever touching the model.
The result: instant, accurate answers for personal facts, project status, and file lookups. The model only handles what it's actually good at — tool calls like controlling smart home devices, running commands, and general conversation.
How It Works
The knowledge layer has three pieces:
1. Local Memory Database
I use a project called MemPalace — a searchable database built on ChromaDB and SQLite. It stores everything about me: family, home, tools, goals, projects. Over 100 entries across 14 categories. When Jarvis starts up, it loads all of this into a fast in-memory cache.
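The startup load can be sketched roughly like this. The table layout, column names, and demo facts below are illustrative placeholders, not MemPalace's actual schema:

```python
import sqlite3

# Minimal sketch: pull every stored fact into a nested dict at startup
# so later lookups are plain dictionary reads, not database queries.
def load_cache(conn):
    cache = {}
    for category, key, value in conn.execute(
            "SELECT category, key, value FROM facts"):
        cache.setdefault(category, {})[key] = value
    return cache

# Demo data only; the real store holds 100+ entries across 14 categories.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (category TEXT, key TEXT, value TEXT)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("family", "wife_name", "Jane"),
    ("projects", "active", "Jarvis knowledge layer"),
])
cache = load_cache(conn)
print(cache["family"]["wife_name"])  # → Jane
```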
2. Keyword Matching
When you ask a question, the knowledge layer checks if it matches known patterns before the model ever sees it. "What is my wife's name?" matches the family pattern, looks up the answer in the cache, and returns it instantly. No AI inference needed.
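A minimal sketch of that pattern check, assuming a cache keyed by category and fact name (the patterns and keys here are made up for illustration):

```python
import re

# Each pattern maps a question shape to a (category, key) lookup in the cache.
PATTERNS = [
    (re.compile(r"\bwife'?s? name\b", re.I), ("family", "wife_name")),
    (re.compile(r"\bworking on\b", re.I), ("projects", "active")),
]

def answer_from_cache(question, cache):
    for pattern, (category, key) in PATTERNS:
        if pattern.search(question):
            return cache.get(category, {}).get(key)
    return None  # no match: fall through to the model

cache = {"family": {"wife_name": "Jane"}}
print(answer_from_cache("What is my wife's name?", cache))  # → Jane
print(answer_from_cache("Turn on the lights", cache))       # → None
```

The important property is the `None` fall-through: anything the patterns don't recognize still reaches the model untouched.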
3. Live Data from APIs
For things that change — like active projects or content pipeline status — the knowledge layer pulls directly from my Command Center app via its API. So when I ask "What should I work on today?", I get real-time data from my actual project tracker, not a hallucinated answer from a model.
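A rough sketch of that live lookup. The endpoint URL, port, and JSON shape are assumptions for illustration; the real Command Center API will differ:

```python
import json
import urllib.request

# Hypothetical endpoint; adjust to whatever your project tracker exposes.
def fetch_active_projects(base_url="http://localhost:8800"):
    url = f"{base_url}/api/projects?status=active"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)

def answer_what_to_work_on(projects):
    if not projects:
        return "Nothing active right now."
    top = projects[0]
    return f"Top of the list: {top['name']} ({top['status']})."

# Formatting demo with sample data (no network needed):
print(answer_what_to_work_on([{"name": "Jarvis", "status": "in progress"}]))
```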
What the Model Still Does
The local AI model isn't useless. It still handles everything the knowledge layer can't:
- Smart home control — "Turn on the workshop lights" requires the model to parse intent and call the right Home Assistant service
- PC control — opening apps, running commands, taking screenshots
- General conversation — questions that aren't about stored facts
- Web search — anything requiring real-time information
The key insight: let each part do what it's good at. Databases are good at storing and retrieving facts. Models are good at understanding intent and calling tools. Don't ask one to do the other's job.
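That division of labor comes down to one routing function: try the knowledge layer first, and only fall back to the model. A minimal sketch, with stand-in callables for both sides:

```python
# Knowledge layer first; the model only sees what the layer can't answer.
def handle(question, knowledge_layer, model):
    answer = knowledge_layer(question)
    if answer is not None:
        return answer        # instant, no inference
    return model(question)   # intent parsing, tool calls, open-ended chat

# Stand-ins for demonstration only:
kl = lambda q: "Jane" if "wife" in q.lower() else None
llm = lambda q: f"[model handles: {q}]"

print(handle("What's my wife's name?", kl, llm))  # → Jane
print(handle("Turn on the lights", kl, llm))
```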
Keeping It Updated
The memory refreshes automatically every five minutes. There's also a manual refresh button (a brain icon in the UI) and a voice command — just say "refresh your memory." When I work in Claude Code and update facts in MemPalace, Jarvis picks them up on the next refresh. No restart needed.
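The refresh mechanism can be sketched as a background loop plus an on-demand trigger; the class and method names here are hypothetical:

```python
import threading
import time

class MemoryRefresher:
    """Reloads the fact cache every `interval` seconds, or on demand."""

    def __init__(self, reload_fn, interval=300):  # 300 s = five minutes
        self.reload_fn = reload_fn
        self.interval = interval
        self.cache = reload_fn()  # initial load at startup

    def refresh_now(self):
        # Same path the UI button and "refresh your memory" command hit.
        self.cache = self.reload_fn()

    def start(self):
        def loop():
            while True:
                time.sleep(self.interval)
                self.refresh_now()
        threading.Thread(target=loop, daemon=True).start()

# Demo with a counter standing in for the real database reload:
calls = {"n": 0}
def reload_fn():
    calls["n"] += 1
    return {"version": calls["n"]}

r = MemoryRefresher(reload_fn, interval=300)
r.refresh_now()
print(r.cache)  # → {'version': 2}
```

Because edits land in the store and the cache is rebuilt from it, nothing needs a restart: the next refresh simply reads the new facts.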
The Hardware
This all runs on a single PC:
- Intel i9-12900K
- NVIDIA RTX 3060 12GB
- 64GB RAM
- Ollama for the local model (qwen3:14b or gemma4)
You don't need a $2,000 GPU. A 12GB card is enough to run 14B parameter models alongside the knowledge layer. The knowledge layer itself uses almost no resources — it's just a database lookup.
Software & Tools Mentioned
Claude Code
AI coding assistant from Anthropic. Used to build the knowledge layer, Jarvis, and everything else in this project.
Visit Claude →
Ollama
Run large language models locally. Powers Jarvis's AI brain with models like qwen3 and gemma4.
Visit Ollama →
Home Assistant
Open-source smart home platform. Jarvis controls lights, sensors, and speakers through its API.
Visit Home Assistant →
MemPalace
Searchable memory database for AI agents. Stores facts in ChromaDB with semantic search.
View on PyPI →
Check out my full gear and software list on the Tools & Gear page.
Video Coming Soon
I'll be recording a full walkthrough showing this in action — from the failing model to the instant knowledge layer. Subscribe so you don't miss it.
Subscribe on YouTube