The Problem Everyone Hits
If you're running a local AI model on your own hardware — Llama, Qwen, Gemma, anything under 30 billion parameters — you've probably hit this wall: you give it a long system prompt full of facts about yourself, and it still can't remember your name.
I've been building a local AI assistant called Jarvis. It runs entirely on my PC, controls my smart home, opens apps, plays music. But every time I asked it something personal — "What's my wife's name?" or "What projects am I working on?" — it would say "I don't have that information."
I tried four different models. I restructured the system prompt. I added explicit "NOT" statements. Nothing worked reliably. Small models simply don't attend to long system prompts the way larger cloud models do.
The Fix: Go Around the Model
Instead of making the model smarter, I went around it. I built a knowledge layer that sits between the user and the AI model. It intercepts questions it can answer directly — without ever touching the model.
The result: instant, accurate answers for personal facts, project status, and file lookups. The model only handles what it's actually good at — tool calls like controlling smart home devices, running commands, and general conversation.
How It Works
The knowledge layer has three pieces:
1. Local Memory Database
I use a project called MemPalace — a searchable database built on ChromaDB and SQLite. It stores everything about me: family, home, tools, goals, projects. Over 100 entries across 14 categories. When Jarvis starts up, it loads all of this into a fast in-memory cache.
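The startup load can be sketched roughly like this. The table layout, column names, and demo facts below are illustrative placeholders, not MemPalace's actual schema:

```python
import sqlite3

# Minimal sketch: pull every stored fact into a nested dict at startup
# so later lookups are plain dictionary reads, not database queries.
def load_cache(conn):
    cache = {}
    for category, key, value in conn.execute(
            "SELECT category, key, value FROM facts"):
        cache.setdefault(category, {})[key] = value
    return cache

# Demo data only; the real store holds 100+ entries across 14 categories.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (category TEXT, key TEXT, value TEXT)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("family", "wife_name", "Jane"),
    ("projects", "active", "Jarvis knowledge layer"),
])
cache = load_cache(conn)
print(cache["family"]["wife_name"])  # → Jane
```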
2. Keyword Matching
When you ask a question, the knowledge layer checks if it matches known patterns before the model ever sees it. "What is my wife's name?" matches the family pattern, looks up the answer in the cache, and returns it instantly. No AI inference needed.
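A minimal sketch of that pattern check, assuming a cache keyed by category and fact name (the patterns and keys here are made up for illustration):

```python
import re

# Each pattern maps a question shape to a (category, key) lookup in the cache.
PATTERNS = [
    (re.compile(r"\bwife'?s? name\b", re.I), ("family", "wife_name")),
    (re.compile(r"\bworking on\b", re.I), ("projects", "active")),
]

def answer_from_cache(question, cache):
    for pattern, (category, key) in PATTERNS:
        if pattern.search(question):
            return cache.get(category, {}).get(key)
    return None  # no match: fall through to the model

cache = {"family": {"wife_name": "Jane"}}
print(answer_from_cache("What is my wife's name?", cache))  # → Jane
print(answer_from_cache("Turn on the lights", cache))       # → None
```

The important property is the `None` fall-through: anything the patterns don't recognize still reaches the model untouched.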
3. Live Data from APIs
For things that change — like active projects or content pipeline status — the knowledge layer pulls directly from my Command Center app via its API. So when I ask "What should I work on today?", I get real-time data from my actual project tracker, not a hallucinated answer from a model.
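A rough sketch of that live lookup. The endpoint URL, port, and JSON shape are assumptions for illustration; the real Command Center API will differ:

```python
import json
import urllib.request

# Hypothetical endpoint; adjust to whatever your project tracker exposes.
def fetch_active_projects(base_url="http://localhost:8800"):
    url = f"{base_url}/api/projects?status=active"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)

def answer_what_to_work_on(projects):
    if not projects:
        return "Nothing active right now."
    top = projects[0]
    return f"Top of the list: {top['name']} ({top['status']})."

# Formatting demo with sample data (no network needed):
print(answer_what_to_work_on([{"name": "Jarvis", "status": "in progress"}]))
```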
What the Model Still Does
The local AI model isn't useless. It still handles everything the knowledge layer can't:
- Smart home control — "Turn on the workshop lights" requires the model to parse intent and call the right Home Assistant service
- PC control — opening apps, running commands, taking screenshots
- General conversation — questions that aren't about stored facts
- Web search — anything requiring real-time information
The key insight: let each part do what it's good at. Databases are good at storing and retrieving facts. Models are good at understanding intent and calling tools. Don't ask one to do the other's job.
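That division of labor comes down to one routing function: try the knowledge layer first, and only fall back to the model. A minimal sketch, with stand-in callables for both sides:

```python
# Knowledge layer first; the model only sees what the layer can't answer.
def handle(question, knowledge_layer, model):
    answer = knowledge_layer(question)
    if answer is not None:
        return answer        # instant, no inference
    return model(question)   # intent parsing, tool calls, open-ended chat

# Stand-ins for demonstration only:
kl = lambda q: "Jane" if "wife" in q.lower() else None
llm = lambda q: f"[model handles: {q}]"

print(handle("What's my wife's name?", kl, llm))  # → Jane
print(handle("Turn on the lights", kl, llm))
```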
Keeping It Updated
The memory refreshes automatically every five minutes. There's also a manual refresh button (a brain icon in the UI) and a voice command — just say "refresh your memory." When I work in Claude Code and update facts in MemPalace, Jarvis picks them up on the next refresh. No restart needed.
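The refresh mechanism can be sketched as a background loop plus an on-demand trigger; the class and method names here are hypothetical:

```python
import threading
import time

class MemoryRefresher:
    """Reloads the fact cache every `interval` seconds, or on demand."""

    def __init__(self, reload_fn, interval=300):  # 300 s = five minutes
        self.reload_fn = reload_fn
        self.interval = interval
        self.cache = reload_fn()  # initial load at startup

    def refresh_now(self):
        # Same path the UI button and "refresh your memory" command hit.
        self.cache = self.reload_fn()

    def start(self):
        def loop():
            while True:
                time.sleep(self.interval)
                self.refresh_now()
        threading.Thread(target=loop, daemon=True).start()

# Demo with a counter standing in for the real database reload:
calls = {"n": 0}
def reload_fn():
    calls["n"] += 1
    return {"version": calls["n"]}

r = MemoryRefresher(reload_fn, interval=300)
r.refresh_now()
print(r.cache)  # → {'version': 2}
```

Because edits land in the store and the cache is rebuilt from it, nothing needs a restart: the next refresh simply reads the new facts.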
The Hardware
This all runs on a single PC:
- Intel i9-12900K
- NVIDIA RTX 3060 12GB
- 64GB RAM
- Ollama for the local model (qwen3:14b or gemma4)
You don't need a $2,000 GPU. A 12GB card is enough to run 14B parameter models alongside the knowledge layer. The knowledge layer itself uses almost no resources — it's just a database lookup.
Software & Tools Mentioned
Claude Code
AI coding assistant from Anthropic. Used to build the knowledge layer, Jarvis, and everything else in this project.
Visit Claude →
Ollama
Run large language models locally. Powers Jarvis's AI brain with models like qwen3 and gemma4.
Visit Ollama →
Home Assistant
Open-source smart home platform. Jarvis controls lights, sensors, and speakers through its API.
Visit Home Assistant →
MemPalace
Searchable memory database for AI agents. Stores facts in ChromaDB with semantic search.
View on PyPI →
Check out my full gear and software list on the Tools & Gear page.
Video Coming Soon
I'll be recording a full walkthrough showing this in action — from the failing model to the instant knowledge layer. Subscribe so you don't miss it.
Subscribe on YouTube