Private-by-default: how KULVEX keeps your AI conversations on your server
Almost every "private AI" product we've looked at has the same gap: it ships with a privacy badge in the marketing, and a second outbound connection in strace. Telemetry. Analytics. "Anonymous usage stats". A cloud backup you can't turn off. KULVEX takes a different stance — every byte of conversation, memory, and configuration stays on your machine, and the one cloud feature that could change that is off by default, opt-in per agent, and shows you the tradeoff in plain language before you flip it.
This post is the engineering case for that stance: where data lives, what calls leave the box (and which ones don't), how we built the optional cloud loop without compromising the default, and the limits we're honest about.
What "private AI" actually looks like in practice
Three patterns dominate the "privacy-first AI" market in 2026, and each has a quiet caveat:
- Hosted-but-encrypted. Your data is encrypted at rest, but inference happens on the vendor's GPUs. Translation: they hold the keys to decrypt for inference. The encryption protects against database leaks, not against the vendor.
- Self-hosted SaaS. The vendor ships a Docker image you run on your own VPS. Most of these still pull models from a cloud registry, send error reports to Sentry, and occasionally call home for licence checks that include usage metadata.
- Local-LLM apps. The model runs in-process and never sees the cloud — but the surrounding app (the chat UI, the memory store, the integrations) often does. We've seen "100% local" AI assistants that ship every chat to PostHog for product analytics.
None of these are fraud — they're tradeoffs the user didn't see. The fix isn't a stronger marketing claim; it's an architecture where the privacy-relevant calls are visible, opt-in, and few.
What KULVEX is, structurally
KULVEX is a single server you install on your own hardware. On install it spins up:
- Local model inference — a llama.cpp server hosting the chat model (Mark VII / Qwen3.6-A3B-Heretic on dual-GPU rigs by default; smaller models on lighter hardware). All token generation happens on-box.
- Local databases — MongoDB for agents, conversations, settings; SQLite for the long-term memory system (mempalace) with vector embeddings; SQLite for tool audit logs.
- Local channel listeners — Signal-cli, the WhatsApp bridge, Telegram, and the web dashboard, all talking to the local API on localhost:9000.
- Local home and presence stack — Zigbee2MQTT, Z-Wave-JS, Tuya local API, Tempest UDP listener. KULVEX talks Zigbee/Z-Wave directly; there's no Home Assistant or other middleman.
The default deployment makes zero outbound calls related to your conversations or your home. The only outbound traffic on a stock install is the licence heartbeat, which we describe below and which can be deferred to a 60-day offline grace window.
YOUR SERVER (Linux/macOS, your hardware)
┌───────────────────────────────────────────────────────┐
│ │
│ Web UI ───┐ │
│ \ │
│ Signal ────┼──── KULVEX API (localhost:9000) ────┐ │
│ / │ \ │
│ Telegram ─┘ │ ▼ │
│ │ ┌────────────┐
│ Tempest UDP ──── home_manager ─────┤ │ MongoDB + │
│ Zigbee2MQTT ◄──┘ │ │ SQLite + │
│ Z-Wave-JS ◄──┘ │ │ mempalace │
│ ▼ └────────────┘
│ ┌──────────────┐
│ │ llama-server │
│ │ (Mark VII) │
│ └──────────────┘
│ │
└───────────────────────────────────────────────────────┘
│
│ ONLY outbound flow on default install:
▼
╔══════════════════════╗
║ Licence validation ║ hash + key, no content
║ (every 60d, optional║
║ offline mode) ║
╚══════════════════════╝
What never leaves your server
Concretely, the following data is generated on your machine and persisted only to local storage. Nothing in this list is transmitted to AstroLexis (the company) or any third party unless you explicitly enable the optional cloud loop in the next section.
- Every chat with every agent — across every channel (web, Signal, WhatsApp, Telegram, voice).
- Agent definitions: prompts, learned rules, style profiles, tool selections.
- Long-term memory: mempalace conversation history with embeddings, semantic search index, retrieval logs.
- Voice clones, generated images, transcripts, file uploads.
- Smart home state, camera frames, presence detection events, YOLO inference outputs.
- Tool execution logs (every channel_send_message, every home_turn_on, every file read).
- Local model weights — KULVEX never streams models from a cloud registry at runtime.
Wipe the server, you wipe the data. There is no remote backup unless you set one up yourself (rsync, restic, Duplicity — KULVEX ships a backup module that writes to whatever target you point it at, including a USB stick).
Every outbound call, listed
Here's the complete egress map of a stock KULVEX install (no cloud features enabled):
- Licence validation: POST /v1/heartbeat to the AstroLexis licence server. Sends: licence key, hardware fingerprint hash, KULVEX version. Frequency: every 60 days by default; can be deferred to a 60-day offline grace window.
- Stripe (purchases): only on the public marketing site at checkout, not from the installed product. Stripe handles card data; we never touch it.
- Update checks: optional. Polls the GitHub releases of the open-source components (KCode binary releases). Only runs if you opt into auto-update.
- Telemetry: none.
- Analytics / tracking: none.
- Crash reports: logs stay in /var/log/kulvex. Nothing is sent.
That's the entire list. The licence call is the only mandatory one on a paid install, and the body is hashed — it doesn't carry chat content, agent names, or anything you'd care about.
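To make "the body is hashed" concrete, here is a rough picture of what a content-free heartbeat body could look like. This is our illustration for the post, not the documented wire format — every field name and value below is invented.

```python
import hashlib

# Illustrative heartbeat body — field names and values are invented for
# this post; the actual KULVEX wire format is not reproduced here.
hw_fingerprint = hashlib.sha256(b"<machine identifiers>").hexdigest()

heartbeat = {
    "licence_key": "<your licence key>",
    "hw_hash": hw_fingerprint,   # a one-way hash, never raw identifiers
    "version": "<installed KULVEX version>",
}

# Note what is absent: no chat content, no agent names, no usage metrics.
assert set(heartbeat) == {"licence_key", "hw_hash", "version"}
```

The point of hashing the fingerprint is that the licence server can recognise a machine it has seen before without ever learning what that machine is.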
The one optional cloud loop, and how we handle it
KULVEX has exactly one feature that, when activated, sends conversation data to an external provider: the Correction Engine (also called External Learning). The use case is concrete: agents drift, leak internal reasoning, develop bad habits over weeks of use. Hand-tuning the prompts gets old. Letting a stronger external model (xAI Grok, Anthropic Claude, or OpenAI GPT) review the agent's recent conversations and propose surgical rules fixes the drift in near real-time.
The privacy cost is real. Excerpts of conversations leave your server. We thought about every way to soften that — and ultimately decided the right design was not to soften it but to surface it.
The contract:
- OFF by default. Every agent ships with external_learning_enabled = false. No call is made until you flip the switch.
- Per-agent. The toggle is on each agent's edit form, not a global setting. You can enable it for casual agents (group-chat banter) and leave it off for sensitive ones (anything dealing with PII, finances, health).
- First-time warning. The first time you flip it on for any agent, KULVEX shows a confirmation dialog that names the provider, summarises what data leaves the box, and links to the relevant section of the Privacy Policy. You acknowledge once; the engine remembers.
- Visible activity feed. Every external call shows up in a live activity log in the dashboard (provider, model, latency, tokens, cost). No hidden calls.
- Hard budget cap. A monthly USD ceiling you set (default $50). When exceeded, the engine pauses. No surprise bills.
- Toggle off, traffic stops immediately. Past data already sent is subject to the provider's retention policy. Nothing new goes out.
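The per-agent toggle and the budget cap compose into a simple conjunction: both gates must pass before a single byte leaves the box. A minimal sketch of that logic — names like `Agent`, `may_call_external`, and `monthly_spend_usd` are illustrative, not KULVEX's actual internals:

```python
from dataclasses import dataclass

# Hypothetical sketch of the gating described above. The real KULVEX
# implementation is not public; all names here are invented.

@dataclass
class Agent:
    name: str
    external_learning_enabled: bool = False  # OFF by default, per agent

def may_call_external(agent: Agent, monthly_spend_usd: float,
                      budget_cap_usd: float = 50.0) -> bool:
    """Both gates must pass before any conversation data leaves the box."""
    if not agent.external_learning_enabled:   # per-agent opt-in
        return False
    if monthly_spend_usd >= budget_cap_usd:   # hard monthly ceiling
        return False
    return True

# A sensitive agent stays local even with plenty of budget left:
finance = Agent("finance-helper")
assert may_call_external(finance, monthly_spend_usd=0.0) is False

# An opted-in agent is cut off the moment the cap is reached:
banter = Agent("group-banter", external_learning_enabled=True)
assert may_call_external(banter, monthly_spend_usd=12.5) is True
assert may_call_external(banter, monthly_spend_usd=50.0) is False
```

The design choice worth noting: the default in the dataclass is the privacy posture. Forgetting to configure an agent leaves it local, not leaky.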
What gets transmitted when the loop is on, exactly: up to 30 recent turns of the chat the agent participated in, the agent's system prompt, the agent name, and metadata about recent leaks the local runtime sanitiser caught (this last piece is what lets the cloud model write surgical rules). That's it. No tool definitions, no other agents' data, no smart-home state, no memory contents beyond what's in the audited conversation.
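The payload described above can be pictured roughly as follows. The field names are invented for this post (the actual wire format isn't documented here); what matters is the shape — a bounded window of turns plus leak metadata, and nothing else.

```python
# Rough illustration of the review payload described above. Field names
# are invented for this post; only the *shape* reflects the text.
review_request = {
    "agent_name": "group-banter",
    "system_prompt": "<the agent's current system prompt>",
    "recent_turns": [                 # capped at 30 turns
        {"role": "user", "text": "..."},
        {"role": "agent", "text": "..."},
    ],
    "sanitiser_hits": [               # metadata about leaks caught locally
        {"pattern": "think_block", "count": 3},
    ],
    # Deliberately absent: tool definitions, other agents' data,
    # smart-home state, memory-store contents beyond the audited turns.
}

assert len(review_request["recent_turns"]) <= 30
```

The `sanitiser_hits` field is the bridge to the next section: it tells the reviewing model which failure pattern to write a rule against.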
The runtime sanitiser, or: defence in depth
Even with the cloud loop off, agents can leak — language models love to publish their internal reasoning if the prompt is loose. We saw it in production: an agent publishing "The user (Bruno) is asking…" chain-of-thought verbatim into a private DM, including system markers injected by the runtime that should never have been visible.
The fix is a deterministic output filter that runs on every agent response before it ships to a channel. It strips:
- <think>…</think> blocks
- Lines starting with chain-of-thought markers ("The user is asking", "I need to…", "Let me think", "Plan: 1…", "Draft: A or B")
- Runtime markers like [OWNER DM ...], [CRITICAL ...], and any other bracketed system instructions
- Fake-transcript reconstructions ("Bruno: 'and?' / Reed: [Internal monologue]") — a particular failure mode of MoE thinking models
- Loops where the same line repeats more than twice in a row
If the filter strips the entire response, the message is suppressed rather than sent (better silence than a leak). Each sanitiser hit is logged locally and — if you have the Correction Engine enabled — fed back as evidence to the next cloud review, which writes a sharper rule against that exact pattern. The two systems compose: the filter catches what slips past the prompt; the cloud loop teaches the prompt to catch it next time.
The numbers: a real install, after a week
From a five-agent KULVEX install running for one week with the Correction Engine enabled on all agents (real numbers from our lab, not a benchmark):
94% of the conversation data generated on this install never left the server. The 6% that did consisted of conversational excerpts auto-selected by the engine when an agent showed a failure pattern, and each transmission was traceable in the activity feed. That's the tradeoff in numbers: small egress, but non-zero, and visible.
What "private-by-default" does NOT mean
Three things we're direct about, because nothing destroys a privacy posture faster than a hidden caveat:
- Channel platforms still see channel traffic. When an agent receives a message via Signal, that message traversed Signal's servers (with their end-to-end encryption) before arriving at your KULVEX. WhatsApp, Telegram, etc. behave per their own policies. KULVEX doesn't change what those platforms see — only what we do once it lands.
- Your server's security is on you. If you run KULVEX on a public IP without a firewall, or with your auth credentials in a Git history, no architectural choice we made matters. The privacy story assumes a reasonably configured machine.
- Cloud-loop egress is real when enabled. We've given you the controls (per-agent toggle, budget cap, warning, activity feed) and an honest model for what gets sent. We can't — and don't — claim the data goes nowhere when you've told the system to send it somewhere.
How to verify all of this
We'd rather you don't take our word for it. After installing KULVEX, the easiest sanity check is to put the box behind a network monitor (tcpdump on your router, or a Pi-hole egress log) and watch what leaves. With cloud features off, you should see the licence heartbeat once per cycle and nothing else of substance.
# On your router or a packet capture
sudo tcpdump -i any -n -s 0 \
  'host <your-kulvex-server-ip> and not (port 22 or port 80 or port 443)' \
  -w kulvex-egress.pcap

# Open in Wireshark, filter by destination IP — every external host
# should be one of: kulvex.ai (licence), or whatever you explicitly
# configured.
For a deeper read on every data category, who receives what and when, see the full Privacy Policy.
Try it on your hardware
KULVEX installs on Linux (recommended: 24+ GB VRAM for the default Mark VII model; smaller models pick automatically on lighter rigs) and macOS (M2/M3/M4 with 32+ GB unified memory). The installer is a single shell script:
# Linux / macOS — one-shot install
curl -fsSL https://kulvex.ai/install.sh | bash

# After install, the dashboard is at https://localhost:3000
# Cloud loop is OFF by default. Enable per-agent only when you mean it.
Pricing, hardware tiers, and the feature comparison are on kulvex.ai/pricing.
What we want to hear
If you spot an outbound call we didn't document, a path where data leaves the server without a clear opt-in, or a corner of the architecture that doesn't match what we've described in this post — we want to fix it. The privacy posture only works if it's actually true, and it stays true only if anyone running the product can audit it.