Self-hosted software needs licensing that respects the user's autonomy. We bind a licence to a hardware fingerprint hash, validate periodically, and ship a 60-day offline grace window so a vendor outage never punishes a paying customer. The fingerprint, the rebind policy, and what we deliberately left out.
Tweaking an agent's behaviour by clicking into a config screen feels wrong once the agent is sitting on a chat with you. KULVEX detects imperative messages from the owner, converts them to behavioural rules via an external critic, and applies them in 30 seconds. With an undo. The pattern, the safety, and the demo.
Switching KULVEX's chat brain from a 31B dense Gemma to a 35B-A3B MoE model. The math on activated parameters, the tensor split across 4090+5090, KV-cache q8_0, 262K context per slot, and 200 tok/s of decode. Configs, benchmarks, and what broke along the way.
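The activated-parameter math is worth making concrete. The 31B and 35B-A3B figures come from the post; the rough FLOPs model and the 8-bit weight assumption are mine.

```python
# Per-token decode compute scales with *activated* parameters, not total.
dense_params      = 31e9  # old dense model: every parameter active per token
moe_total_params  = 35e9  # new MoE model: total weights held in VRAM
moe_active_params = 3e9   # "A3B": parameters activated per decoded token

# Rough decode FLOPs per token ≈ 2 × active params (one multiply-add per weight)
dense_flops = 2 * dense_params
moe_flops   = 2 * moe_active_params
speedup = dense_flops / moe_flops
print(f"theoretical decode speedup ≈ {speedup:.1f}x")  # ≈ 10.3x

# VRAM for weights still tracks *total* params (assuming ~1 byte/param at 8-bit)
print(f"weights at 8-bit ≈ {moe_total_params / 1e9:.0f} GB")
```

The tradeoff in one sentence: the MoE needs slightly *more* VRAM than the dense model, but each decoded token touches roughly a tenth of the weights, which is where the decode throughput comes from.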
Prompt rules are advisory text the model interprets narrowly. A deterministic post-LLM filter catches the leaks rules miss — chain-of-thought, system markers, fake transcripts, repetition loops. Pair it with the rules and the two systems compose. The patterns, the regex, and the failure modes.
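A deterministic filter for the leak classes named above can be sketched like this. The patterns are illustrative assumptions standing in for the actual rule set, but they cover the same four classes: chain-of-thought, system markers, fake transcripts, repetition loops.

```python
import re

LEAK_PATTERNS = [
    # Chain-of-thought wrapped in think tags
    re.compile(r"<think>.*?</think>", re.DOTALL),
    # Leaked chat-template system markers, e.g. <|im_start|>
    re.compile(r"^<\|?(system|im_start|im_end)\|?>.*$", re.MULTILINE),
    # Fake transcript lines the model invents
    re.compile(r"^(User|Assistant):\s.*$", re.MULTILINE),
]
# A word of 3+ chars repeated 4+ times in a row: collapse to one occurrence
REPEAT = re.compile(r"\b(\w{3,})( \1\b){3,}")


def filter_output(text: str) -> str:
    """Deterministic post-LLM pass: prompt rules persuade, this enforces."""
    for pat in LEAK_PATTERNS:
        text = pat.sub("", text)
    text = REPEAT.sub(lambda m: m.group(1), text)
    return text.strip()
```

Because the filter is pure regex over the final output, it composes cleanly with prompt rules: the rules reduce how often leaks occur, the filter guarantees what reaches the user when they occur anyway.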
Home Assistant is great hardware support glued to a YAML automation engine. KULVEX takes a different shape: a local LLM that owns Zigbee, Z-Wave, Tuya, and weather hubs directly — no middleman, no YAML, intent-driven control. Architecture, latency numbers, and what HA still does better.
Chat agents drift. They leak reasoning, fall into formulaic phrases, develop bad habits over weeks. Most teams fix this by hand-editing prompts. KULVEX uses an external-LLM critic that watches output, writes surgical rules within seconds, and distils them back into the prompt itself on a weekly cycle.
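The distillation half of that loop — folding accumulated rules back into the base prompt so the rule list stays short — can be sketched as below. The merge strategy (dedupe, append as a bullet list, reset the rule list) is an assumption for illustration, not the actual distillation step.

```python
def distil(base_prompt: str, rules: list[str]) -> tuple[str, list[str]]:
    """Fold accumulated critic rules into the prompt; return the new
    prompt and the (now empty) rule list."""
    unique = list(dict.fromkeys(rules))  # dedupe while preserving order
    merged = base_prompt.rstrip() + "\n\n" + "\n".join(f"- {r}" for r in unique)
    return merged, []  # rule list resets after each distillation pass
```

The point of the weekly cadence: rules are fast, surgical patches; distillation keeps the patch list from growing without bound by absorbing stable rules into the prompt itself.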
Most 'private AI' products still phone home — telemetry, analytics, cloud backups you can't disable. KULVEX takes a different stance: every byte of conversation, memory, and config stays on your server, with one explicit opt-in cloud loop you can audit. The architecture, the egress map, the honest tradeoffs.
Static analyzers drown you in false positives. LLM-first tools burn tokens and hallucinate bugs that don't exist. We split the problem: a deterministic scanner pre-filters, a small local LLM verifies each candidate in isolation. Here's the architecture, the NASA/IDF numbers, and why local matters.
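The two-stage split can be sketched as below. The scanner rules are toy examples and `llm_verify` is a stub for the small local LLM; the shape of the pipeline — cheap deterministic pre-filter, then per-candidate verification in isolation — is the pattern the post describes.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    line_no: int
    rule: str
    snippet: str


# Stage 1 rules: fast, high recall, deliberately tolerant of false positives.
SCANNER_RULES = {
    "unchecked-malloc": re.compile(r"\bmalloc\s*\("),
    "strcpy-use":       re.compile(r"\bstrcpy\s*\("),
}


def scan(source: str) -> list[Finding]:
    """Deterministic pre-filter over the whole file."""
    out = []
    for i, line in enumerate(source.splitlines(), start=1):
        for rule, pat in SCANNER_RULES.items():
            if pat.search(line):
                out.append(Finding(i, rule, line.strip()))
    return out


def verify(findings: list[Finding], llm_verify) -> list[Finding]:
    """Stage 2: each candidate is judged in isolation, so the LLM only ever
    sees one small snippet at a time — bounded tokens, no cross-talk."""
    return [f for f in findings if llm_verify(f)]
```

Verifying candidates one at a time is the token-budget win: the LLM never reads the whole codebase, only the handful of snippets the scanner flagged.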
KULVEX is the commercial self-hosted AI platform. KCode is the open-source (Apache 2.0) AI coding CLI by the same team. They share a philosophy — local-first, privacy-first, deterministic where possible — and this blog covers both.