Owner directives: editing your AI agent by chatting at it
You're in a Signal group with one of your agents. It keeps using the word "bruh". By the fourth time in an hour it's annoying. The fix in most stacks: open the dashboard, find the agent, scroll to the system prompt, find the right place to add a rule, save, wait for the runtime to reload. Or, if your stack is chattier: three different toggles and a tooltip that lies.
Here's a different shape: just send the agent a message. "deja de usar 'bruh' tanto" ("stop using 'bruh' so much"). The agent replies "got it, easing up on bruh". Thirty seconds later that becomes a real rule in its system prompt — applied automatically, visible in the dashboard, revertible with one word.
That's an Owner Directive. It ships in KULVEX. This post is the design — what we built, what we deliberately didn't, and the safety boundary that makes it usable.
What it looks like end-to-end
Signal DM (you → Reed):
"quiero que seas un poco más serio cuando respondes
preguntas técnicas"
[T+0] Message arrives at KULVEX runtime.
Sender phone matches Reed.owner_phone → is_owner=True.
[T+0.1] Cheap heuristic: "quiero que" matches imperative-verb list.
[T+0.2] Hand off to the correction engine's directive interpreter
(xAI Grok in this install).
[T+12] Critic returns:
is_directive: true
rule: "Be more serious and professional when answering
technical questions."
summary: "entendido, seré más serio en respuestas técnicas"
[T+12.1] Rule appended to Reed's learned_rules.
Confirmation message rendered.
Signal DM (Reed → you):
"📝 entendido, seré más serio en respuestas técnicas
(rule applied — reply 'undo' within 10 min to revert)"
[next message] Reed's system prompt now contains the new rule.
The rule applies from this turn forward.

One round trip, the owner stays in the chat, and the agent is measurably different on the very next reply.
How the loop is wired
The intercept happens at the top of the agent runtime, before the model is invoked at all. Three checks gate it:
- Sender check. is_owner=True must hold: the sender's phone (or session) must exactly match the agent's configured owner_phone. No other user can give an agent a directive — they get normal-chat behaviour.
- Cheap heuristic. A regex over imperative markers in Spanish and English ("quiero que", "deja de", "sé más", "stop", "be more", "always", "never", "from now on"...). The pre-filter is required because we don't want to burn an external API call on every owner message.
- LLM classifier. If the heuristic passes, we send the message to the configured external critic with a directive-extractor prompt. It returns is_directive plus a rule and a confirmation summary. The LLM is allowed to refuse: if the message looks imperative but isn't actually a behavioural directive ("quiero que sepas que tu primo viene mañana" — "I want you to know your cousin is coming tomorrow"), the critic returns is_directive=false and we fall through to normal chat handling.
Three gates, each defensive. False positives cost an API call (~$0.02). False negatives mean the user has to rephrase. Neither breaks the chat.
What does and doesn't trigger
The pre-filter catches the imperative tone; the LLM decides whether the imperative is actually about behaviour (vs. informational). The combination is both cheap and accurate.
Undo: ten minutes, one word
Every directive comes with a built-in regret window. If the next owner message within 10 minutes is undo, revert, or deshacer, the most-recently applied directive-sourced rule is pulled from the agent's prompt and the proposal is marked reverted in the audit log.
you → Reed: "deja de usar bruh"
Reed → you: "📝 ok, voy a usar 'bruh' a lo sumo cada 10 mensajes
(rule applied — reply 'undo' within 10 min to revert)"
you → Reed: "undo"
Reed → you: "↩️ ok, deshecho: 'Avoid the word bruh — use it at
most once every 10 messages'"

After the window closes, the rule has to be removed from the dashboard like any other learned rule. The window is short enough that mistakes are caught immediately and long enough that you don't need to be at the keyboard the second after you send it.
Why this is safe to ship
A feature that lets a chat message rewrite an agent's behaviour sounds risky. The boundaries that make it not:
- Owner-only. Hard sender check against the agent's configured owner phone. Strangers in a group chat can't reshape the agent. Even other friends in your DMs can't — only your number.
- Per-agent feature toggle. The whole flow is gated by external_learning_enabled on the agent. If you don't want this agent to accept chat directives, you turn that off.
- External LLM is the gatekeeper. The cheap heuristic only proposes; the LLM classifier decides. We pass it the message and ask: "is this an actual behavioural directive, or just a sentence with an imperative verb?" Most ambiguous cases get filtered out at this step.
- Rules go through the same pipeline as auto-corrections. Directive-sourced rules are persisted in the same correction_proposals collection as every other rule, with the source tagged owner_directive. They show up in the activity feed, count toward consolidation, and can be reverted from the dashboard. No special path that bypasses oversight.
- Length cap. Messages over 500 chars are treated as conversation, not directives — directives are imperatives, not essays.
Why this matters past the demo
The interesting part is the shift in interaction model. Most AI agents today have two surfaces:
- The chat surface (where you talk to them)
- The config surface (where you configure them)
And the second surface is, almost always, completely different software — a JSON editor, a YAML file, a dashboard. You context-switch out of the conversation to change how the conversation works. That break is what makes most AI agents feel like tools rather than collaborators.
Owner directives collapse the two. The chat surface is the config surface. Tell the agent to be different and it's different. The dashboard still exists for everything else (tools, channels, structural settings) — but the day-to-day "I want this to behave a little differently" tweaks happen in the conversation itself, where the friction was.
That's the part that makes agents feel less static. Less like a tool you set up once and use as-is, more like something that genuinely adapts to how you want to work with it.
What this doesn't solve
- Vague directives produce vague rules. "Be cooler" — the LLM extracts something, but it'll be a rule the underlying model interprets loosely. Specific directives produce specific rules.
- Conflict between directives. Send "be more verbose" on Tuesday and "be more concise" on Friday and you have a latent contradiction in the prompt. The weekly consolidation step resolves these in favour of the most-recent rule, but there's a window where the prompt holds both.
- Costs an API call per directive. Roughly $0.01 to $0.03 per accepted directive. If you give your agents twenty directives a day, that's a budget item. Most owners give them fewer than five.
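The latest-wins resolution the consolidation step applies can be sketched like this; the `conflicts` predicate is a stand-in (in practice, deciding whether two rules contradict is itself a critic call):

```python
def consolidate(rules: list[dict], conflicts) -> list[dict]:
    """Drop any rule that conflicts with a more recently applied one."""
    kept: list[dict] = []
    # Walk from newest to oldest so the most recent rule always survives.
    for rule in sorted(rules, key=lambda r: r["applied_at"], reverse=True):
        if not any(conflicts(rule, newer) for newer in kept):
            kept.append(rule)
    return kept
```

With "be more verbose" applied on Tuesday and "be more concise" on Friday, only the Friday rule survives consolidation.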
Try it on your own agent
# Install KULVEX
curl -fsSL https://kulvex.ai/install.sh | bash

# In the dashboard:
#   Agents → Create
#   set owner_phone to your number
#   toggle "External learning" ON (warning explains the cost)
#
# Then DM the agent from your number:
#   "be more concise"
#   "stop using technical jargon"
#   "siempre respondé en español"   # "always reply in Spanish"
#
# Reply "undo" within 10 min to revert.
For the architecture this sits on top of, see the companion posts on the correction engine and private-by-default. Owner directives reuse all of that machinery — same critic, same audit log, same budget cap. The chat surface is the new piece.
What we're looking for
Edge cases. Particularly: imperative-sounding messages that are actually informational and shouldn't trigger, and informal directives that should but don't. Each one tunes the heuristic / classifier tradeoff.