Primer — senye.me

The principles describe KMS architecture in Claude-native terms. This page covers the generic Claude primitives the principles assume — useful as a primer if you are earlier on the curve, useful as a reference even if you aren't.

Subagents

A subagent is a Claude instance spawned by another Claude instance, with its own context window and its own tool access. Subagents are the primitive for parallel work: ripple sweeps across many entities, contradiction checks, batch ingest. The cost is that the parent loses visibility into what the child did, so subagent prompts should be self-contained briefings and child results should be summaries, not raw output.

Use subagents when work is parallelisable and the parent doesn't need the intermediate state. Use the parent directly when the work is sequential or when intermediate state matters.
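A minimal sketch of that contract in Python, assuming a hypothetical `run_subagent` callable that stands in for however your harness actually spawns a child. The briefing carries everything the child needs as plain fields, and the parent keeps only a bounded summary per entity.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Briefing:
    """Self-contained instructions: the child sees this and nothing else."""
    entity: str
    task: str
    context: str  # copied in full, never "see the earlier discussion"

    def render(self) -> str:
        return f"Task: {self.task}\nEntity: {self.entity}\nContext:\n{self.context}"


def fan_out(
    briefings: list[Briefing],
    run_subagent: Callable[[str], str],  # hypothetical: spawns a child, returns its reply
    max_summary_chars: int = 500,
    max_workers: int = 4,
) -> dict[str, str]:
    """Run independent briefings in parallel and keep a bounded summary per entity."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        replies = list(pool.map(lambda b: run_subagent(b.render()), briefings))
    # Backstop: never let unbounded raw output into the parent's context.
    return {b.entity: reply[:max_summary_chars] for b, reply in zip(briefings, replies)}
```

Truncation is a backstop, not a substitute: the briefing itself should ask the child to return a summary.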

Skills

A skill is a callable, named methodology, written as a markdown file that describes what it does, when to invoke it, and how to do it. The point is reusability across sessions: the same way to ingest a meeting note, the same way to run a check-in, the same way to extract entities. Skills make a KMS's behaviour stable across time.

The discipline that matters: skills should declare their inputs, their side effects, and the files they touch. Implicit skills (where the agent has to guess what they do) drift into unreliability.
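A sketch of enforcing that discipline with a lint over each skill file's frontmatter. `name` and `description` mirror the frontmatter fields Claude Code skills use; `inputs`, `side_effects`, and `touches` are hypothetical keys for the declarations argued for above, not part of any official schema, and the example paths are illustrative. The parser is deliberately minimal (flat `key: value` lines only).

```python
REQUIRED_KEYS = ("name", "description", "inputs", "side_effects", "touches")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines between leading '---' delimiters.

    Minimal on purpose: no nesting or lists. Use a real YAML parser in practice.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # unterminated frontmatter counts as none


def missing_declarations(skill_text: str) -> list[str]:
    """Required keys the skill fails to declare; empty values count as missing."""
    fields = parse_frontmatter(skill_text)
    return [key for key in REQUIRED_KEYS if not fields.get(key)]


# Illustrative skill file (hypothetical paths and keys).
EXAMPLE_SKILL = """---
name: ingest-meeting-note
description: File a raw meeting note and update the attendees' entity pages
inputs: one raw note under inbox/
side_effects: creates a meeting note; appends to entity pages
touches: inbox/, notes/meetings/, entities/people/
---
1. Read the raw note.
"""
```

A skill that fails this check is exactly the implicit kind: the agent would have to guess its side effects.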

CLAUDE.md

The project-level instruction file. Every working KMS I've built has a CLAUDE.md that declares the routing table, the entity schemas, the rules per layer, and the things never to do (e.g. "never overwrite human-authored files without confirmation"). It is the only durable place the agent reads in every session, so it has to be tight, specific, and actionable.

The failure mode is CLAUDE.md sprawl: dozens of bullet points the agent can't actually internalise. Keep it short. Move long explanation to skills.
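Sprawl is easy to catch mechanically. A small lint you could run in CI or before committing; the line and bullet budgets are arbitrary assumptions to tune per vault, not known limits on what the agent can internalise.

```python
import re

# Arbitrary budgets: tune them per vault.
MAX_LINES = 150
MAX_BULLETS = 40

BULLET = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+")


def sprawl_report(text: str, max_lines: int = MAX_LINES, max_bullets: int = MAX_BULLETS) -> list[str]:
    """Return warnings when CLAUDE.md grows past what the agent reliably internalises."""
    lines = [line for line in text.splitlines() if line.strip()]
    bullets = sum(1 for line in lines if BULLET.match(line))
    warnings = []
    if len(lines) > max_lines:
        warnings.append(f"{len(lines)} non-blank lines (budget {max_lines}): move explanation into skills")
    if bullets > max_bullets:
        warnings.append(f"{bullets} bullet rules (budget {max_bullets}): consolidate or move into skills")
    return warnings
```

An empty list means the file is within budget; anything else points at what to move out.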

Memory

Claude has session memory (within a conversation, automatic), and it has persistent memory if you give it a place to write — typically a memory directory in the project. The KMS uses persistent memory for what's important to keep across sessions but not important enough to file as durable knowledge: user preferences, recent feedback, ongoing context.

Don't confuse memory with the knowledge store. Memory is for the agent's continuity; the knowledge store is for the user's knowledge. They share neither shape nor lifecycle.
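To make "neither shape nor lifecycle" concrete, a sketch with two hypothetical record types: memory entries are timestamped and expire (the 30-day retention window is an assumption), while knowledge notes are keyed by entity, carry sources, and are never pruned automatically.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class MemoryEntry:
    """Agent continuity: short, timestamped, allowed to expire."""
    text: str
    written: datetime
    ttl: timedelta = timedelta(days=30)  # hypothetical retention window

    def expired(self, now: datetime) -> bool:
        return now - self.written > self.ttl


@dataclass
class KnowledgeNote:
    """User knowledge: keyed by entity, follows a schema, never pruned automatically."""
    entity: str
    body: str
    sources: list[str] = field(default_factory=list)


def prune_memory(entries: list[MemoryEntry], now: datetime) -> list[MemoryEntry]:
    """Memory has a lifecycle; the knowledge store has no equivalent of this function."""
    return [entry for entry in entries if not entry.expired(now)]
```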

Hooks

Hooks fire on harness events such as UserPromptSubmit, PreToolUse, PostToolUse, and Stop. They run shell commands, and they can block, modify, or augment what the agent does. For a KMS:

PreToolUse on Edit/Write — enforce authorship rules ("can't overwrite a human-authored file").
PostToolUse on Write — trigger a compilation pass or a ripple sweep.
Stop — run a session-summary skill to capture what mattered before context evaporates.
UserPromptSubmit — pre-classify input, attach context, route silently.

Hooks are the harness equivalent of database triggers. Use them for invariants the agent shouldn't have to remember to enforce.
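A sketch of the authorship rule as a PreToolUse command hook, registered with a matcher like `Edit|Write`. My understanding of the hook contract is that Claude Code passes a JSON payload on stdin including `tool_name` and `tool_input` (with `file_path` for Edit and Write), and that exit code 2 blocks the call and returns stderr to the agent; check the hooks reference before relying on it. The `author: human` frontmatter marker is a hypothetical convention.

```python
import json
import sys
from pathlib import Path

# Hypothetical convention: human-authored files carry this line in their frontmatter.
HUMAN_MARKER = "author: human"


def is_human_authored(path: Path) -> bool:
    if not path.is_file():
        return False  # creating a new file is not an overwrite
    head = path.read_text(encoding="utf-8", errors="replace").splitlines()[:20]
    return any(line.strip() == HUMAN_MARKER for line in head)


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message); exit code 2 asks the harness to block the call."""
    if payload.get("tool_name") not in ("Edit", "Write"):
        return 0, ""
    file_path = payload.get("tool_input", {}).get("file_path", "")
    if file_path and is_human_authored(Path(file_path)):
        return 2, f"{file_path} is human-authored; ask the user before changing it."
    return 0, ""


def main() -> None:
    """Entry point for the command hook: read the payload, report, exit."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

Saved as a script, it would end by calling `main()` under the usual `if __name__ == "__main__":` guard. Keeping the decision in `decide` makes the rule testable without the harness.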

MCP

Model Context Protocol — a way to expose external tools, data sources, and prompts to Claude through a standard interface. For a KMS, MCP is how you connect the agent to things outside the markdown vault: the warehouse, an integration (Slack, Gmail, Linear), or another KMS tier.

The architectural choice: which knowledge stays local (markdown, on disk) and which gets reached through MCP (remote, on demand). My current bias: durable knowledge stays local; query-on-demand state and high-volume structured data go through MCP.
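The bias above can be written down as a placement rule. A sketch with hypothetical fields and illustrative sources: anything high-volume or valued for its current state goes through MCP, and everything else is treated as durable knowledge that lives locally.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    LOCAL = "markdown on disk"
    MCP = "reached through an MCP server on demand"


@dataclass(frozen=True)
class Source:
    name: str
    high_volume: bool  # too much structured data to mirror into markdown
    live_state: bool   # what matters is the current value, queried on demand


def place(source: Source) -> Placement:
    """Durable knowledge stays local; live state and high-volume data go through MCP."""
    if source.high_volume or source.live_state:
        return Placement.MCP
    return Placement.LOCAL  # everything else is treated as durable knowledge


# Illustrative sources, not a real configuration.
SOURCES = [
    Source("warehouse", high_volume=True, live_state=False),
    Source("linear-issue-status", high_volume=False, live_state=True),
    Source("decision-log", high_volume=False, live_state=False),
]
```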

Long context (1M)

Sonnet 4.6 and Opus 4.7 ship with a 1M-token context window. For a KMS this changes what's economical:

Whole-vault re-reads on demand.
Compilation passes that consider all prior knowledge in one go, not chunked.
Cross-entity contradiction checks without a separate retrieval step.

The cost: 1M context is slower and more expensive per request. The discipline: use long context for compilation-time reasoning (sweeps, contradiction checks) and shorter context for interactive reasoning (live conversations, query layer).
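A rough way to check whether a whole-vault re-read is even on the table, sketched under two assumptions: about four characters per token (a common heuristic for English, not a tokenizer) and a hypothetical 100K-token reserve for instructions and output. For exact numbers, count tokens through the API instead. The task names in `COMPILATION_TASKS` are illustrative.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4          # heuristic for English prose, not a tokenizer
CONTEXT_WINDOW = 1_000_000   # tokens
COMPILATION_TASKS = {"sweep", "contradiction_check", "compile"}  # illustrative names


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    return -(-len(text) // chars_per_token)  # ceiling division


def vault_fits(vault: Path, reserve: int = 100_000, window: int = CONTEXT_WINDOW) -> bool:
    """Would every markdown file in the vault fit, leaving `reserve` tokens spare?"""
    total = sum(
        estimate_tokens(path.read_text(encoding="utf-8", errors="replace"))
        for path in vault.rglob("*.md")
    )
    return total + reserve <= window


def context_mode(task: str) -> str:
    """Long context for compilation-time reasoning, short for interactive work."""
    return "long" if task in COMPILATION_TASKS else "short"
```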

Prompt caching

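Cache the static parts of a prompt (CLAUDE.md, skill descriptions, schema declarations) so they are billed at a fraction of the normal input price on later turns instead of in full. The default cache TTL is 5 minutes, refreshed each time the cached prefix is reused; a 1-hour TTL is also available at a higher write cost. The practical implication: long sessions amortise the cache cost, while one-shot calls don't benefit, because writing to the cache costs more than an uncached send. Structure your KMS interactions to favour long sessions when expensive context is loaded.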
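The amortisation is easy to see with arithmetic. This sketch assumes the multipliers Anthropic has published for the 5-minute cache (writes at 1.25x the base input price, reads at 0.1x) and that every turn arrives within the TTL; check current pricing before relying on the numbers.

```python
# Multipliers on the base input-token price for the default 5-minute cache
# (assumed from Anthropic's published pricing; check the current figures).
CACHE_WRITE = 1.25
CACHE_READ = 0.10


def prefix_cost(turns: int, cached: bool) -> float:
    """Cost of resending a static prefix on each of `turns` requests, in units of
    one uncached send. Assumes each turn lands inside the TTL, which every cache
    hit refreshes, so the prefix is written once and read on every later turn."""
    if turns <= 0:
        return 0.0
    if not cached:
        return float(turns)
    return CACHE_WRITE + CACHE_READ * (turns - 1)
```

A one-shot call pays 1.25 for a prefix that would have cost 1.0 uncached; by the second turn caching is already ahead (1.35 against 2.0), and ten turns cost about 2.15 instead of 10.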

Fast mode

Claude Code's fast mode swaps Opus 4.6 for a faster-output configuration of the same model. For KMS work the trade-off is: fast mode is great for ingest, capture, and routing (lots of small ops); regular mode is better for synthesis and contradiction reasoning (fewer, deeper passes).
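The trade-off reduces to a routing rule. A sketch with hypothetical operation names; unknown operations default to regular mode on the assumption that a slower pass is cheaper than a wrong synthesis that has to be found and undone later.

```python
# Hypothetical operation names grouped by the trade-off above.
FAST_OPS = {"ingest", "capture", "route"}         # many small operations
DEEP_OPS = {"synthesise", "contradiction_check"}  # fewer, deeper passes


def pick_mode(op: str) -> str:
    """Choose fast mode or regular mode for a KMS operation."""
    if op in DEEP_OPS:
        return "regular"
    if op in FAST_OPS:
        return "fast"
    return "regular"  # unrecognised ops: err towards the deeper configuration
```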

Tool selection

The agent's first move on most tasks is picking the right tool. Picking the wrong one is a leading cause of bad KMS behaviour: using grep when you wanted a structured query, using Read when you wanted Glob, using Bash when there's a dedicated tool. CLAUDE.md should declare the tool preferences explicitly so the agent doesn't have to re-derive them every session.
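One way to hold the line on "Bash when there's a dedicated tool" is a check that maps common shell commands to Claude Code's Read, Glob, and Grep tools. The mapping is an assumption about sensible defaults; the same function could back a PreToolUse hook on Bash or just document the preferences.

```python
import shlex
from typing import Optional

# Shell commands that usually have a dedicated Claude Code tool (assumed defaults).
DEDICATED = {
    "cat": "Read",
    "head": "Read",
    "tail": "Read",
    "find": "Glob",
    "grep": "Grep",
    "rg": "Grep",
}


def preferred_tool(bash_command: str) -> Optional[str]:
    """Name the dedicated tool to use instead of this Bash command, or None."""
    try:
        argv = shlex.split(bash_command)
    except ValueError:  # e.g. unbalanced quotes: don't guess
        return None
    return DEDICATED.get(argv[0]) if argv else None
```

It looks only at the first word, so a pipeline like `grep foo | head` is judged by `grep`.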

Context window management

Long sessions with a KMS will fill the context. The discipline: off-load early, off-load often. When a session has produced something durable, write it to the store and let the rest fall out of the window. The session memory is scratch; the store is the artifact.

The harness will compress the conversation as you approach the limit; relying on that is fine for personal use but unsuitable for team-tier work where you want explicit control over what's preserved.
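A toy model of the off-load discipline, not of how the harness actually manages the window: the session keeps a scratch working set, writes finished pieces to the store as markdown, and drops them from scratch. The 150K-token soft budget and the four-characters-per-token estimate are assumptions.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Session:
    """Scratch working set for one session; the store on disk is the artifact."""
    store: Path
    budget_tokens: int = 150_000  # hypothetical soft limit, well under the window
    scratch: dict[str, str] = field(default_factory=dict)

    def tokens(self) -> int:
        return sum(len(text) for text in self.scratch.values()) // 4  # ~4 chars/token

    def note(self, key: str, text: str) -> None:
        self.scratch[key] = text

    def offload(self, key: str) -> Path:
        """Write a finished piece to the store and drop it from the working set."""
        path = self.store / f"{key}.md"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(self.scratch.pop(key), encoding="utf-8")
        return path

    def should_offload(self) -> bool:
        return self.tokens() > self.budget_tokens
```

Off-loading as soon as something is durable, rather than when `should_offload` trips, is the "early, often" half of the discipline.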