Trust
Local vs. Hosted Models
The right answer depends on tier and on what specifically is sensitive. For most knowledge work, hosted-with-strong-contracts beats local — but not always.
[ position with caveats ]
There is no single right answer here. The question — should the LLM running against a knowledge system be local or hosted? — splits cleanly across tiers, and the default I ship with differs in each.
Personal tier
Most of what a personal KMS compiles is not sensitive. A compilation pass over a week of meeting notes, research captures, and half-formed ideas has no meaningful leak surface. For this content, hosted models with sensible defaults are fine. Anthropic's no-training, no-retention account-level settings are the baseline I assume, and the capability gap between hosted frontier models and anything that runs locally is material — long-context compilation in particular is a capability local small models cannot yet match.
For the minority of personal content that is meaningfully sensitive — financial records, legal matters, health information — local-only compilation makes sense, and the tooling exists (Ollama with reasonable models, llama.cpp, a handful of thin harnesses). The capability gap is real but acceptable, because the work on this content is less about reasoning quality and more about simple extraction and routing.
The personal-tier position is therefore hybrid — hosted as the default, local as a per-entity opt-in for flagged-sensitive content. The routing decision is made once, at the bouncer, not at query time: local compilation is slower, and flagging per entity means you pay that tax only on the content that needs it.
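The bouncer-side routing can be sketched as a single pure function. This is a minimal illustration, not the actual implementation; the `Entity` shape and the backend names are hypothetical stand-ins for a real Anthropic client and a local Ollama/llama.cpp harness.

```python
from dataclasses import dataclass

# Hypothetical backend identifiers; a real deployment would map these to
# a hosted Anthropic client and a local Ollama/llama.cpp harness.
HOSTED = "hosted"
LOCAL = "local"

@dataclass
class Entity:
    """A captured item passing through the bouncer at ingest."""
    content: str
    sensitive: bool = False  # set once by the bouncer, never re-decided per query

def route_compilation(entity: Entity) -> str:
    """Pick a compilation backend once, at ingest.

    Hosted is the default; local is a per-entity opt-in for content the
    bouncer flagged as sensitive (financial, legal, health).
    """
    return LOCAL if entity.sensitive else HOSTED

# Only flagged entities pay the slow local-compilation tax.
assert route_compilation(Entity("weekly meeting notes")) == HOSTED
assert route_compilation(Entity("tax filing", sensitive=True)) == LOCAL
```

Because the flag is set at ingest, every later compilation pass and query sees a decision that has already been made.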
Team tier
The team tier is where hosted wins on pragmatic grounds. The reasoning workload at team scale — compilation across hundreds of concepts, cross-entity contradiction checks, long-context review — is where the capability gap between hosted frontier models and anything local-runnable is largest. Local deployment at this scale also creates an operational burden the team itself has to absorb.
The team-tier position I ship is: hosted models, but with strict contractual and configuration controls.
- The API account has training disabled at the account level. Verifiable in the account configuration; renewed and re-verified annually.
- The deployment is single-tenant — dedicated database, dedicated compute, dedicated credentials. No shared multi-tenant stores exist where a misconfigured policy could leak between clients.
- The model never sees content the querying user cannot access (see filter-first-reason-second).
- The vendor surface is explicitly listed and scoped. The reasoning vendor (Anthropic) sees extracted facts and queries, not raw source data. The database vendor necessarily holds the knowledge layer, since it runs that layer's database. The hosting vendor sees encrypted traffic. Background-job orchestration sees job metadata. Every one of those has a data-processing agreement; the vendor list is published, not a trade secret.
That combination — training disabled, single-tenant, filter-first, explicit vendor surface — is the default I ship at team scale. It is not the same as local-only. It is a considered bet that the legal and operational controls are strong enough to make hosted the better choice given the capability delta.
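Those four controls lend themselves to an automated check at deploy time and at the annual re-verification. A minimal sketch, assuming a hypothetical deployment record; the field names are illustrative, not a real configuration schema.

```python
from dataclasses import dataclass, field

@dataclass
class TeamDeployment:
    """Hypothetical shape of a team-tier deployment record."""
    training_disabled: bool
    single_tenant: bool
    filter_first: bool
    vendor_dpas: dict = field(default_factory=dict)  # vendor -> DPA reference

def verify_team_controls(d: TeamDeployment) -> list[str]:
    """Return the list of violated controls; empty means the deployment
    meets the team-tier defaults described above."""
    violations = []
    if not d.training_disabled:
        violations.append("training must be disabled at the account level")
    if not d.single_tenant:
        violations.append("deployment must be single-tenant")
    if not d.filter_first:
        violations.append("filter-first-reason-second must be enforced")
    for vendor, dpa in d.vendor_dpas.items():
        if not dpa:
            violations.append(f"missing data-processing agreement: {vendor}")
    return violations
```

Running this on every deploy turns the annual re-verification from a checklist exercise into a failing build.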
Regulated or high-sensitivity
When regulation or explicit contract requires it — financial data under certain regimes, EU AI Act enterprise provisions, specific client security postures — hosted-in-our-environment is not the right answer, even with all the controls above. The right answer is hosted inside the client's own cloud.
Claude is available through AWS Bedrock and GCP Vertex AI, both of which are covered under existing enterprise cloud agreements. In that configuration the sensitive content — regulated records, confidential client data — never leaves the client's security perimeter. The AI reasoning happens inside the client's boundary. This is the configuration I default to for any deployment where the data type would otherwise require a separate data-processing agreement with a model vendor, and it is the one I use for regulated-industry engagements.
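For concreteness, a sketch of the in-perimeter call path via Bedrock. The request builder below is a hedged illustration (the model ID is illustrative, and `build_converse_request` is my name, not an SDK function); only the message shape follows the Bedrock Converse API, and the actual invocation is shown commented because it requires the client's own credentials.

```python
def build_converse_request(prompt: str, model_id: str) -> dict:
    """Assemble a Bedrock Converse request body.

    The message shape matches the Bedrock Converse API; everything else
    here is illustrative.
    """
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
    }

# Inside the client's perimeter this would be sent with the client's own
# credentials, e.g. (not executed here):
#   import boto3
#   rt = boto3.client("bedrock-runtime", region_name="eu-central-1")
#   resp = rt.converse(**build_converse_request(prompt, model_id))
#   text = resp["output"]["message"]["content"][0]["text"]
```

GCP Vertex AI follows the same pattern with its own SDK; in both cases the prompt and response never transit a model vendor's account.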
For the handful of cases where even client-hosted is not enough — air-gapped environments, government classifications, specific adversarial threat models — the answer is local-only on the client's own infrastructure, accepting the capability gap. I have shipped this once; it is expensive to operate and I would not recommend it unless the threat model specifically requires it.
Caveats
This is the page where my thinking is most likely to move. The hosted-model market is repricing its trust primitives quickly — zero-retention endpoints, private deployments, confidential-compute execution environments. In twelve months the right default may look different.
The position that holds across all cases, regardless of where the model runs: the filter-first invariant (filter-first-reason-second). The model layer changes; the invariant does not. Anything that moves sensitive content into the model's context is a leak surface; anything that doesn't, isn't. Pick the deployment that makes that separation cleanest for the data in question.
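The invariant fits in a few lines, which is part of why it survives model-layer churn. A minimal sketch, with hypothetical names throughout; `ask_model` stands in for whichever deployment (hosted, in-perimeter, or local) is in play.

```python
from typing import Callable

def filter_then_reason(
    entities: list[dict],
    user_acl: set[str],
    ask_model: Callable[[str], str],
) -> str:
    """Filter first, reason second: only entities the querying user can
    already access ever enter the model's context. Everything excluded
    here is, by construction, outside the leak surface.
    """
    visible = [e for e in entities if e["acl"] & user_acl]
    context = "\n".join(e["text"] for e in visible)
    return ask_model(context)
```

The access check runs before prompt assembly, so no model-side instruction ("do not reveal...") is ever load-bearing.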
Related positions
Rev. 2026-04-18