Trust

Filter First, Reason Second

The invariant. The model never sees content the querying user can't access — and every other trust control in a KMS flows from that one rule.

[ settled ]

The only position on this page that is not a working hypothesis. Every other trust decision in a knowledge system follows from this one, and if the architecture gets this wrong, no downstream control will save it.

The invariant

When a user queries a knowledge system, the order of operations has to be:

  1. Resolve the user's identity and scoped authorisations through the identity provider.
  2. Assemble the candidate set of knowledge that might answer the question.
  3. Remove from that set anything the user is not permitted to see.
  4. Hand the filtered set to the model.
  5. Compose the answer from the filtered set only.
  6. Return the answer with citations only to sources the user was allowed to see — and log the decisions.

The step that matters is step three. The model never receives content the user can't access.

Why this is the invariant

Three failure modes disappear once filter-first is the rule.

Prompt injection cannot exfiltrate what was never in context. A malicious prompt cannot coerce the model into revealing something it does not have, and the model cannot paraphrase, summarise, or clever-phrase its way around a permission it never saw.

Silent leakage through compilation disappears. An LLM given five sources at different sensitivities can, at the margin, reveal something about the most sensitive source in a summary of the least sensitive one. If the sensitive source was never in context, that failure is structurally impossible.

Honesty becomes available. When a query touches material the user is not cleared for, the system can acknowledge that fact without revealing its content: "There is additional context in finance that I can't share with your role. Talk to the finance lead if you need it." Users stop quietly distrusting a system whose blind spots are invisible — and start using the acknowledgement as a map to who to ask next.

The inverted order — compose an answer with everything relevant, then redact — fails all three. The model has already seen the content. Redaction is theatre.

Worked example

In the team KMS — the worked implementation of this thinking — the MCP layer is the enforcement point. A user's identity is resolved through the client's identity provider, their role and scoped assignments are determined, and the candidate concept set is filtered against those scopes before anything reaches the model. A salesperson asking a question whose best answer would touch a finance concept receives an answer built only from the concepts they can see, plus an honest acknowledgement that restricted material exists.

The same architecture scales across scoping models. Record-level scoping (the user can only see their own cases), department-level scoping, and cross-entity boundaries all collapse into the same architectural rule: authorisation is evaluated at candidate-set assembly, not at answer composition.
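One way to see the collapse: each scoping model is just a different predicate over a user and a record, and all of them are evaluated at the same point — candidate-set assembly. A hedged sketch, with the record fields and predicate names invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    id: str
    owner: str
    dept: str
    entity: str

@dataclass(frozen=True)
class User:
    id: str
    depts: frozenset
    entity: str

# Each scoping model is nothing more than a (user, record) predicate.
def record_scope(u, r):      return r.owner == u.id        # only their own cases
def department_scope(u, r):  return r.dept in u.depts      # department-level
def entity_scope(u, r):      return r.entity == u.entity   # cross-entity boundary

def assemble_candidates(user, records, predicates):
    # Authorisation is evaluated here, at candidate-set assembly:
    # a record survives only if every active predicate admits it.
    return [r for r in records if all(p(user, r) for p in predicates)]
```

Answer composition never changes as scoping models are added or combined; only the predicate list does.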

Caveats

Filter-first closes the primary leakage channel. It does not close side channels — the timing of a response, whether a response was possible at all, what the system declined to answer. A determined observer could, in theory, infer the existence of restricted material from the shape of what the system refuses to answer.

The mitigation today is to make refusal uniform — same message, same latency — across categories of restriction, so that the shape of the refusal does not leak what was behind it. This is close enough at team scale. For deployments with adversarial or intelligence-driven threat models, further work is warranted, and this is the edge where the position is still moving.
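The uniform-refusal mitigation can be sketched as a constant message plus a constant latency floor. The message text and the floor value are placeholders; note the floor only masks responses that would have been faster than it, which is part of why this is a team-scale mitigation rather than a complete one.

```python
import time

# One message for every category of restriction -- the wording
# must not vary with what was withheld.
REFUSAL = "Some material relevant to this query is restricted for your role."

FLOOR_SECONDS = 0.2   # hypothetical constant response floor

def uniform_refuse(started_at: float) -> str:
    # Pad every refusal to the same minimum latency so that timing
    # does not reveal which restriction category was hit.
    elapsed = time.monotonic() - started_at
    if elapsed < FLOOR_SECONDS:
        time.sleep(FLOOR_SECONDS - elapsed)
    return REFUSAL
```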

Related positions

Rev. 2026-04-18