Frontier
Ingesting Institutional Knowledge
If I could interview and shadow every person in the company, the knowledge layer would fill itself. In reality I can't. The question is what's extractable from the artifacts people already leave behind — and what isn't.
[ active research ]
The open question
Every team knowledge system starts with the same problem: most of what it needs to know lives in people's heads, and the people don't have time to write it down. If I could interview every employee for three hours and shadow them for a week, the initial knowledge layer would populate itself. In practice I can't — and neither can anyone else building a team KMS for a working company.
So the question becomes: how much institutional knowledge can be extracted from the artifacts people already produce, and what remains stubbornly tacit?
Current thinking
Three classes of knowledge, three ingestion strategies
Not all institutional knowledge is the same shape. Treating it as one substance is the first mistake.
Explicit, externalised knowledge — already lives in Slack, email, Google Docs, meeting transcripts, Linear tickets, code commits. Ingestion here is the best-understood part: connectors read the sources, the bouncer classifies what's worth keeping, the promoter compiles and routes. This class fills maybe 40% of a team's operational knowledge — enough to be useful, not enough to replace the people who made it.
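The connectors → bouncer → promoter flow above can be sketched end to end. Everything in this sketch is illustrative — a real bouncer would be an LLM or trained classifier, not a keyword list, and the entry shape is invented; only the three role names come from the text:

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    source: str  # e.g. "slack", "email", "linear"
    text: str

# Hypothetical keyword bouncer: a placeholder for a model that reasons
# about whether an artifact is worth keeping.
KEEP_SIGNALS = ("decided", "owns", "policy", "deadline")

def bouncer(artifact: Artifact) -> bool:
    """Classify whether an artifact is worth keeping."""
    return any(signal in artifact.text.lower() for signal in KEEP_SIGNALS)

def promoter(artifact: Artifact) -> dict:
    """Compile a kept artifact into a knowledge-layer entry."""
    return {"source": artifact.source, "summary": artifact.text[:120]}

def ingest(artifacts: list[Artifact]) -> list[dict]:
    """Connector output -> bouncer -> promoter."""
    return [promoter(a) for a in artifacts if bouncer(a)]
```

The shape matters more than the classifier: the bouncer is a filter between raw connector output and the compiled knowledge layer, so its false-positive rate directly sets how much noise the promoter has to absorb.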
Semi-explicit knowledge — lives in the structure of how people work, not in what they write. The fact that the customer-success lead always cc's the finance controller on renewals over €50K. The fact that every new engineering hire shadows a specific senior for their first two weeks. The fact that strategic decisions get rehearsed in 1:1s before they reach the executive meeting. This class is extractable from the metadata of artifacts — who emails whom, who attends which meetings, who reviews whose PRs — more than from the content of any single artifact. It requires a bouncer that reasons about patterns, not just content.
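A pattern-reasoning bouncer can start from something as simple as counting recurring metadata triples. The event shape, names, and support threshold below are all hypothetical — the point is that the signal lives in recurrence across artifacts, not in any one artifact's content:

```python
from collections import Counter

# Hypothetical event log of (actor, action, target) triples drawn from
# artifact metadata: cc lists, meeting invites, PR reviews.
events = [
    ("cs_lead", "cc", "finance_controller"),
    ("cs_lead", "cc", "finance_controller"),
    ("cs_lead", "cc", "finance_controller"),
    ("cs_lead", "cc", "sales_ops"),
]

def extract_patterns(events, min_support=3):
    """Surface recurring metadata triples as candidate semi-explicit knowledge."""
    counts = Counter(events)
    return {event: n for event, n in counts.items() if n >= min_support}
```

A real extractor would also condition on context (renewal size, hire start date) before claiming a rule, but even this frequency pass turns behavioural metadata into candidates a human can confirm or reject.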
Tacit knowledge — the judgement calls, the relationship dynamics, the "here's what you do when X happens" that only emerge when X happens. This is the class that shadowing would capture. No amount of artifact-reading reaches it directly. The best extraction method I've found is not extraction at all: it's a prompt that triggers articulation — an agent asking a senior employee specific, well-scoped questions at the moment a relevant decision is happening, capturing the response as structured reasoning rather than an artifact.
The coverage gap
The honest answer on coverage: artifact-based ingestion produces maybe 50-60% of what a full shadowing pass would produce, at 5% of the cost. The question is whether the remaining 40-50% matters for what the KMS is being built to do.
For operational questions — "what did we decide about X", "who owns Y", "what's the history of this customer" — artifact ingestion is sufficient and maybe over-sufficient. For judgement questions — "how do we handle a customer in this state", "what would the senior person here say about this trade-off" — artifact ingestion reaches only the surface.
The agent-assisted interview pattern
The most promising bridge I've found is not asking humans to sit for a three-hour interview, but having the agent ask three-minute questions repeatedly, in context, when the question is relevant. "I notice you're about to approve this spend — do you have a rule for when this type of approval goes to the CFO rather than you? Would you mind capturing it while it's fresh?" This trades the interviewer's time for the employee's attention over weeks, at a lower aggregate cost, and the captured knowledge is grounded in real decisions rather than hypotheticals.
The discipline is knowing when not to ask: if every decision triggers a question, people stop responding. The bouncer-equivalent for interview prompts is as important as the bouncer for artifact ingestion.
What I haven't figured out
The retrieval question: once institutional knowledge is captured, how do you surface it at the right moment? A captured judgement from six months ago is only useful if it reaches the right person when a similar decision is pending. This is a recommender problem, not a storage problem, and I don't have a pattern I trust for it yet.
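The shape of the recommender problem is sketchable even without a trusted pattern: score captured judgements against a pending decision and surface the matches. Token overlap stands in here for whatever real relevance model would be used; the memory contents and threshold are illustrative:

```python
def jaccard(a: str, b: str) -> float:
    """Crude relevance: token-set overlap between two texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Hypothetical captured judgements from past decisions.
captured = [
    "escalate renewal discount above 20 percent to cfo",
    "new engineering hires shadow a senior for two weeks",
]

def recall(pending_decision: str, memory=captured, threshold=0.15):
    """Return captured judgements relevant to a pending decision, best first."""
    scored = sorted(((jaccard(pending_decision, m), m) for m in memory), reverse=True)
    return [m for score, m in scored if score >= threshold]
```

The hard part the sketch skips is the trigger: knowing that a decision is pending at all, so that `recall` runs at the right moment rather than waiting to be queried.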
The attribution question: when an agent extracts semi-explicit knowledge from metadata ("the customer-success lead always cc's finance on large renewals"), whose knowledge is it? The employee whose behaviour was observed? The agent that observed it? The company? This matters for privacy, for authorship marking, and for whether the employee can correct or redact the extracted pattern.
What would settle it
A working deployment where artifact ingestion + agent-assisted interview + metadata pattern extraction has been running for 12 months in a real company, with measured retrieval value against operational questions and judgement questions separately. The shape of the coverage gap would be visible. I have this running at small scale in two deployments; neither has produced enough history yet to answer the question well.
Rev. 2026-04-18