Context Accumulation Is the Hardest Problem
Every conversation with an AI model starts with a clean slate and a generous token budget. Ask it one question, get a good answer. Ask it twenty questions across a multi-week project, and you hit a wall that no amount of model improvement will fix.
The problem isn't intelligence. It's memory.
Where the limit actually bites
Consider a marketing campaign that runs through nine stages, from market research, audience definition, and messaging through content creation and review cycles to launch. By stage five, the accumulated context from earlier stages exceeds most models' context windows. The AI forgets the brand voice decisions from stage two. It contradicts the positioning agreed on in stage three.
This isn't a hypothetical. It's the first thing that breaks when you try to build AI workflows that span more than a single conversation.
The instinct is to throw everything into the context window. Modern models accept inputs of 100K, 200K, or even more tokens. But token limits aren't the only constraint: cost scales linearly with context length, latency increases, and models perform worse when forced to attend to massive amounts of loosely relevant information.
Why the obvious solutions don't work alone
Pure RAG (retrieval-augmented generation) sounds elegant: embed everything, retrieve what's relevant, keep the context window small. But RAG retrieves fragments. It loses the narrative thread. When your AI needs to understand how a campaign evolved across stages — not just find a specific fact — semantic search returns puzzle pieces without the picture on the box.
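To make the shape of that failure concrete, here is a toy retrieval sketch; the bag-of-words embedding, the cosine scoring, and the example chunks are all illustrative stand-ins, not any particular library's API.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a learned model.
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Stage 2 decision: brand voice is authoritative but not academic.",
    "Stage 3 positioning: lead with time-to-value for mid-market CFOs.",
    "Stage 4 headline drafts for the launch email.",
]

# Top-scoring fragments come back, but the story of how stage 3 reshaped
# the stage 2 decisions is nowhere in them.
print(retrieve("what is our positioning for mid-market CFOs?", chunks))
```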
Pure summarization is the other instinct: compress each stage into a summary, carry the summaries forward. This works until the summaries themselves accumulate. And every compression loses nuance. By stage seven, the summary of a summary of a summary has lost the specific constraint that matters for this particular decision.
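The rolling-summary loop is just as small to sketch; `summarize` here is a placeholder for a model call, with plain truncation standing in for lossy compression.

```python
def summarize(text: str, limit: int = 400) -> str:
    # Placeholder for an LLM summarization call; truncation mimics lossy compression.
    return text if len(text) <= limit else text[:limit] + " ..."

stage_outputs = [
    "Stage 1: market research findings ...",
    "Stage 2: brand voice is authoritative but not academic ...",
    "Stage 3: positioning agreed, lead with time-to-value ...",
]

carried = ""
for output in stage_outputs:
    # Each pass re-compresses the previous compression, so a nuance dropped
    # at stage two can never resurface at stage seven.
    carried = summarize(carried + "\n" + output)
```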
Neither approach is wrong. Both are incomplete.
The pattern that's emerging
The most promising approach treats context like a filing system with three distinct tiers, each serving a different purpose (a rough data-structure sketch follows the list):
Raw artifacts — the full, immutable outputs from each stage. These never get compressed or summarized. They sit in storage, available when you need to reference the exact wording of a brand guideline or the specific data point from a research phase.
Atomic facts — key decisions, constraints, and rules extracted from the artifacts. These are small, structured, and queryable. "The target audience is mid-market CFOs." "The brand voice is authoritative but not academic." Facts that should never be forgotten, regardless of how many stages pass.
Evolving summaries — compressed narratives of each stage that get refined as the campaign progresses. The stage-three summary might get rewritten after stage six reveals that a different angle is more important than initially thought.
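A minimal way to represent the three tiers, assuming nothing beyond the descriptions above; the class and field names are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RawArtifact:
    """Full, immutable output of a stage; stored whole, never compressed."""
    stage: int
    name: str
    content: str

@dataclass(frozen=True)
class AtomicFact:
    """A single decision, constraint, or rule extracted from an artifact."""
    key: str            # e.g. "target_audience"
    statement: str      # e.g. "The target audience is mid-market CFOs."
    source_stage: int

@dataclass
class StageSummary:
    """Compressed narrative of a stage; rewritten as later stages reframe it."""
    stage: int
    text: str
    revision: int = 0
```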
The working memory for any given interaction combines the current task, relevant atomic facts, and recent summaries — all carefully assembled to fit within the token budget. Full artifacts get pulled in only when specifically referenced.
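One way that assembly step could look, kept to plain strings for brevity; the four-characters-per-token estimate, the 8,000-token budget, and the newest-first ordering are illustrative choices, not recommendations.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly four characters per token.
    return max(1, len(text) // 4)

def build_working_memory(task: str,
                         facts: list[str],
                         summaries: list[tuple[int, str]],
                         budget: int = 8000) -> str:
    """Assemble a prompt from the current task, atomic facts, and recent summaries."""
    parts = [f"Current task:\n{task}"]
    used = estimate_tokens(parts[0])

    # Atomic facts go in first: they are small and must never be forgotten.
    for fact in facts:
        cost = estimate_tokens(fact)
        if used + cost > budget:
            break
        parts.append(f"Fact: {fact}")
        used += cost

    # Then stage summaries, newest first, until the budget is spent.
    for stage, text in sorted(summaries, reverse=True):
        cost = estimate_tokens(text)
        if used + cost > budget:
            break
        parts.append(f"Stage {stage} summary:\n{text}")
        used += cost

    # Raw artifacts are deliberately absent; they get pulled in only when
    # the task explicitly references one.
    return "\n\n".join(parts)
```

The design choice that matters is the priority order: the facts that must never be forgotten go in before the summaries, and raw artifacts stay out unless the task explicitly asks for one.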
What this means for builders
If you're building AI workflows that span more than a single request-response cycle, context management is your hardest engineering problem. Not prompt engineering. Not model selection. Not UI design.
The temptation is to solve it later — get the happy path working first, worry about memory when things break. But context architecture shapes everything downstream: how you store intermediate results, how you structure your workflow stages, how you price your product.
The teams that figure this out first will build AI products that feel genuinely intelligent across long interactions. The ones that don't will build tools that work impressively for five minutes and then start forgetting.
Context isn't a feature. It's the foundation.