Frontiers

The open questions I have not answered: the review-bandwidth ceiling, where earned trust breaks, the pull between cost and trust, and whether the whole thesis holds. This page is living.

This is the one article in this section that is never finished. The rest describe a system I am building. This one records what the system has not taught me yet, the questions that stay open while it runs. I would rather publish the questions than pretend I have the answers, so this page is honest about being unsettled, and I will revise it whenever a real cycle turns one of these from a guess into something I actually know.

The review-bandwidth ceiling

The company scales the body, not the board. I can add departments, add projects, add automation, and every one of them eventually routes a decision to the same desk: mine. The factory works right up until my attention is the bottleneck, and then it stops working no matter how much capability sits underneath. The open question is whether better tooling raises the ceiling or just delays hitting it. Sharper decision cards and fewer of them might let one person rule on a genuinely larger portfolio. Or they might only postpone the moment the volume wins. I do not know where the line is, and the scale of the company I can run depends entirely on the answer.

Where earned trust breaks

Autonomy in this system is earned on a track record, and a track record is backward-looking. A department that has been reliable for a hundred runs has earned the right to act, but it earned it in a world that was stable for those hundred runs. The day the world changes (a site redesign, a new spam filter, a policy shift), the track record is worthless, and the department acts confidently on a model that is now wrong. Verify-and-rollback catches the action that fails its own check. It does nothing for the action that succeeds at the wrong thing. The open question is how a system that earned its trust on yesterday notices that today is different, before it does something clean and damaging.
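A minimal sketch of that gap, not the system's actual code. It assumes a verify-and-rollback step can be modeled as snapshot, act, check, restore; the names `run_with_rollback`, `send_followup`, `followup_was_sent`, and the `recipient_policy` key are all hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict

State = Dict[str, str]


def run_with_rollback(state: State,
                      action: Callable[[State], None],
                      verify: Callable[[State], bool]) -> bool:
    """Apply the action; restore the snapshot if the action's own check fails."""
    snapshot = dict(state)
    action(state)
    if verify(state):
        return True
    state.clear()
    state.update(snapshot)
    return False


def send_followup(state: State) -> None:
    # Hypothetical action: record that a follow-up message went out.
    state["followup"] = "sent"


def followup_was_sent(state: State) -> bool:
    # The action's own check: it only asks whether the action happened.
    return state.get("followup") == "sent"


def broken_send(state: State) -> None:
    # Hypothetical action that fails its own check.
    state["followup"] = "bounced"


# The world changed: follow-ups are no longer welcome, but the check cannot see that.
changed_world = {"recipient_policy": "no unsolicited follow-ups"}
passed = run_with_rollback(changed_world, send_followup, followup_was_sent)

# The failure that rollback does catch.
stable_world = {"recipient_policy": "follow-ups welcome"}
rolled_back = not run_with_rollback(stable_world, broken_send, followup_was_sent)
```

In the first call the check passes and nothing is rolled back, even though the action is wrong for the changed world. Only the second call, where the action fails its own check, triggers the restore. Noticing the first case needs a signal from outside the action's own check, and that signal is the part this sketch does not have.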

Cost against trust

Every run sits somewhere on a curve. The cheap end uses the smallest model and the fewest checks. The trustworthy end uses the strongest model and reviews everything twice. These pull against each other on every single card, and a company that optimizes only for cost will quietly erode the safeguards that make it safe, one reasonable saving at a time. The open question is who decides where each decision sits on that curve, the system or me, and how the system knows when a given action deserves the expensive, careful path. I do not want a company that is cheap and confident and wrong.

My starting stance is crude and will change. The high-volume, low-stakes work (parsing replies, extracting fields, classifying) runs on the cheapest model that is good enough. The judgment, the planning, and anything that carries my name in public run on the strongest. Running a model of my own, on my own hardware, waits until the volume is large enough to pay for the machine and the upkeep, which it is not yet. The stance is easy to state. The hard part is the part I do not have: a rule the system applies on its own, per action, so it reaches for the expensive and careful path only where the action earns it.
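The crude stance, written down as a static lookup, might look like the sketch below. It is an illustration under assumptions, not the missing per-action rule: the task labels, the `Action` fields, and the choice to send unknown work to the strongest model are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    CHEAP = "cheapest model that is good enough"
    STRONG = "strongest model"


# Hypothetical labels for the kinds of work named in the text.
LOW_STAKES_KINDS = {"parse_reply", "extract_fields", "classify"}
JUDGMENT_KINDS = {"judgment", "planning"}


@dataclass(frozen=True)
class Action:
    kind: str
    public: bool = False  # True if the output carries the author's name in public


def route(action: Action) -> Tier:
    """Static version of the starting stance, keyed on task kind and visibility."""
    if action.public or action.kind in JUDGMENT_KINDS:
        return Tier.STRONG
    if action.kind in LOW_STAKES_KINDS:
        return Tier.CHEAP
    # Assumption: unknown kinds take the careful path, so new work
    # cannot land on the cheap end without someone deciding it should.
    return Tier.STRONG
```

For example, `route(Action("classify"))` returns `Tier.CHEAP`, while `route(Action("classify", public=True))` returns `Tier.STRONG`. The table is fixed in advance by a person, which is exactly its limit: it sorts by category and cannot tell when one particular action inside a cheap category deserves the expensive path.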

Whether the thesis holds at all

The honest one. The public evidence for autonomous companies is thin. What holds up proves amplification of a capable operator, not autonomy as magic, and the rest is mostly demos. I am betting that the missing piece is orchestration and memory rather than raw capability, and that my own projects are the right place to find out cheaply. I might be wrong. This page is where I will record what the system actually teaches me, including the outcome where the answer is that it does not work, or works only with me close enough that calling it autonomous is a stretch. That result would be worth publishing too. It is the reason the testbed is a business I can afford to be wrong about.

Rev. 2026-06-14