Canonical Source Rules

Watercolor sketch: a row of documents on a desk, one circled in pencil, with a hand-drawn arrow labeled 'canonical source'

Notes on picking the canonical source

What I eventually figured out: in an AI project, the hardest thing to deal with isn't "no truth." It's that many things look like truth.

There's a conclusion in the chat, a conclusion in the report, a conclusion in the task folder, a conclusion in memory. There's a freshly built page locally, and a reachable page in production. The old audit says it's not enough; the new report says the gap is closed. Each piece of material, taken alone, can tell a coherent story.

That's exactly where it gets dangerous. If a piece of material is clearly wrong, it's actually easy to handle. The real trouble comes when it was right at some point, inside some boundary — and now it's being used to answer a different question.

So I set myself a rule: decide who has standing to answer first, then look at what they said.

Where chat, reports, state files, and real outputs sit in the canonical-source judgment

The conflict: AI passes "stage-correct" off as "currently-correct"

The classic example is all the pretty status words: 95 points, launch ready, local candidate, PASS_WITH_WARNINGS. Each one is not necessarily wrong on its own — some are even carefully worded.

But the next time an AI picks them up in isolation, the meaning shifts. It may remember "launch ready" and forget "local-only"; remember "95 points" and forget that a later independent audit knocked it back; remember "the page opens" and forget that was just a local preview; remember "generated" and forget it was never deployed to production.

This kind of error doesn't blow up immediately. It's more like slow contamination: the AI keeps writing from a wrong root, treats old state as current state, calls local-only ready for production, calls a sampling pass a full completion. And it writes it out fully — even with a plausible-looking next step.

I'm not afraid of an AI not knowing. I'm afraid of it, while not knowing, taking a piece of material that has no business answering, and producing a very smooth answer.

The wrong fix: chase the newest, the longest, the best summarizer

I used to take shortcuts too. Whichever file was newest, I'd read first; whichever report was longest, I'd assume was more complete; whichever AI said "done" last, I'd quietly treat as current state.

But in AI projects, "newest" often just means most recently written — not most recently accountable. "Longest" often just means most explained — not strongest evidence. "Smoothest summary" is even more dangerous, because it wraps the conflict up into a pleasant story.

My understanding of canonical changed after that. Canonical isn't "the one truth file," and it isn't "you only get to look in one place." It's more like jurisdiction in court: different questions go to different responsible sources.

Different questions go back to different canonical sources, instead of using one chat summary to answer everything

My rule: ask the question type first, then pick the canonical source

Now I break the question apart first. Instead of opening with "how is the project doing," I ask: what am I actually judging?

Ask about content — look at content files. What an article looks like is settled by the Markdown in the repo, not by a summary in chat.
Ask about current progress — look at task files. README, notes, handoff, state files all outrank a chat summary.
Ask whether something runs — look at real output. Builds, tests, script results, page response codes beat "it should work."
Ask whether something is published — look at the production route. If the question is whether outsiders can see it, a local preview doesn't count.
Ask why a decision was made — look at the decision record. Without an owner sign-off, an audit conclusion, or an explicit superseded marker, you can't quietly promote an old judgment into the current one.
Ask for historical clues — only then go to chat and memory. They're there to help you locate, not to render the verdict.

Once this order is fixed, a lot of arguments disappear. Not because there's less material, but because every piece of material now has a boundary.

When sources clash: don't summarize, adjudicate

The most error-prone moment is when sources fight each other: chat says A, the file says B; the old report said it passed, the new audit says it didn't; the local page is the new version, production is still the old one.

These days, I don't let the AI summarize right away. Summarizing too early just wraps the conflict more nicely. I make it do four adjudication steps first:

Step one — name the question. Is the current question about content, state, runtime, deployment, acceptance, or historical reason?
Step two — list candidate sources. Which materials claim they can answer this? At what time and inside what boundary were each of them produced?
Step three — pick the responsible source. Who has direct responsibility for the current question, and who's only a clue or old evidence?
Step four — mark the superseded ones. If an old conclusion has been overturned by a new audit, an owner decision, or actual runtime results, write "superseded" explicitly — don't leave it for the next AI to guess.

This method looks slow, but it's much cheaper than pushing forward on the wrong canonical source. Once the path is wrong, every step after that manufactures rework.

The end result: the project gets quieter

What the canonical rule produces isn't every piece of material consolidated into one file. It's the project becoming quieter.

I used to ask "which version are you talking about?" all the time. Now I ask first: "where's the canonical source for this question?" If I find it, I continue from there. If I can't, I write a current-state file first, instead of continuing to guess in chat.

It also changes how I read AI output. Smooth writing doesn't mean accurate. A complete report format doesn't mean it's a verdict. A line that says "done," without a corresponding file, output, verification, and boundary, is just a claim waiting to be checked.

The canonical-source rule is, at its core, about assigning responsibility to materials.

Chat moves things forward; memory indexes; reports raise claims; task files hold the process; real outputs do the acceptance; owner decisions release. Every source has a place — only then does the project stop getting dragged around by a pile of materials that all look correct. In the end, what I want isn't an AI that summarizes better. It's an AI that knows where to go back to and check the fact, first.