OpenClaw architecture governance notes
The local Studio can manage agents. The Mission Control web app can also manage agents. The Gateway I plan to install later is going to route external requests to agents — all three can manage them, and all three have good reasons to. The problem isn't that any one of them is bad. The problem is that when you put the three together, it stops being clear who actually decides.
This has been sitting on my todo list for a while. It's not that I made a wrong decision and now have to roll back; every step looked right at the time. The local Studio RC1 should have its own boundary when it goes live. Mission Control Web v2.0.1 (Alpha, early-stage development) is an agent orchestration dashboard, so of course it should have its own agent panels. The future Gateway, as the external entry point, should have its own routing rules. But when those three individually correct things land in the same working system at the same time, the overlap shows up.
This month I didn't rush to close it down. I started by writing out "who should answer which question" — and that step matters more than adding any new feature. What follows is how I split responsibilities now, a few concrete conflict points, and why I decided to write specs before touching code.
Each one makes sense on its own
Let me walk through where each of the three came from. This isn't a history lesson — it's just that without knowing why each one exists, you can't judge who should yield when they conflict.
The local Studio RC1 is a local working system. Its original reason for existing was plain enough: I needed something that could run morning checks, move tasks through, and produce content without depending on any external API. Every agent's work contract (where it can read, where it can write, what it can't touch) is signed locally. Every task's audit goes into the local audit directory. Every piece of content's evidence stays local too. Studio's design stance has always been "local truth" — I don't trust any state that can't be reproduced locally. This isn't a matter of taste, it's something reality forced on me: external APIs change without notice, remote services drop, third-party records get rewritten, and a local audit is the only thing that can recover what actually happened after the fact.
Another implication of the Studio design is that agent contracts have to be readable locally. Which directories an agent can read, which it can write, which it absolutely cannot touch — those were defined as files in RC1, not as a row in a database, not as an entry in a remote config service. That looks rigid, but it buys one property: every tool call that wants to be authorized has to come back to this contract file and check it once. In other words, Studio's agent control plane sits on top of the local filesystem, and away from local it can't decide anything. That's its strength, and that's its boundary.
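The "come back to the contract file and check it once" rule can be sketched as a small authorization function. This is only an illustration under assumptions: the `AgentContract` shape, the `authorize` name, and the example paths are hypothetical, not RC1's actual contract format.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AgentContract:
    """Hypothetical in-memory form of one agent's contract file."""
    readable: tuple[str, ...]
    writable: tuple[str, ...]
    forbidden: tuple[str, ...]


def _inside(path: str, roots: tuple[str, ...]) -> bool:
    """True if `path` equals one of `roots` or lies underneath it."""
    p = PurePosixPath(path)
    return any(p == PurePosixPath(r) or PurePosixPath(r) in p.parents for r in roots)


def authorize(contract: AgentContract, action: str, path: str) -> bool:
    """Allow `action` ("read" or "write") on `path` only if the contract says so."""
    if ".." in PurePosixPath(path).parts:
        return False  # refuse paths that could climb out of an allowed root
    if _inside(path, contract.forbidden):
        return False  # a forbidden entry always wins over read/write grants
    if action == "read":
        return _inside(path, contract.readable)
    if action == "write":
        return _inside(path, contract.writable)
    return False  # unknown actions are denied by default
```

The point of the sketch is the default: anything the contract does not name is denied, and the decision needs nothing beyond the locally readable contract.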
Mission Control Web is a different animal. It is at v2.0.1, still Alpha, and its role is agent orchestration dashboard. The problem it tries to solve: when there's more than one agent, more than one task, and more than one external model provider, local logs alone stop being enough to keep track. MC Web is going for a fleet view: many agents running at once, scheduling visualized, cost tracking, a security audit dashboard. It has a multi-framework adapter slot in the gateway layer, state goes into SQLite, you start it with pnpm start, and the whole thing tilts toward a front-end engineer's perspective. Its design stance is "global observation": I need to be able to see at a glance what every agent is doing right now, how much money they're burning, and whether anyone has crossed a line.
There's another goal in MC Web's design that's easy to overlook — it has to manage agents across frameworks. Local agents are one implementation. Running a research flow on DeerFlow in the future is another. If we eventually plug in some other orchestration framework, that's yet another. MC Web shouldn't be locked to any one implementation, which is why it leaves a multi-framework slot in the gateway layer. But the price is that for any specific agent's semantics, it can only do the "lowest common denominator" — it can see that an agent is running, it can see how many tokens it's burning, but for finer internal state you still have to go back to the agent's own implementation. That limit is just an inherent property of this kind of tool, not a defect of v2.0.1.
The Gateway isn't installed yet. Right now it only exists as the slot in MC Web v2.0.1 marked "Gateway Optional" — meaning the team already knows there will be one eventually, but hasn't decided when. The Gateway's design intent is clear: every external request goes through it for unified routing, unified auth, and unified rate limiting, so external traffic doesn't hit local or the dashboard directly. Its design stance is "boundary gatekeeper" — anything from the outside passes through me first.
Another reason the Gateway exists is layered defense — neither the local Studio nor the dashboard should be facing the public internet directly. The local system should focus on what local can do, the dashboard should focus on presentation, and external requests should go through a dedicated layer doing the dirtiest, most thankless work: rejecting abnormal traffic, rate limiting, doing the outermost auth. If that work isn't handed to a dedicated layer, sooner or later it bleeds into Studio or MC Web — and they end up being forced to carry a pile of defensive code that shouldn't be theirs. Whether to install the Gateway is a choice, but as long as it's not installed, the ports Studio and MC Web expose are by default carrying the "external entry point" responsibility — and that's an implicit trap.
The three stances on their own — local truth, global observation, boundary gatekeeper — each hold up. The trouble is that on the object called "agent," these three intersect. An agent has to have a local contract, has to be visible on the dashboard, and in the future has to accept requests from outside. That's how three control planes start to overlap.
What they start fighting over
The overlap isn't abstract. It has a concrete shape. In the past two weeks I've run into it in at least four places.
The first conflict: who actually schedules a local task. Studio has its own task scheduling — during the morning check it triggers Suwan to put together a piece of content, triggers Huorui to run a security sweep, triggers Shen Zhixing to pull information. These are local pipelines, and Studio decides. But MC Web's dashboard also has a "task scheduling" panel, and in theory you can send "Suwan, run a piece right now" from there too. Two entry points scheduling one agent means two call paths, which means two sets of scheduling state. If the two get out of sync — one says it's running, the other says it isn't — who should the agent listen to? I never ran into this before, because MC Web wasn't really plugged in. The moment it is, this shows up.
The nastiest thing about this conflict is that it's invisible while "neither side is really being used." Local Studio runs stable, MC Web sits there as a dashboard with nobody triggering tasks from it, and everything looks harmonious. Then one day I take the shortcut and fire a task from MC Web, local Studio's scheduling state has no record of it, a few hours later the local periodic task fires the same thing again — same job ran twice, two audit records, but both sides think they're right. That kind of duplicate firing isn't a technical bug, it's a side effect of the control plane not making it clear who the entry point is.
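The duplicate firing above goes away if both entry points record runs in one place. A minimal sketch, assuming a single hypothetical `Scheduler` that the local timer and the dashboard both call, and an invented run key of (agent, job, window):

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Hypothetical single entry point; every trigger source calls submit()."""
    # Maps (agent, job, window) to the source that fired it first.
    ledger: dict[tuple[str, str, str], str] = field(default_factory=dict)

    def submit(self, agent: str, job: str, window: str, source: str) -> bool:
        """Accept a run unless the same (agent, job, window) has already fired."""
        key = (agent, job, window)
        if key in self.ledger:
            return False  # already fired, no matter which side fired it
        self.ledger[key] = source
        return True


scheduler = Scheduler()
# A task fired by hand from the dashboard is recorded once...
scheduler.submit("suwan", "daily-content", "2024-05-01", source="mc-web")
# ...so the later periodic trigger for the same window is rejected.
scheduler.submit("suwan", "daily-content", "2024-05-01", source="studio-timer")
```

The job name and date window are made up for illustration. What matters is that there is one ledger, so "who fired it" becomes a recorded field instead of two disagreeing states.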
The second conflict: where an agent's runtime state, cost, and audit should be written. Studio has its own local audit — every file an agent writes, every task transition, every tool call gets recorded into the local audit directory. MC Web has its dashboard — it wants to show "in the last 24 hours this agent ran how many tasks, burned how many tokens, did it cross any lines." If those two records are written separately, you have two truths. If one is a mirror, you have to first pick which one is the source. My instinct is that Studio's local audit is the source — but instinct isn't spec, and until it's written down, it isn't a split.
Audit is especially touchy in a control plane. The canonical source isn't "designated," it's "actually written" — which side gets written to first, which side is synced over later, which one wins when they disagree, all of that has to be spelled out in advance. The hardest bugs I've ever had to chase almost all came from having more than one audit and no agreed source. So I'm being extra careful with this one — the canonical source for an agent has to be fixed up front, not negotiated after something breaks.
The third conflict: once the Gateway is in, who should an external request hit first. One way is Gateway → Studio → MC Web (local first, dashboard is the observation side). Another is Gateway → MC Web → Studio (orchestration first, local is the execution side). Both can be made to work, but the paths are completely different. The first means MC Web only ever sees secondhand information that Studio has already processed. The second means every request Studio receives has already been filtered by MC Web's policy. If this isn't decided now, and we wait until the Gateway is actually being installed to decide, it becomes install-and-modify-and-fix-in-flight, and the cost gets steep.
The fourth conflict is more subtle: which layer owns cost tracking. Studio can record per-tool-call cost — it has an audit, adding a column would do it. MC Web can record it too, obviously — it was designed for orchestration and observation, cost is a natural field for it. Neither side is doing it properly right now, so there's no conflict. The moment both sides start doing it for real, you've got two sets of cost data, and another "which one is authoritative" problem. What makes it worse is that cost eventually rolls up into "how much did we burn this month" — and if both records exist, the rollup will inevitably double-count or miss, coming out either inflated or deflated, with no middle option.
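A toy rollup shows how the inflation happens. The ledgers, the `call_id` field, and the cost values below are invented for illustration. De-duplicating only works if both sides share a call identifier, which is itself an assumption.

```python
def naive_total(*ledgers: list[dict]) -> float:
    """Add up every row from every ledger, as a careless monthly rollup would."""
    return sum(row["cost"] for ledger in ledgers for row in ledger)


def deduped_total(*ledgers: list[dict]) -> float:
    """Count each call_id once; the first ledger that mentions it wins."""
    seen: dict[str, float] = {}
    for ledger in ledgers:
        for row in ledger:
            seen.setdefault(row["call_id"], row["cost"])
    return sum(seen.values())


# Hypothetical records: call c2 was logged by both sides.
studio_ledger = [{"call_id": "c1", "cost": 0.25}, {"call_id": "c2", "cost": 0.5}]
mc_web_ledger = [{"call_id": "c2", "cost": 0.5}, {"call_id": "c3", "cost": 1.0}]

print(naive_total(studio_ledger, mc_web_ledger))    # 2.25, c2 counted twice
print(deduped_total(studio_ledger, mc_web_ledger))  # 1.75
```

Reading only one of the two ledgers gives 0.75 or 1.5, which is the deflated case. None of the shortcuts produces the right number unless one side is declared authoritative.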
Put these four conflict points together and they're actually the same shape — none of the three systems has written down the boundary for "who owns which face of the agent object." Ownership isn't fixed, the scheduling path isn't fixed, the canonical source isn't fixed, cost attribution isn't fixed. Each one on its own isn't a big deal; together they're a control-plane turf war.
Overlap isn't a bug, it's a side effect of scale
At first I wanted to blame this on "the original design wasn't planned well enough." Later I realized that attribution is wrong. Every step of the planning was right at the time: Studio was only trying to prove that local could run on its own; MC Web was only trying to deal with multiple agents getting hard to track; Gateway was only leaving a slot for an external entry point. The three weren't designed at the same point in time, and weren't designed in the same context.
Overlap grows out of scale, not out of bad design. When a component is just born, it's only responsible for its own small patch. After it runs stable, runs long, and grows to a certain size, it naturally starts reaching for adjacent responsibilities. After Studio hit RC1 it started thinking "could I also provide a simple dashboard" — that's it reaching into MC Web's territory. After MC Web hit v2.0.1 it started thinking "could I just schedule local agents directly without going through Studio" — that's it reaching into Studio's territory. None of these components was originally built to compete with the others, but once they live long enough they start fighting over the same patch of responsibility.
I think this is a general phenomenon. Any working system that grows up will, after surviving its early phase, face the control-plane overlap problem — not because anyone made a mistake, but because surviving components naturally expand. Overlap is a side effect of individual survival.
The direction of expansion follows a pattern too. Once a component that originally only solved "can it run" has been running stably, its second instinct is "let me also be able to see what I'm running": it starts growing a simple query endpoint, a crude status panel. Once that panel exists, its responsibility starts intruding into "observation," and the component originally dedicated to observation starts feeling "why does what I see not match what it sees." The third instinct is "let external callers reach me too": a component originally serving only local starts wanting to leave an external entry point. Once that entry point exists, the component originally dedicated to external entry starts feeling "why isn't external traffic going through me." The instinct is fine, and a component wanting to grow stronger is fine, but every act of instinctive expansion smudges the control plane boundary a little more.
Once I noticed this, my view on closing things down changed. I used to think "if it overlaps, just cut one side off quickly." But that kind of cutting usually cuts whichever side is weakest right now — not whichever side shouldn't be responsible long-term. Short-term it looks like closure; long-term it gets pushed back by the misalignment — the responsibility you cut off grows back a few months later in some other form.
So now my way of handling overlap isn't to grab the knife first, it's to fix the stance first — who should own this long-term. The side that isn't ready can keep the responsibility for now, but mark it as "transitional." That word matters — it admits the current state isn't ideal without pretending it is, and without forcing immediate cleanup. It gives the side that should own it time to build up the capability, and gives the side temporarily holding the bag an exit expectation. This is more painful than just cutting — more specs to write, more conversations to have, longer stretches of inconsistency to tolerate — but it avoids the "cut it off and it grows back" loop.
How I split responsibilities now
So this round I didn't reach for the knife. I stopped and wrote the split first — "who should answer which question" went into a spec, not into code changes.
What follows is my current judgment, not the landed state:
- Agent owner — Studio. The agent's contract (where it reads / where it writes / what's forbidden) lives locally, and so does its version, capabilities, and stability record. What MC Web sees is the agent metadata Studio exposes, not something MC Web defines on its own.
- Canonical source for tasks — Studio's local audit. The raw record of every task execution stays local; what shows up on the MC Web dashboard is a view of the same record, not a separate dataset.
- Orchestration and observation — MC Web. The multi-agent fleet view, scheduling visualization, security audit dashboard, cross-agent aggregation — all of it goes to MC Web. It's not the source of audit, it's the view of audit.
- Cost tracking — MC Web. Cost data naturally crosses agents, external gateways, and model providers; its viewpoint belongs at a higher layer. Studio stops computing cost on its own.
- External entry — Gateway. All external requests come in through it, get auth, rate limiting, and routing done, then it decides whether to hand off to Studio or to MC Web. Studio and MC Web stop exposing external ports.
- Scheduling entry — dual entry, but Studio takes precedence. Local periodic tasks are scheduled by Studio itself; tasks fired from the MC Web dashboard ultimately still go through Studio's scheduler, MC Web doesn't call agents directly.
There's really only one sentence in this whole split — local truth for an agent belongs to Studio, global view and external boundary belong outside. All four conflict points can be derived from that one sentence. That's why spending time writing the split first is worth more than reaching straight for the code: one right sentence solves a pile of small problems.
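One way to keep the split checkable is to write it down as data. The sketch below restates the list above as a lookup table; the concern keys and the `owner_of` helper are hypothetical names, and the scheduling entry is recorded as Studio because Studio takes precedence even with dual entry.

```python
from enum import Enum


class Plane(Enum):
    STUDIO = "Studio"
    MC_WEB = "Mission Control Web"
    GATEWAY = "Gateway"


# Hypothetical concern names; the owners mirror the written split.
OWNERS: dict[str, Plane] = {
    "agent_contract": Plane.STUDIO,
    "task_audit_source": Plane.STUDIO,
    "scheduling": Plane.STUDIO,  # dual entry, but everything funnels into Studio
    "orchestration_view": Plane.MC_WEB,
    "cost_tracking": Plane.MC_WEB,
    "external_entry": Plane.GATEWAY,
}


def owner_of(concern: str) -> Plane:
    """Answer "who decides this?", failing loudly for unrecorded concerns."""
    try:
        return OWNERS[concern]
    except KeyError:
        raise LookupError(
            f"no owner recorded for {concern!r}; add it to the split first"
        ) from None
```

The loud failure is the useful part: a new conflict point that isn't in the table shows up as an explicit gap in the split, not as a silent default.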
Writing it down isn't shipping it
That said, written in a spec isn't running. I know this better than anyone: actually shipping this split takes at least half a year.
The first thing to do is the API boundary. Studio currently has no external agent-metadata endpoint — the agent contract is a file, not an API. The only way MC Web can see this info today is by reading files. For MC Web to really "see the agent metadata Studio exposes," Studio first has to define a set of endpoints that expose the agent list, the agent's current state, the agent's current task — as a stable contract. Easy to say in one sentence, but actually doing it takes a whole version to think through — which fields are stable, which can change, how versions evolve, what backward compatibility looks like — all of it has to be worked out separately.
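To make "stable contract" concrete, here is a rough sketch of a versioned agent-metadata payload. Every field name, the `schema_version` key, and the compatibility rule (readers ignore unknown fields, stable fields are required) are assumptions for illustration, not the drafted field set.

```python
from dataclasses import asdict, dataclass, fields

SCHEMA_VERSION = 1  # hypothetical; bumped only when a stable field changes


@dataclass(frozen=True)
class AgentMetadata:
    """Hypothetical stable fields that Studio would expose to MC Web."""
    agent_id: str
    contract_version: str
    capabilities: tuple[str, ...]


def to_payload(meta: AgentMetadata) -> dict:
    """Serialize with an explicit schema version so readers can detect changes."""
    return {"schema_version": SCHEMA_VERSION, **asdict(meta)}


def from_payload(payload: dict) -> AgentMetadata:
    """Read a payload, ignoring unknown fields and rejecting missing stable ones."""
    known = {f.name for f in fields(AgentMetadata)}
    missing = known - set(payload)
    if missing:
        raise ValueError(f"payload is missing stable fields: {sorted(missing)}")
    data = {name: payload[name] for name in known}
    data["capabilities"] = tuple(data["capabilities"])  # JSON turns tuples into lists
    return AgentMetadata(**data)
```

Ignoring unknown fields lets Studio add fields without breaking an older MC Web. Removing or renaming a stable field is the kind of change that would need a version bump.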
The API boundary also has to figure out one thing — whether it can be bypassed when Studio isn't running. If MC Web only ever sees agent info through Studio's exposed endpoint, then MC Web is an empty shell when Studio is offline. If MC Web keeps a local cache for offline-Studio cases, then that cache has to be maintained separately, and when it goes stale the dashboard is no longer showing truth. Both options have costs, but either is more stable than "implementing a version without thinking it through." I lean toward the first — better empty than fake — but this one isn't really decided yet, the MC Web side has to weigh in.
The second thing is straightening out the scheduling path. The current state is that Studio has its own scheduler and MC Web also wants to be a scheduling entry, with no agreed-upon entry spec between them. Sorting this out means listing every code path that can "trigger an agent to start running," and then funneling them all into one main path. That kind of sorting is slow work — each path has to be individually verified to still run after the funneling.
The most annoying thing in the scheduling paths is the "side doors": small paths that aren't part of the main flow but can in fact trigger an agent to run. One example is a local script that calls the agent's exec function directly, skipping Studio's scheduler. Another is a mock trigger path written for tests that is still reachable in production code. These side doors don't usually get used, but as long as they exist, the split isn't really clean. The sorting process is basically listing every side door and plugging them one by one. It's tedious, but you can't skip it.
The third thing is the audit sync mechanism. Studio's local audit is the canonical source, MC Web's dashboard is the mirror — that sentence writes easily, but "mirror" is a state that needs a mechanism to maintain. After local audit is written, how long until it syncs to MC Web; what happens when sync fails; which one wins when MC Web's display and the local audit disagree — all of these have to be defined separately. My current lean is that MC Web only ever reads from Studio's exposed endpoint and never maintains its own write path — but that means the dashboard is empty when Studio is unreachable, which is yet another tradeoff.
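The "read-only mirror, empty when Studio is unreachable" lean can be sketched in a few lines. The `fetch` callable stands in for whatever endpoint Studio eventually exposes. It and the `dashboard_rows` name are hypothetical.

```python
from typing import Callable


def dashboard_rows(fetch: Callable[[], list[dict]]) -> tuple[list[dict], bool]:
    """Return (rows, source_reachable) without ever falling back to a cached copy."""
    try:
        return fetch(), True
    except ConnectionError:
        # Better empty than fake: show nothing and say why, rather than stale rows.
        return [], False
```

Returning the reachability flag alongside the rows lets the dashboard tell "no tasks ran" apart from "Studio is offline", which an empty list alone cannot do.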
Audit sync has another implicit problem I hadn't noticed before — it involves a privacy boundary. There are things in the local audit that shouldn't be pushed up to the dashboard for display — like intermediate artifacts of some tasks, internal state of some agents, fields tied to external accounts. The sync mechanism can't just be "push everything," it needs a filter layer. That filter layer is itself a spec — which fields can be pushed, which can't, whether pushable fields need to be redacted — all of it has to be written out. Honestly this is easier to do from the Gateway side (an external entry was always going to do this kind of filtering), but the Gateway isn't installed, so for now this responsibility sits with the audit sync mechanism.
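A filter layer of this kind is usually easiest to reason about as an allowlist. The sketch below assumes hypothetical audit field names and a fixed redaction marker. The real lists would come out of the filter spec.

```python
# Hypothetical field names; anything not listed here never leaves local.
PUSHABLE = {"task_id", "agent", "status", "started_at"}
REDACTED = {"external_account"}


def to_dashboard_view(record: dict) -> dict:
    """Build the dashboard copy of one audit record: allow, redact, or drop."""
    view = {}
    for key, value in record.items():
        if key in PUSHABLE:
            view[key] = value
        elif key in REDACTED:
            view[key] = "[redacted]"  # the field's existence is visible, its value is not
        # Everything else is dropped: fields are private unless listed.
    return view
```

Defaulting to "drop" means a new field added to the local audit stays local until someone deliberately adds it to one of the two lists.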
The fourth thing — which is also the prep work before the Gateway actually gets installed — is narrowing what Studio and MC Web each expose externally. Right now they each expose their own ports; once the Gateway is in, those ports have to be tucked behind it. The cost of this isn't technical, it's migration — every existing caller that hits Studio's port directly has to switch paths.
Narrowing the exposed surface isn't a config change you finish in one go. It means every external script, every external integration, every little tool that's been quietly direct-connecting in has to start going through the Gateway — and the Gateway isn't installed. So the most pragmatic way to do this is to first take inventory: which ports are externally reachable today, who the callers are for each port, whether those callers are still in use. Only after inventory can you talk about narrowing. That kind of inventory is "boring but necessary" — it produces no new features, but it's the precondition for the split actually shipping.
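The inventory itself can be a very small data structure. In the sketch below the component names come from this post, while the port numbers, caller names, and the `plan` helper are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    name: str
    still_in_use: bool


@dataclass(frozen=True)
class Exposure:
    component: str
    port: int
    callers: tuple[Caller, ...]


def plan(inventory: list[Exposure]) -> tuple[list[tuple[str, int]], dict[tuple[str, int], list[str]]]:
    """Split ports into those closable now and those with callers still to migrate."""
    close_now: list[tuple[str, int]] = []
    migrate: dict[tuple[str, int], list[str]] = {}
    for exposure in inventory:
        key = (exposure.component, exposure.port)
        active = [c.name for c in exposure.callers if c.still_in_use]
        if active:
            migrate[key] = active  # these callers must move behind the Gateway first
        else:
            close_now.append(key)  # nobody depends on it any more
    return close_now, migrate
```

A port with no active callers can be narrowed today. Everything else becomes a named migration list, which is what turns "tuck the ports behind the Gateway" into concrete work.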
This month I only did part of the first thing — sketched out a few API boundary drafts, pinning down the two field sets for agent metadata and agent current state. The other three are still in the queue. I don't want to fool myself into "the split is defined, so it's done" — defining the split is just the first step, shipping is a much longer thing.
Still working on it
The split is written in the spec, but not a single line of code has been changed because of it. MC Web is still running v2.0.1 at its own pace, Studio is still running on RC1's boundary, Gateway is still in "Optional" status. Turning this split into a fact in code takes at least half a year, assuming nothing more urgent jumps the queue.
What actually got finished this month is just a few small things — the field draft for the agent-metadata API, the field draft for the agent-current-state API, and a list of conflict points. The list itself doesn't solve any conflict, but it means every time I run into a new one, I know whether it's already been recorded, and whether the same split can answer it.
The audit sync between MC Web and Studio hasn't been touched. The scheduling paths haven't been funneled. Whether to install Gateway, when, how: none of it has a timeline. I'm not in a hurry to fix that. Experience tells me that pushing "install the external entry point" through before the split actually lands tends to scramble the split itself.
My own attitude on this is: writing the split clearly matters more than adding features; closing the boundary matters more than bumping version numbers. Mission Control Web v2.0.1 is Alpha, it'll iterate fast; Studio RC1 is a release candidate, but the local pipelines are still being changed; Gateway is barely on the radar. When all three are moving, the only thing that should stay still is the split.
There's no such thing as "perfect architecture," only architecture with clear responsibilities and written-down boundaries. Perfect architecture is something you think up; clear responsibilities are something you only reach by making changes. The former finishes when the diagram does; the latter has to be re-checked every month.
Mission Control, Studio, Gateway — these three control planes will coexist for a long time, and the overlap won't be eliminated in one shot. It'll be re-balanced repeatedly, judged by whether the split is clear. I don't have an end date for this, only a rhythm of pushing one small step per month.
Next time I write about this, it'll probably start from "one API boundary finally landed" or "a closure I thought I'd made grew back." Until then, I'm still working on it.