What Was I Doing for the Past 35 Hours?

YunLab · Engineering Retrospective

At six-thirty this morning I had Cici dig through the past 35 hours of my Claude Code session logs, git commits, and token bill. The reason was simple: the pace of the last two days felt off. Work was clearly moving, but I couldn't say how much. Rather than brag or worry based on a gut feeling, I decided to do the math.

Once the numbers were in, I decided to write them down. Partly for my own records, and partly because this bill happens to answer a question I keep getting asked: what is Fable 5 (the current flagship model behind Claude Code) actually good at? Marketing is everywhere; field bills are rare. This is a field bill.

The numbers first

The window: June 10, 7:39 PM to June 12, 6:39 AM — 35 hours flat. Methodology up front: this counts every Claude Code session log on my machine, minus 114 micro-sessions generated automatically by the desktop app (it periodically uses a small model to check "does the assistant still have work to do right now" — that's system self-checking, not human work).

85 working sessions; I personally typed about 400 instructions;
the model replied with roughly 10,000 messages and took action 5,200+ times: 2,400+ shell commands, nearly 1,000 file reads, 595 file edits, 219 new files, 440+ web searches and fetches;
9.9 million tokens of output (a token is the unit models use to measure text — 9.9 million is on the order of several million words), with 2.38 billion tokens of context throughput;
52 git commits across 5 repositories;
21 of the 35 hours had activity on the machine — including the hours I was asleep;
nearly 90% of replies came from Fable 5; the rest were subtasks and system self-checks running on other models.

Figure: Bar chart of model replies per hour across the 35 hours. Six peaks match six work streams; a yellow band on the second night marks the hours when I was asleep and it was working.

400 instructions traded for 5,200+ operations and 52 commits: on average, I say one thing and it does thirteen (5,200 ÷ 400 = 13). That ratio is the number most worth writing down. When I first used AI to write code, the ratio was roughly one to one: I say something, it edits a block, I say something again. Now it takes one sentence and runs the rest on its own.

Seven things in 35 hours

Working backwards from git commits and session records, seven work streams were moving in parallel.

A quota widget: from one sentence to a usable app. On the night of the 10th I said, "I want to turn my task board and AI quota panel into a standalone Mac app." By the afternoon of the 11th there were 14 commits: a native desktop-pinned widget, a menu bar tray, and zero-config monitoring of Claude Code sessions (a state machine decides which one is running, which one is waiting on me, and which one has ended). Along the way it ran a code review that produced a long list of suspected bugs; it verified each one, confirmed 7 as real, and fixed them.
This very website. The yunlab.ai you're reading was finished in these 35 hours: a new skin, an editorial pass over all 28 Chinese articles, plus the launch of "Ask YunLab" (AI Q&A) and the guestbook (database storage plus AI moderation: good-faith criticism passes, malice and ads get rejected). 15 commits. One trap along the way was the site's Content Security Policy (CSP), a browser security policy that here forbids inline scripts. The Q&A worked fine locally and went silent in production; the root cause was the build tool helpfully inlining scripts into the page. One config line to disable inlining solved it.
Three waves on Claudio, my radio, the AI internet radio I built for myself. In these 35 hours: async announcements, which cut "press play to first sound" from 113 seconds to 2; taste feedback, where my loves and skips change what gets picked next; and a cheaper background brain for off-peak work. Plus a UI redo, playback straight through the server's own speakers, and a fix for a network drive that silently dropped out that night. 12 commits, and one code review that fixed 25 issues in a single pass.
A global logistics intelligence center, from zero to live in a day. First commit at 4:42 PM on the 11th; the self-hosted RSS service went live at 9:35 PM. The build covers a data foundation, an API, event scoring, four dashboards (map tiles fully local and offline), 44 intelligence sources, and a policy and geopolitics layer. At 6:35 this morning it also fixed a stale process squatting on a port. 9 commits.
The governance layer. Machine constitution v2 (the rules governing what AI may and may not do on my machine) landed as a four-tier system, the user profile system was rebuilt as v2, and all governance files went into git. This work ships no features — it sets the safety boundary for everything else.
OpenClaw agent maintenance. Pipeline tuning for Shen Zhixing (the intelligence-gathering agent), smoothing his handoff to Su Wan (the writing agent), cleaning out historical scraped data, and tidying up leftover config from Ji Yanran's voice-bridge experiment.
The video line worked the night shift. Lin Lu's video factory has the 45-second Daiyu piece in production. As I write this, six shots have just finished rendering; the small cluster of activity in the early hours of June 12 on the chart is the video line. I was asleep.
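The CSP trap from the website stream above is cheap to catch before a deploy. Below is a minimal Python sketch, assuming you can read the built HTML and the policy string your server sends: it flags inline script blocks that a policy without 'unsafe-inline' would refuse to run. The function names are hypothetical, and the CSP handling is deliberately simplified (it ignores nonces, hashes, and script type attributes), so treat it as a smoke test rather than a full policy evaluator.

```python
from html.parser import HTMLParser


class InlineScriptFinder(HTMLParser):
    """Collects the code of <script> elements that have no src attribute."""

    def __init__(self):
        super().__init__()
        self._in_inline = False
        self._buf = []
        self.inline_scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # A script with src= loads from a file; only src-less scripts are inline.
            self._in_inline = "src" not in {name for name, _ in attrs}
            self._buf = []

    def handle_data(self, data):
        if self._in_inline:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script":
            code = "".join(self._buf).strip()
            if self._in_inline and code:
                self.inline_scripts.append(code)
            self._in_inline = False


def inline_allowed(csp: str) -> bool:
    """Simplified check: does the effective script policy contain 'unsafe-inline'?"""
    directives = {}
    for part in csp.split(";"):
        tokens = part.split()
        if tokens:
            # If a directive appears twice, the first occurrence wins.
            directives.setdefault(tokens[0].lower(), [t.lower() for t in tokens[1:]])
    # script-src falls back to default-src when it is absent.
    sources = directives.get("script-src", directives.get("default-src"))
    if sources is None:
        return True  # the policy does not restrict scripts at all
    return "'unsafe-inline'" in sources


def blocked_inline_scripts(html: str, csp: str) -> list[str]:
    """Return the inline scripts this policy would block (empty list if none)."""
    if inline_allowed(csp):
        return []
    finder = InlineScriptFinder()
    finder.feed(html)
    finder.close()
    return finder.inline_scripts


page = '<script src="/app.js"></script><script>boot()</script>'
print(blocked_inline_scripts(page, "default-src 'self'; script-src 'self'"))  # ['boot()']
```

Run against the production build with the production policy, a non-empty result is exactly the "works locally, silent in production" situation; the fix is then on the build side (stop inlining), not in the application code.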

Seven streams are not seven miracles; there's plenty of mundane patching in there. But they ran in parallel, and that's the biggest difference from before. I used to be single-threaded: open one front and hold it until it was done. Now I'm more like someone watching several pots, checking whichever one whistles.

What actually makes Fable 5 strong

Now, the model. Nearly 90% of replies in these 35 hours came from Fable 5. The four points below aren't benchmark scores — they're judgments that grew out of my own bill.

One: it holds together over long distances. The longest session stayed alive from one end of the window to the other — 34 and a half hours on and off, picking up exactly where it left off each time, never starting over. The entire logistics intelligence center began as one sentence from me — "I want to build a global logistics intelligence center" — and it broke that sentence into six phases on its own, from empty directory to 44 sources live. "Forgetting what it was doing" mid-task used to be the norm with long jobs. In these 35 hours I never ran into it.

Two: hands-on density. Of the 5,200+ tool operations, shell commands were nearly half. It wasn't keeping me company talking architecture — it was operating this machine: installing services, configuring background jobs, running builds, reading logs, starting and killing processes. A one-to-thirteen instruction-to-operation ratio means that most of the time, it was working and I was doing something else.

Three: it fixes problems at the root. Three examples, all real, all from these 35 hours. The radio's background brains all silently degraded; instead of poking at the calling code and hoping, it read the logs and found the root cause — the background service's environment was missing a path entry, so a dependency tool failed on startup and exited immediately. Fix the environment, not the code. The website Q&A went mute in production; root cause was the build tool inlining scripts and tripping the security policy — not an API problem. This morning the logistics dashboards went dark; root cause was a stale process squatting on the port, and after fixing it, it also rewrote the start/stop scripts to detect by port, so this whole class of problem gets caught next time. Anyone can patch. Finding why it broke is what saves your life later.
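Two of those root causes, a path entry missing from a background service's environment and a stale process squatting on a port, are the kind a start script can check for before launching anything. Here is a minimal Python sketch of such a preflight, assuming the missing entry was in PATH and the service listens on a local TCP port. The function names and the "report problems instead of starting" policy are hypothetical; this is not the start/stop script it actually wrote.

```python
import shutil
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return s.connect_ex((host, port)) == 0


def missing_tools(tools: list[str], service_env: dict[str, str]) -> list[str]:
    """Tools that cannot be found on the PATH the service itself will see."""
    path = service_env.get("PATH", "")
    # shutil.which returns None for an empty PATH, which is the failure mode here.
    return [t for t in tools if shutil.which(t, path=path) is None]


def preflight(port: int, tools: list[str], service_env: dict[str, str]) -> list[str]:
    """Collect human-readable problems; an empty list means it is safe to start."""
    problems = []
    if port_in_use(port):
        problems.append(f"port {port} is already taken by another process")
    for tool in missing_tools(tools, service_env):
        problems.append(f"{tool!r} not found on the service's PATH")
    return problems
```

The design point is checking the environment the service will actually run with (its own PATH, the real state of the port) instead of the interactive shell's PATH or a PID file, since both of those can look fine while the service still fails.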

Four: it dispatches its own crew. In 35 hours it launched 13 multi-agent workflows, spinning up more than 200 parallel subtasks that reported back structured conclusions. The radio code review worked exactly that way: parallel scans across different dimensions, findings merged and verified one by one, 25 confirmed issues fixed. The quota widget review followed the same playbook; after verification, the real bugs numbered 7. It did not "helpfully" fix things that merely looked like bugs, and that restraint reassured me more than the fixes themselves.
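The review pattern described here (scan in parallel, merge, verify each finding before touching any code) fits in a few lines. This is a minimal Python illustration of the pattern, not how Claude Code's multi-agent workflows are implemented: the scanners and the verifier are hypothetical stand-ins for subagents, modeled as plain functions, and findings count as duplicates only when they match exactly.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    claim: str


def review(
    scanners: list[Callable[[], list[Finding]]],
    verify: Callable[[Finding], bool],
    max_workers: int = 8,
) -> list[Finding]:
    """Fan out scanners in parallel, merge duplicates, return only verified findings."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in scanner order, so the merge is deterministic.
        batches = list(pool.map(lambda scan: scan(), scanners))

    # Merge: an identical finding reported by several scanners counts once.
    merged, seen = [], set()
    for batch in batches:
        for finding in batch:
            if finding not in seen:
                seen.add(finding)
                merged.append(finding)

    # Verify one by one; anything that fails verification is dropped, not "fixed".
    return [f for f in merged if verify(f)]


# Hypothetical scanners looking at different dimensions of the same code.
correctness = lambda: [Finding("player.py", 42, "unchecked None")]
performance = lambda: [
    Finding("player.py", 42, "unchecked None"),
    Finding("feed.py", 7, "blocking call in event loop"),
]
confirmed = review([correctness, performance], verify=lambda f: f.file == "player.py")
print(confirmed)  # [Finding(file='player.py', line=42, claim='unchecked None')]
```

The verify step is where the restraint lives: only findings that survive it reach the fix stage, which is how a long list of suspects shrinks to 7 or 25 real issues.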

Where it is not strong

This is where the piece risks turning into an ad, so this section is mandatory.

It can't police its own voice. The first draft of the guestbook copy reeked of standard AI-speak; I bounced it and it went live only after a rewrite. It writes fast and smooth, but "does this sound like me" is a judgment it cannot make. That's still my job.
Its memory is external. That is another way to read the 2.38 billion tokens of context throughput: the model itself remembers nothing, and each session only works by re-feeding it the history on every turn. Without my system of memory files, task folders, and handoff documents, these 35 hours would have been 85 conversations with mutual amnesia. My file system remembers for it; it does not remember on its own.
It doesn't make the calls. Which model vendor sits behind the Q&A, how strict guestbook moderation should be, whether a video shot is usable — every directional decision in these 35 hours was made by a human. 400 instructions, one every five minutes on average. This is not "fully automatic"; this is high leverage. Leverage amplifies output — and amplifies bad decisions too. Which makes the person pulling the lever more important, not less.

35 hours. 52 commits. Four usable things moved a big step forward, and one video production line worked the night shift while I slept. One middle-aged man, one model.

I don't think I got stronger. The leverage changed — the same sentence that used to buy me a block of code now buys a working system. And the bill is equally clear about the price: leverage eats tokens, needs a human guarding the voice, needs external memory, needs someone to make the calls.

I'll probably run this bill again every once in a while. Where the curve goes — that's for a future entry.

Update (June 13): Less than a day after this went up, Fable 5 was pulled by a US government export-control directive — three days from launch to shutdown. I wrote the follow-up here: "The Model I Praised Yesterday Is Gone Today."