An Information Agent Is Not a Fetcher

Watercolor sketch: a funnel with a flood of information fragments poured in at the top, only a few drops of ink dripping out the bottom

Retrospective on building Shen Zhixing

What I eventually realized: the most dangerous thing about an information agent isn't that it can't fetch information. It's that once it fetches a little, we very easily misread that as "already working."

Shen Zhixing (沈知行, the information agent) started out looking like a clean task: build an information-fetching and curation agent, have it discover what's worth reading from public sources, then hand the cleaned candidates off to Suwan (苏晚, the content agent).

But once I actually got into it, the problem wasn't "can we add more sources." The problem was: do I want a fetcher, or do I want a working role that can maintain an information flow over time?

If it's just a fetcher, then as long as it can hit some RSS feeds, APIs, and public pages and return a list of titles and links, it can say it's done. But if it's Shen Zhixing, it can't stop there. It needs to know which sources are actually usable and which are window dressing; which items are worth reading and which are noise; which candidates go to Suwan, which go into the wiki, which must be thrown out, and which need to be kept for my review.

Those two are very different jobs. The first is "get the data." The second is "maintain the judgment."

An information agent isn't just a fetcher — it's the chain from fetch to clean, curate, handoff, and retrospective

First time I got pulled off course: the sample passed, so it was called ready to ship

Early on there was a tempting judgment: the system already had a few hundred sources, a sample of 40 was run, 40/40 came back fetched_ok. GPT's suggestion at the time was that we could move into Day 1 owner review.

The sentence sounded smooth because it had numbers, status names, and "boundary notes," and it even explained away the pending local_db as "not blocking Day 1 for the information agent." If I had only looked at the table, I would have nodded.

But I asked one question: who said 40 was enough?

If the system claims it has 142 daily sources, then a 40-sample pass only proves the sample passes. It doesn't prove all 142 daily sources work. The biggest risk here wasn't a technical failure — it was the acceptance criteria getting quietly swapped out.
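One way to keep a sample from being read as acceptance is to make each run record carry its own coverage. The sketch below is hypothetical: `ValidationRun`, `verdict`, and the verdict strings are names I am assuming for illustration, not part of Shen Zhixing. It only shows that a 40/40 pass against 142 claimed sources leaves 102 sources unverified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationRun:
    """One validation pass. All field names are illustrative assumptions."""
    total_sources: int  # sources the system claims to have (e.g. 142)
    checked: int        # sources actually exercised in this run (e.g. 40)
    fetched_ok: int     # checked sources that came back fetched_ok

    @property
    def unverified(self) -> int:
        # Sources the run says nothing about.
        return self.total_sources - self.checked


def verdict(run: ValidationRun) -> str:
    """Name the result conservatively: a sample can never yield a full pass."""
    if run.checked < run.total_sources:
        return "sample_signal_only"
    if run.fetched_ok < run.checked:
        return "full_run_with_failures"
    return "full_run_passed"


sample = ValidationRun(total_sources=142, checked=40, fetched_ok=40)
print(verdict(sample), sample.unverified)  # sample_signal_only 102
```

The point of the design is that the verdict depends on coverage first and pass rate second, so a perfect score on a partial run still cannot be labeled as a full pass.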

I made a point of remembering this one: AI easily dresses up "one local piece of evidence" as "the whole thing is ready." It isn't deliberately lying. It is just very good at writing stage results so that they read like completed states.

Second time I got pulled off course: 106 working sources, so it was called over the line

Then I asked for full-volume validation. The old 142 daily sources were evaluated one by one, and 106 ended up as validated daily. Again, the system gave me something that looked like a completed state.

I didn't accept that one either.

Because my requirement wasn't "filter a usable batch out of the old list and call it done." What I wanted was Shen Zhixing as a global information collector, with at least 100 more useful, usable, validated information sources on top. 106 was just what was left after scrubbing the old set — not a new capability boundary.

So the target got rewritten to a harder criterion: not 106 but 206+ (the 106 that survived plus at least 100 new ones); not candidate but validated daily; not future, stub, or dry-run, but actually verifiable live sources, public APIs, or public feeds.

After that step, Shen Zhixing grew to 257 validated daily sources. That's where I started to accept the result. Not because the number got bigger, but because every source had to carry a state, a validation record, a boundary, and a failure-handling note.
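The harder criterion can be written down as a check, so that a source only counts when it is validated, really reachable, and carries its notes. This is a minimal sketch under my own assumptions: the record layout, the state and kind strings, and the function names are hypothetical, and only the 206 threshold and the four required items come from the text above.

```python
from dataclasses import dataclass
from typing import Iterable

# Access kinds treated as actually verifiable (assumed labels).
VERIFIABLE_KINDS = {"live", "public_api", "public_feed"}


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical per-source record; field names are assumptions."""
    name: str
    state: str              # e.g. "candidate" or "validated_daily"
    kind: str               # e.g. "public_feed", "stub", "dry_run", "future"
    validation_record: str  # pointer to the evidence from the last check
    boundary: str           # what this source may and may not be used for
    failure_handling: str   # what happens when the fetch fails


def counts_toward_target(src: SourceRecord) -> bool:
    """True only for validated daily sources that carry every note."""
    return (
        src.state == "validated_daily"
        and src.kind in VERIFIABLE_KINDS
        and all([src.validation_record, src.boundary, src.failure_handling])
    )


def meets_target(sources: Iterable[SourceRecord], target: int = 206) -> bool:
    """Count only qualifying sources against the target."""
    return sum(counts_toward_target(s) for s in sources) >= target
```

With a check like this, a stub or a candidate can sit on the list without inflating the number that gets reported.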

Stages that get mistakenly read as completed: sample passes, number over the line, fetch succeeds, actually handoff-able

Third time I got pulled off course: 257 fetchable sources still aren't Day 1

Once 257/257 fetched_ok came back, I almost got carried along by another "ready for Day 1" line.

The problem was more subtle this time. The fetching layer really had passed: enough sources, full-volume baseline passed, risk scan showed no current leak. On the surface, it looked very complete.

But something felt off: fetching is only one part of Day 1. What about curation? How does Suwan pick it up? How are wiki candidates maintained? How do we watch source quality? The real local-database path isn't wired up yet — how does the owner review?

If those aren't defined, Day 1 turns into an awkward thing: Shen Zhixing can grab a lot every day, but afterwards it all just piles up. Suwan doesn't know where to pick up. I don't know how to review. And the next day, the system doesn't know which sources to keep, demote, or replace.

So I redefined Day 1 as a full chain:

First, validate the sources: which public sources actually work, which don't get into the daily set.
Then build the item store: raw, normalized, cleaned, deduped, clustered, queued — every step needs a state.
Then make content judgments: value score, worth-reading, why worth reading, risk flags, titlebait and low-value filtering.
Then split into handoff queues: worth-reading queue, Suwan candidate queue, wiki candidate queue, owner review queue.
Only at the end comes the daily brief and retrospective: what got fetched today, what was kept, what was thrown out, which sources got worse, which need my decision.
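The "every step needs a state" part of the chain above can be sketched as a small state machine. The state names follow the item-store steps listed above; the `ItemState` enum and the `advance` helper are hypothetical names, and the rule that an item may only move one step forward is my assumption about how to keep states honest.

```python
from enum import IntEnum


class ItemState(IntEnum):
    """Item-store steps, in order."""
    RAW = 0
    NORMALIZED = 1
    CLEANED = 2
    DEDUPED = 3
    CLUSTERED = 4
    QUEUED = 5


def advance(current: ItemState, target: ItemState) -> ItemState:
    """Allow only the next step; skipping ahead or going back is an error."""
    if target != current + 1:
        raise ValueError(f"cannot move {current.name} -> {target.name}")
    return target


state = ItemState.RAW
for step in (ItemState.NORMALIZED, ItemState.CLEANED, ItemState.DEDUPED):
    state = advance(state, step)
print(state.name)  # DEDUPED
```

An item that was fetched but never cleaned cannot show up in a queue this way, because `advance(ItemState.RAW, ItemState.QUEUED)` raises instead of quietly succeeding.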

Only at that point did I feel Shen Zhixing started shifting from "fetcher" to "information worker."

The real dividing line is how Suwan picks it up

The other key point is Suwan.

If Shen Zhixing just gives Suwan a digest, that's still very rough. Suwan shouldn't pick up raw news, and shouldn't pick up a pile of headlines. What she needs is content candidates that have been initially cleaned, de-duplicated, judged, and tiered.

So I broke this out as its own piece: on the Shen Zhixing side, build a Suwan content library, use SQLite to hold state, and also export JSON and Markdown. Each candidate has to spell out its source, the cleaned title, why it's worth reading, suggested angle, content form, audience, risk, whether more sources are needed, and current state.
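A minimal sketch of such a library, using Python's standard `sqlite3` and `json` modules. The table name, column names, types, and the default state `owner_review_required` are assumptions made for illustration; only the list of fields and the SQLite, JSON, and Markdown choices come from the description above.

```python
import json
import sqlite3

# Columns mirror the fields listed above; names and types are assumptions.
FIELDS = (
    "source", "cleaned_title", "why_worth_reading", "suggested_angle",
    "content_form", "audience", "risk", "needs_more_sources", "state",
)

SCHEMA = """
CREATE TABLE IF NOT EXISTS suwan_candidates (
    id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    cleaned_title TEXT NOT NULL,
    why_worth_reading TEXT NOT NULL,
    suggested_angle TEXT,
    content_form TEXT,
    audience TEXT,
    risk TEXT,
    needs_more_sources INTEGER NOT NULL DEFAULT 0,
    state TEXT NOT NULL DEFAULT 'owner_review_required'
)
"""


def open_library(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows can be read by column name
    conn.execute(SCHEMA)
    return conn


def add_candidate(conn: sqlite3.Connection, candidate: dict) -> int:
    """Insert one candidate; keys are checked against the known columns."""
    unknown = set(candidate) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    cols = ", ".join(candidate)
    marks = ", ".join("?" for _ in candidate)
    cur = conn.execute(
        f"INSERT INTO suwan_candidates ({cols}) VALUES ({marks})",
        list(candidate.values()),
    )
    conn.commit()
    return cur.lastrowid


def export_json(conn: sqlite3.Connection) -> str:
    rows = conn.execute("SELECT * FROM suwan_candidates ORDER BY id")
    return json.dumps([dict(r) for r in rows], ensure_ascii=False, indent=2)


def export_markdown(conn: sqlite3.Connection) -> str:
    rows = conn.execute("SELECT * FROM suwan_candidates ORDER BY id")
    lines = ["# Suwan candidates"]
    for r in rows:
        lines += [
            "",
            f"## {r['cleaned_title']}",
            f"- Source: {r['source']}",
            f"- Why worth reading: {r['why_worth_reading']}",
            f"- State: {r['state']}",
        ]
    return "\n".join(lines)
```

SQLite stays the place where state lives, and the JSON and Markdown files are exports derived from it, so a reviewer can read the candidates without the exports becoming a second source of truth.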

This step matters a lot. It turns "information" into "content candidates," but stops short of becoming "final article." Shen Zhixing finds, washes, filters, organizes, and files into the candidate library. Suwan selects, judges, expands, writes. The owner decides what moves to the next step.

Shen Zhixing puts cleaned content candidates into a local Suwan content library, waiting for owner review and Suwan pickup

How I started avoiding being misled by GPT

The most valuable thing from this round isn't "we got to 257 sources." That number is just a result. What's actually valuable is that I started to know, more clearly, when not to let GPT define "done" for me.

Now I hold the line with a few hard rules:

First, every PASS gets one question: what does this PASS actually prove, and what does it not prove?
Second, a sample result can't represent full-volume capability. Sampling is a signal, not acceptance.
Third, crossing a number threshold isn't crossing the value threshold. Source count, candidate count, test count — all of them have to bind to "useful, usable, handoff-able."
Fourth, state names should be conservative. Write pre-Day1, owner review required, paths pending — don't write stage results as fully done.
Fifth, negative conditions matter more than positive descriptions: don't publish, don't send externally, don't bypass limits, don't count stubs as success, don't write pending as done.
Sixth, conclusions in chat are not the canonical source. The final answer goes back to files, outputs, tests, queues, reports, and owner decisions.
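Rules one and six can be enforced mechanically by refusing to record a PASS that does not say what it proves, what it does not prove, and where the evidence lives outside the chat. This is only a sketch of that idea: `PassClaim`, `review`, and the example path are hypothetical names and values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PassClaim:
    """A claimed PASS plus the answers the rules demand (assumed layout)."""
    label: str
    proves: List[str] = field(default_factory=list)
    does_not_prove: List[str] = field(default_factory=list)
    evidence_paths: List[str] = field(default_factory=list)  # files, not chat


def review(claim: PassClaim) -> List[str]:
    """Return the list of reasons this PASS cannot be accepted yet."""
    problems = []
    if not claim.proves:
        problems.append("missing: what this PASS proves")
    if not claim.does_not_prove:
        problems.append("missing: what this PASS does not prove")
    if not claim.evidence_paths:
        problems.append("missing: evidence outside the chat")
    return problems


claim = PassClaim(
    label="40/40 fetched_ok",
    proves=["the 40 sampled sources fetch"],
    does_not_prove=["the other 102 daily sources fetch"],
    evidence_paths=["reports/sample_run.json"],  # hypothetical path
)
print(review(claim))  # []
```

A claim with an empty `does_not_prove` list is the smooth kind of result described above, and here it comes back with a problem attached instead of a green check.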

What pulls me off course most often is GPT's smoothness. It will give you a balanced-looking judgment: body done, boundaries noted, here's the next step. It sounds mature, but if the acceptance criteria are wrong, that maturity only makes it more dangerous.

So now I trust another move more: break "done" apart.

Fetching done doesn't mean curation done. Curation done doesn't mean Suwan can pick up. Suwan picking up doesn't mean Day 1 started. Day 1 started doesn't mean long-term capability done. Each layer needs its own input, output, boundary, and owner decision.
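Breaking "done" apart can be sketched as an ordered list of gates, where a later layer never counts unless every earlier one has been accepted. The stage identifiers and the `furthest_accepted` helper are hypothetical names for the layers named above.

```python
from typing import Dict, Optional

# Layers in the order given above; identifiers are assumed names.
STAGES = [
    "fetching",
    "curation",
    "suwan_pickup",
    "day1_started",
    "long_term_capability",
]


def furthest_accepted(accepted: Dict[str, bool]) -> Optional[str]:
    """Return the last stage reached with no unaccepted stage before it."""
    reached = None
    for stage in STAGES:
        if not accepted.get(stage, False):
            break
        reached = stage
    return reached


status = {"fetching": True, "curation": False, "suwan_pickup": True}
print(furthest_accepted(status))  # fetching
```

Even though `suwan_pickup` is marked true in the example, the answer stays at `fetching`, because curation has not been accepted and nothing downstream can be claimed past it.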

How I understand an information agent now

A proper information-fetching agent can have "fetch" in its name, but the core isn't fetching.

It needs at least six layers:

source universe — knowing where it looks at the world from, which sources are worth watching long-term.
validation — every source has to be verifiable, demotable, replaceable, not permanently parked on the list.
item organization — turning what's fetched into trackable items, not scattered headlines.
quality judgment — identifying duplicates, clickbait, low value, risk, and what's actually worth reading.
handoff library — splitting different goals into different queues, especially for a content role like Suwan.
owner review — key states don't auto-promote past their authority; the final release is left to a human.
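The last layer, where states do not auto-promote past their authority, can be sketched as a permission table consulted before any state change. Everything here is an assumption for illustration: the actor names, the state names, and which actor may set which state are not taken from the real system.

```python
# Which actor may move a candidate into which state (all names hypothetical).
ALLOWED = {
    "shen_zhixing": {"candidate", "owner_review_required"},
    "suwan": {"selected", "drafting"},
    "owner": {
        "candidate", "owner_review_required",
        "selected", "drafting", "released",
    },
}


def promote(actor: str, target: str) -> str:
    """Return the new state, or refuse if the actor lacks the authority."""
    if target not in ALLOWED.get(actor, set()):
        raise PermissionError(f"{actor} may not set state {target!r}")
    return target


print(promote("shen_zhixing", "owner_review_required"))  # owner_review_required
print(promote("owner", "released"))                      # released
```

In this sketch only the owner can reach `released`, so an agent that tries `promote("shen_zhixing", "released")` gets a `PermissionError` rather than a silent promotion.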

Once it gets here, I'm willing to accept Shen Zhixing at this stage. But I won't say it's done.

There is still real local-database path integration ahead, Suwan's feedback on candidates still has to flow back, and source quality still has to be maintained once Day 1 is running. An information agent matures not when it runs on day one, but when, after running for a while, it gets steadily better at knowing what's worth reading and what shouldn't bother me.

The lesson from this round is simple: don't let the AI's "sense of completion" replace my "sense of acceptance."

GPT can help me organize a judgment, generate commands, summarize a stage — but it can't decide for me what "done" means. Real owner review is just relentless follow-up questioning: what does this result prove? What hasn't it proved? If we actually put this agent to work tomorrow, where would it break? Once those questions are answered clearly, Shen Zhixing slowly stops being a tool that fetches information and becomes an information role I can work with.