Learning Notes
In my last post I wrote about how Linlu spent a month and still hadn't shipped a single video. Today I read a lesson called "What Prompt Actually Is," and looking back at that month — at least 5 of the failures weren't because the AI wasn't good enough. They happened because I was treating prompts as "the right phrase to coax the AI" instead of "a task contract between me and the AI." This lesson made me reattribute everything.
What this lesson did for me most: reattribution
I don't write code. The only way I interact with AI every day is one thing — I write prompts to it. Codex writes code for me, Claude Code talks with me, Linlu makes videos, Ji Yanran does the voice work, Su Wan writes content — there's no fallback path for me like "just read the source" or "tune the API." It's all stitched together with one Chinese-language prompt after another.
The biggest value of this lesson wasn't teaching me how to write prettier prompts. It was forcing me to straighten out the wrong attribution I'd been making for months. Too many times I'd said things like — "Linlu just isn't good enough," "Why did Codex go off the rails again," "This model is unstable." This lesson said: it's not that the AI isn't good enough. It's that you didn't define the task clearly. That sounds like criticism, but for me it's actually good news — I can't control the AI half, but I can control the task-definition half.
The line from the lesson stuck with me: "Prompt isn't a magic phrase. It's a task contract." Task contract — four words. I used to think a prompt was a phrasing trick, a clever technique, the right way to coax the AI into being smart. Now I know it isn't. It's a temporary working spec — this is what I'm asking you to do, here's the role I want you to play, why we're doing it, how far to take it, what counts as acceptable, and what you must not do.
The 4 mistakes I kept making
The lesson breaks prompts into 6 parts (Role / Goal / Context / Task / Constraints / Output Format). I checked them against every prompt I'd written in the past month — there were 4 mistakes I kept repeating.
Mistake one: treating "what to do" as a complete instruction. The Linlu case is the clearest. The instruction I gave was "make whatever, like Lin Daiyu entering Jia Mansion." That sentence had a complete picture in my head — what kind of young woman Lin Daiyu should be, what kind of grand household Jia Mansion should look like, what pacing the shots should have — but none of it made it into the prompt. The S03 output came back as three nearly identical orange stick-figure frames, because the control video was procedurally generated and Linlu had no way to know I didn't want that. I was furious afterwards — but actually she wasn't wrong. I was the one who didn't say "no procedural stick figures," "the motion has to come from live-action reference footage." I thought "make Lin Daiyu entering Jia Mansion" was a goal. It was just a title.
Mistake two: never writing a [Role]. I used to think specifying a role was decoration — saying "you are an AI engineering coach" sounded cringey. Later I realized it isn't decoration at all. It's selecting which brain to use. The same question, asked of a "product manager brain" versus a "security engineer brain," gives completely different answers. I once had Codex evaluate a video generation pipeline without specifying a role, and it gave me a report written from the perspective of a research scientist — sampling strategies, loss functions, new papers. But what I actually wanted was an engineer's view — can I install this, will it run on a Mac, how many minutes per run, what do I do when it errors. One sentence — "you are a Mac local-deployment engineer" — would have flipped the answer in the direction I needed. I'd never been using that lever.
Mistake three: only saying "what to do," never "what NOT to do." This is the other side of the stick-figure incident. I used to think the simpler the prompt the better — "do X for me," and let the AI figure out the rest. But "figure out the rest" ended up meaning "use the defaults," and the defaults often hit landmines I had in my head. If I'd added one line back then — "don't use procedurally generated stick figures as motion source, don't use any keyframes below 768×1344, don't fake motion inside a static composition" — several of those detours that month could have been avoided. People rarely write down what "not to do." But often, what NOT to do shapes the result more than what to do.
Mistake four: never specifying an [Output Format], then reformatting by hand afterwards. When I used to ask Codex to show me something, it would come back as several big paragraphs of prose. Then I'd spend 10 minutes mining the key info out of it and reorganizing into a table or a list. I did this dozens of times — until this lesson told me the output format is part of the prompt. I could have said up front, "output as a table, three columns: item / current state / suggestion," and saved all that downstream cleanup. There's an engineering term for this — "reducing the cost of secondary processing." Plainly: whatever shape you want it in, draw it for the AI directly. Don't make it guess.
Same task — how I used to write it vs how I'd write it now
Real example. A month ago I asked Codex to evaluate a TTS library (OmniVoice). My prompt back then was roughly:
Take a look at this TTS library OmniVoice and tell me if we can use it.
That sentence says nothing. I didn't say what machine I was on, didn't say what kind of voice I wanted, didn't say what "can we use it" was measured by, didn't say what form of answer I wanted back. Codex ran a whole analysis — discussed the architecture, listed model sizes, estimated token cost. And then? I couldn't take the next step. I had no idea whether the voice this library produced sounded anything like the DJ in my head, because the model was still stuck at 12% downloading.
Now I'd write it like this:
Role: You are a Mac local AI engineer who has deployed TTS libraries like fish-speech / OmniVoice / Coqui on Mac.
Goal: Help me decide whether OmniVoice can produce, on my Mac Studio, the kind of voice that sounds like "a late-night radio host with a warm, breathy tone."
Context: I'm building a personal AI radio station. The DJ needs to read morning/evening greetings and song intros to me every day. I've already tried macOS built-in say, Nanami reading Chinese, and Ava reading Chinese — none of them work. The reference voice is Xiaoxiao from edge-tts.
Task: (1) List the deployment steps for OmniVoice on Mac MPS and the known pitfalls; (2) Evaluate whether it can realistically clone the Xiaoxiao reference; (3) Give me a minimum listening-test plan that I can verify within 30 minutes.
Constraints: Don't pile on model architecture details. Don't assume I understand PyTorch internals. Don't give me "in theory this could work" answers — give me concrete steps.
Output Format: Three sections matching (1) (2) (3). Each section opens with a one-line verdict (YES / NO / NEEDS TESTING), then the details.
That's not "asking the AI" anymore. That's dispatching a person with a specific identity to execute a task with specific boundaries. The first version takes a week and still has no result; the second version tells me in an hour whether this path works.
The title of the lesson hit something in me — "Prompt isn't a magic phrase, it's a task contract." Almost every prompt I'd written before was written at the "magic phrase" level. I'd send it to an agent, the agent would go off course, and I'd blame the agent. It took this lesson for me to see that I'd been writing the "task contract" as a wish.
For my next post I'm going to rewrite Linlu's prompts using this framework — going through every failure case from that month and seeing how many of them could have been rescued upstream with a stricter task contract. If they can be rescued, that's the direct payoff from this lesson. If they can't, then the AI side really does have a hard limit there — but at least the attribution is clean.
That's the biggest thing this lesson did for me: it reattributed "the AI isn't good enough" back to "I didn't define it clearly" — and it puts something concrete in my hands to work on. I'm still revising.