Three Attempts at an AI Music Player: It Wasn't a Technical Problem, It Was an Unclear Goal

Watercolor: a desk seen from above, a glowing radio UI on screen, three sticky notes reading yuns-radio / omnivoice / claudio


Over 30-plus days I built an AI personal radio station three times. The first two attempts died; the third survived only on a 5-day sprint and 13 independent audits. Looking back, the root cause was never technical — every time I started, I thought the goal was clear enough, and halfway through I found out I had never thought it through.

Why it took three attempts

The goal never changed: turn a phrase like "Monday morning, something quiet" into a 30-minute show that feels like real radio. A DJ who talks, transitions between songs, and never asks me to pick tracks. I built the same thing three times.

Where it went wrong

Laying the three attempts side by side, the same four pits keep showing up.

First: I started installing TTS libraries before I could even hum what the DJ should sound like. The second project, omnivoice, had exactly one goal — install an open-source TTS (text-to-speech) library and see if it could be the station's voice. Install the package, get MPS (the GPU interface on Apple silicon) working, get the CLI running — and then it stalled on HuggingFace (the largest open-model hub): a 6GB model frozen at 12%. On the surface the network killed it. But even if the model had downloaded and a voice had come out, I had no way to judge "is this what I want" — before installing anything, I had never once hummed to myself what my DJ should sound like. The crisp diction of a state-TV announcer? The husky warmth of a 2 a.m. radio host? A girl next door just chatting? I had never thought about it. The moment the download froze merely handed me an excuse: I couldn't have answered the next question anyway.

Second: I never separated "player" from "radio station." NetEase Music and Spotify center on the user choosing songs — search, like, playlist, skip. Radio is the opposite: it comes on at its hour, the DJ talks and segues on her own, and you don't get to pick. The first project, yuns-radio, gradually grew "previous," "next," and "like" buttons in its UI — and only then did I realize it had turned back into a music app. By that point the entire codebase was built around user-driven logic, and undoing it meant breaking bones. When cc_claudio rebuilt everything the third time, rule number one was: the user doesn't pick songs, the user gives an intent — like "Monday morning, something quiet" — and everything else belongs to the brain and the dispatcher.
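One way to keep that rule from eroding is to make the intent the only input the public interface accepts. The sketch below is a hypothetical illustration, not cc_claudio's actual code: the names Intent, Segment, and plan_show are invented here, and the toy planner simply walks a catalog in order where a real "brain" would use the intent text to choose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """The only thing the listener supplies: a free-text mood or occasion."""
    text: str
    minutes: int = 30


@dataclass(frozen=True)
class Segment:
    """One slot in the show: kind is "talk" or "song"."""
    kind: str
    label: str


def plan_show(intent, catalog):
    """Hypothetical planner: open with DJ talk, then alternate songs and segues.

    catalog is a list of (title, length_in_minutes) pairs. The caller never
    names a track; this toy version takes the catalog in order, where a real
    planner would pick based on intent.text.
    """
    segments = [Segment("talk", f"DJ intro for: {intent.text}")]
    remaining = intent.minutes
    for title, length in catalog:
        if remaining <= 0:
            break
        segments.append(Segment("song", title))
        segments.append(Segment("talk", f"DJ segue after {title}"))
        remaining -= length
    return segments


if __name__ == "__main__":
    demo_catalog = [("Track A", 4), ("Track B", 4), ("Track C", 4), ("Track D", 4)]
    for segment in plan_show(Intent("Monday morning, something quiet", 10), demo_catalog):
        print(segment.kind, "|", segment.label)
```

Nothing in this interface takes a track ID, a "next," or a "like," so adding a song-picking button would mean changing the signature on purpose rather than drifting into it.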

Third: "runs on my machine" is not "runs on its own." My own shell goes through a proxy by default (the HTTPS_PROXY environment variable), so calling the claude CLI and pulling HuggingFace models always felt smooth. The day omnivoice froze at 12% on a 6GB model, I simply wasn't behind the proxy, and a mainland-China IP hitting HuggingFace directly is unusably slow. cc_claudio later hit a harsher version: the claude CLI returns a flat 403 on a mainland IP. Both failures are the same species: everything works during development, then dies the moment it runs unattended. A child process started by a LaunchAgent (a per-user macOS background job managed by launchd) cannot see the proxy settings in my shell, because it never reads my shell profile and starts from a near-empty environment. And at kickoff I had never once asked, "what will its environment look like when it runs alone?"
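The gap is easy to reproduce without launchd. The sketch below is a minimal illustration under stated assumptions: PROXY_URL is a made-up local address, and minimal_env only approximates the sparse environment a background job gets. It runs the same child process twice, once with a stripped-down environment and once with the proxy passed in explicitly.

```python
import os
import subprocess
import sys

# Hypothetical proxy address, used only for illustration.
PROXY_URL = "http://127.0.0.1:7890"

# The child prints the proxy it can see, or MISSING if there is none.
PRINT_PROXY = "import os; print(os.environ.get('HTTPS_PROXY', 'MISSING'))"


def run_child(env):
    """Run a child Python process with exactly the given environment."""
    result = subprocess.run(
        [sys.executable, "-c", PRINT_PROXY],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def minimal_env():
    """Approximate a background job's environment: no shell profile, no proxy."""
    keep = ("PATH", "HOME", "SYSTEMROOT")
    return {name: os.environ[name] for name in keep if name in os.environ}


def env_with_proxy(base):
    """Return a copy of base with the proxy set explicitly, not inherited."""
    env = dict(base)
    env["HTTPS_PROXY"] = PROXY_URL
    return env


if __name__ == "__main__":
    print("bare environment:    ", run_child(minimal_env()))
    print("explicit environment:", run_child(env_with_proxy(minimal_env())))
```

The point of the design is that the unattended process gets its proxy from something it owns, not from whatever shell happened to launch it during development. For a LaunchAgent, one place to declare that is the EnvironmentVariables dictionary in the job's property list.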

Fourth: I stopped at "can demo it once" and never asked "would I actually use this?" I took yuns-radio exactly as far as a page I could click through in a browser. Press a button, hear the DJ speak (placeholder TEMP_FALLBACK text, but still), hear a song (only 30 sample tracks, but still), pause, skip. Clicking through it once for myself, it looked pretty convincing. But did I actually use it for news over breakfast every morning? No. Did I ask anyone else to try it? Also no. The moment I "demoed it to myself," something in the back of my mind said "good enough for now," and I never went back. omnivoice got as far as a working --help on the command line and I felt "the foundation is there." But I had no answer for what came next, so that "foundation" was an illusion too.

Four pits, one root: starting before the goal is thought through

Spread the four out and not one of them is "the code was wrong." The voice problem wasn't a bad TTS choice — I never knew what voice I wanted. Player-versus-radio wasn't a UI mistake — I never decided which product I was building. "Runs on my machine" wasn't a proxy misconfiguration — I never considered what environment the thing would live in on its own. "Can demo it" wasn't a lack of testing — I never defined what "done" looked like.

I used to think the goal was clear at kickoff — "build an AI personal radio station" sounds specific enough. It says nothing. What does the voice sound like? Is the product a tool or a show? What environment does it run in unattended? What counts as finished? I started without answering a single one of those four questions, and every two days one of them came back to knock me over.

The root cause across all three attempts wasn't a lack of technical depth or experience. Every time, I skipped the step where you decompose the goal until each piece can actually be answered.

How I want to run projects like this from now on

I now keep a kickoff checklist: four things that must have answers before the first line of code.

What does the core deliverable look like? Not an abstract description — something you can hum, sketch, or act out. For an AI player the core deliverable is the DJ's voice; thirty seconds you can hum is enough. Skip this and every engine choice is a blind one.

Write down what you will NOT build. Whatever the "to build" list leaves unsaid gets silently filled back in by defaults. yuns-radio never wrote down the boundary "users don't pick songs," so the UI quietly sprouted a pile of song-picking buttons — not because I decided to build them, but because defaults grow.

What is its environment when it runs alone? Inside a LaunchAgent, on someone else's machine, in the hours when I'm not watching — what can it see and what can't it? Is the proxy there? Do the environment variables carry over? Are the dependencies installed? List it before writing code. Both of my mainland-network failures happened because this line was missing.
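That list can also be made executable, so the unattended process checks its own surroundings before doing any work. This is a minimal sketch, not the project's actual startup code: the function name preflight and the two requirement lists are assumptions chosen to match the failures described above.

```python
import os
import shutil
import sys

# Assumed requirements for illustration; replace with the project's real list.
REQUIRED_ENV_VARS = ["HTTPS_PROXY"]
REQUIRED_COMMANDS = ["claude"]


def preflight(env, required_vars, required_commands):
    """Return a list of problems; an empty list means the environment looks usable."""
    problems = []
    for name in required_vars:
        if not env.get(name):
            problems.append(f"missing environment variable: {name}")
    for command in required_commands:
        # Resolve against the PATH of the environment being checked,
        # not the PATH of whoever happens to run this script.
        if shutil.which(command, path=env.get("PATH", "")) is None:
            problems.append(f"command not found on PATH: {command}")
    return problems


if __name__ == "__main__":
    issues = preflight(os.environ, REQUIRED_ENV_VARS, REQUIRED_COMMANDS)
    for issue in issues:
        print(issue, file=sys.stderr)
    if not issues:
        print("environment looks usable")
```

Run from an interactive shell it will usually pass, which is exactly the trap. The useful run is the one made from inside the background job itself, with the result logged somewhere that can be read the next morning.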

What exactly do I mean by "done"? Demoable? Survives without me watching? Seven days with no incident? Think it through and write it down. The night cc_claudio's LaunchAgent went in, I didn't touch it — I let it wake itself at 6 a.m. and send me its first DJ morning briefing. Seeing that message pop up on my phone's lock screen the next morning — that was this round's "done." A completely different standard from "can click through it in a browser."

cc_claudio now runs, gets used, and survives reboots — but it is nowhere near "done." The 9T index stalled at stage 13B after a machine swap, so new music can't come in. The right reference clip for true fish-speech voice cloning still hasn't been found; five candidates are sitting on my desktop. Settings, the mini player, and Lock Screen Now Playing don't exist yet. And the 5-day sprint cadence is not sustainable — the next step is to let it run a few mornings and late nights without me touching it, and let real use grow the next batch of problems.

What three attempts and 30-plus days bought me isn't an .app — it's that kickoff checklist above. It will almost certainly keep growing: next time I fall into a pit it doesn't cover, I add a line.