We Built an AI Skill for Community Radio Over a Weekend
Here's What We Learned
[REDACTED] is a community radio station run entirely by unpaid volunteers. Every week, the music department head downloads a CSV of what got played on air from Spinitron, opens the station's Music Library Downloads spreadsheet — a sprawling Google Sheet maintained by a rotating cast of volunteers — and manually cross-references the two. Which of the releases we added to the library actually got airplay? How many times? Is one distributor dominating the chart?
It takes hours. It's important work. And it's exactly the kind of structured, repetitive task that AI should be able to handle.
So over a weekend, we built a skill — a portable, reusable tool — that automates the cross-referencing and produces a formatted Excel report. The process taught us more about the practical realities of AI-assisted automation than any enterprise case study could, precisely because the stakes were low enough to be honest about what worked, what didn't, and what surprised us.
Starting with the mess
The first thing we learned is that real-world data is messy in ways you don't anticipate until you look at it.
The Music Library Downloads spreadsheet isn't a clean database. It's a living document maintained by multiple volunteers across more than two years of weekly sheets. Distributor names serve as section headers in column A. Login credentials for download portals are scattered between release entries. The "V/A" convention for compilations means a Various Artists release is filed under a two-character artist name that doesn't match how Spinitron credits the individual track artists.
Spinitron has its own quirks. The same song might be tagged as a single release even after the full album has dropped. Release names carry suffixes like "- Single" or "- EP" that the library doesn't use. A track from a compilation gets credited to the individual artist, not to "V/A." And the label field says "Les Disques Bongo Joe" where the library just says "Bongo Joe."
None of this was in a requirements document. It emerged from looking at the actual data, running the first version of the script, and examining what it got wrong.
Building iteratively, not speculatively
We didn't start with a spec. We started with a question: can we match these two datasets?
The first pass used exact matching on normalized artist and release names. It caught about 125 matches out of roughly 800 unique "spins" across four weeks of data. Good enough to prove the concept. Not good enough to be useful.
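A minimal sketch of what that first pass looks like. The function names and the exact normalization rules here are illustrative, not the station's actual script: lowercase everything, strip punctuation, collapse whitespace, then require the normalized (artist, release) pairs to agree exactly.

```python
import re

def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so that
    cosmetic differences don't block an exact match."""
    name = name.lower()
    name = re.sub(r"[^\w\s]", " ", name)   # drop punctuation
    return " ".join(name.split())          # collapse runs of whitespace

def exact_matches(spins, library):
    """Pair each spin with a library entry when the normalized
    (artist, release) keys agree. Inputs: lists of (artist, release)."""
    index = {(normalize(a), normalize(r)): (a, r) for a, r in library}
    return [
        (spin, index[key])
        for spin in spins
        if (key := (normalize(spin[0]), normalize(spin[1]))) in index
    ]
```

Exact matching like this is brittle by design: it only pairs entries that differ cosmetically, which is why it proved the concept without being useful on its own.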
The second iteration added fuzzy matching, which caught things like "Masks - Single" matching "Masks b/w Sifting" and "Ancestros Futuros" matching "Vol. III: Ancestros Futuros." The match count climbed, but we started seeing false positives — Charli xcx's "BRAT" fuzzy-matched against a V/A compilation about ARP synthesizers.
Each problem we found led to a specific fix. Compilation matching needed a higher threshold than standard matching. Short release names needed a minimum length filter. The fixes were surgical, not speculative, because we could see exactly what went wrong in the output.
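Those fixes can be sketched together. The thresholds, the minimum-length cutoff, and the suffix-stripping regex below are all assumptions for illustration (the article doesn't publish the real values); the shape of the logic, though, follows the fixes described above: strip Spinitron-style suffixes, refuse to fuzzy-match very short names, and hold compilations to a stricter bar.

```python
import re
from difflib import SequenceMatcher

SUFFIXES = re.compile(r"\s*-\s*(single|ep)\s*$", re.IGNORECASE)

def strip_suffix(release: str) -> str:
    """Drop '- Single' / '- EP' suffixes that the library doesn't use."""
    return SUFFIXES.sub("", release)

def fuzzy_match(spin_release, lib_release, is_compilation=False,
                threshold=0.85, compilation_threshold=0.92, min_len=5):
    """Pair two release names when they're similar enough. Compilations
    get a stricter threshold, and very short names (e.g. 'BRAT') are
    never fuzzy-matched at all."""
    a = strip_suffix(spin_release).lower()
    b = strip_suffix(lib_release).lower()
    if min(len(a), len(b)) < min_len:
        return False                      # too collision-prone to trust
    if a in b or b in a:                  # e.g. a subtitle containing the name
        return True
    bar = compilation_threshold if is_compilation else threshold
    return SequenceMatcher(None, a, b).ratio() >= bar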
This is the part that tutorials skip. They show you the clean path from problem to solution. Real development is a series of "huh, that's wrong" moments, each one revealing something about the data you didn't know before.
The human in the loop isn't optional
The most important features in the tool didn't come from algorithmic cleverness, but from domain knowledge.
A volunteer at the station noticed that Slocomotion's "Samarcande" — tagged as a standalone single in Spinitron — is actually a track from a Bongo Joe compilation in the library. The artist names don't match (Slocomotion vs. V/A). The release names don't match (Samarcande vs. Futur Simple: Bongo Joe 10 Years). No amount of fuzzy matching would connect those two. But they share a record label: Bongo Joe.
That observation led to label-anchored matching — a third matching strategy where, if an artist appears on the same label in both datasets, the tool flags it as a candidate for human review. It turned out to be one of the most useful features in the report, catching cases like Gorillaz singles matching back to the album, Mountain Goats tracks on Cadmean Dawn, and mclusky on Ipecac.
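A sketch of that third strategy, under the assumption (consistent with the report's candidates tab) that a candidate is "same artist and label, different release name." The record shape and field names are illustrative; the point is that this stage deliberately surfaces leads rather than asserting matches.

```python
def label_anchored_candidates(spins, library):
    """Flag spins for human review when the same artist appears on the
    same label in the library, but under a different release name.
    These are candidates, not confident matches. Each record is a dict
    with 'artist', 'release', and 'label' keys (names are illustrative)."""
    def key(rec):
        return (rec["artist"].lower(), rec["label"].lower())

    lib_by_key = {}
    for rec in library:
        lib_by_key.setdefault(key(rec), []).append(rec)

    candidates = []
    for spin in spins:
        for lib in lib_by_key.get(key(spin), []):
            if spin["release"].lower() != lib["release"].lower():
                candidates.append((spin, lib))
    return candidates
```

Because the output is a review queue rather than a merge, a generous rule like this is safe: a false lead costs the reviewer a glance, while a missed one costs a chart entry.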
The music department also cares about distributor balance — they don't want one distributor dominating the airplay chart. That's not a data problem; it's a policy concern. But the data supports it: the spreadsheet's section headers in column A are the distributor names. Once we knew that mattered, adding a "Distributor" column and a breakdown table was straightforward. The technical implementation was trivial. Knowing it was needed was the hard part.
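The "trivial" implementation is essentially a forward fill: walk the sheet top to bottom, treat a row whose only content is in column A as a distributor header, and tag every release row beneath it. This is a sketch with an assumed row shape (a list of cell values per row), not the actual script:

```python
def assign_distributors(rows):
    """Forward-fill distributor names from section-header rows.
    A header row has content only in column A; every release row below
    it inherits that distributor until the next header appears."""
    current = None
    tagged = []
    for row in rows:
        first, rest = row[0], row[1:]
        if first and not any(rest):   # header row: only column A is filled
            current = first
        elif any(row):                # non-empty release row
            tagged.append((current, row))
    return tagged
```

Once every release carries a distributor tag, the breakdown table is a simple group-and-count.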
And then there was the ampersand. During the first real-world run, a human who was casually browsing Spinitron playlists — not out of obligation, but curiosity — noticed that Iron & Wine's "In Your Ocean" didn't appear in the report despite being in the library. The cause: Spinitron uses "Iron & Wine" while the library spells it "Iron and Wine." The normalization function stripped the ampersand as punctuation, leaving "iron wine," while "and" survived as a regular word, giving "iron and wine." A one-line fix (convert & to and before stripping punctuation) resolved it permanently.
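The fixed normalization, sketched (the surrounding rules are illustrative, but the key line is the one-line fix the article describes): spell out the ampersand before punctuation stripping, so both spellings converge on the same key.

```python
import re

def normalize(name: str) -> str:
    """Normalize a name for matching. The key line is the first one:
    spelling out '&' before stripping punctuation makes
    'Iron & Wine' and 'Iron and Wine' normalize identically."""
    name = name.lower().replace("&", " and ")   # the one-line fix
    name = re.sub(r"[^\w\s]", " ", name)
    return " ".join(name.split())
```

Without that line, punctuation stripping turns "Iron & Wine" into "iron wine" while "Iron and Wine" stays "iron and wine," and the two never meet.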
That fix only happened because a human was looking at the data with curiosity rather than just checking a box.
What the tool actually does now
After several iterations, the skill produces a seven-tab Excel report:
The main tab shows every library add that got airplay, sorted by spin count, color-coded by match type (white for exact, yellow for fuzzy, green for compilation), with label and distributor columns. A summary tab provides quick stats and a distributor breakdown. Dedicated tabs break out fuzzy matches for review, compilation matches with per-compilation totals, consolidation candidates (same artist with multiple release names), and label-anchored candidates (same artist and label but different release name).
There's also a "Single Overrides" mechanism. When the music department head confirms that a Spinitron single belongs to a library album, she adds one row to an overrides sheet in the library spreadsheet. Next time the report runs, those spins automatically roll up into the album count. The override list accumulates over time, and the manual review gets shorter each week.
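The roll-up itself is simple once the overrides exist. In this sketch, `overrides` is a dict built from the overrides sheet, mapping a confirmed (artist, single) pair to its library album; the data shapes are assumptions for illustration.

```python
def apply_overrides(spins, overrides):
    """Roll confirmed singles up into their album counts.
    spins: list of (artist, release, spin_count).
    overrides: {(artist, single_release): album_release}, one entry per
    confirmed row in the overrides sheet (shape is illustrative)."""
    totals = {}
    for artist, release, count in spins:
        # Replace a confirmed single with its parent album, if any.
        release = overrides.get((artist, release), release)
        key = (artist, release)
        totals[key] = totals.get(key, 0) + count
    return totals
```

Because overrides are keyed on exact pairs, each confirmation applies automatically to every future run, which is why the manual review shrinks week over week.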
Portability was a surprise
The skill is two files: a SKILL.md (instructions in markdown) and a Python script. We packaged them as a .skill file (which is just a zip) for Claude's skill system. But when we handed the same file to ChatGPT and asked it to run the report, it produced identical output. 48 matches, same distributor breakdown, same compilation catches, same everything.
This wasn't a design goal. It was a side effect of putting the logic in a standalone Python script rather than in platform-specific prompting. The SKILL.md serves double duty: it's the trigger and instruction layer for Claude's skill system, and it's perfectly readable documentation for any LLM — or human — that needs to understand what the script does.
The lesson: if you want portability, put the intelligence in the tool, not in the prompt.
The cultural challenge is harder than the technical one
The station runs on donations and volunteer labor. The hardware is secondhand and far from gently used. The more ambitious automation possibilities — like using Cowork to monitor incoming email from distributors, download MP3s, and update the library spreadsheet automatically — require gear the station doesn't have and a comfort level with AI that many volunteers don't share.
And there's a subtler challenge: in volunteer organizations, the toil can feel noble. When someone has spent years doing the manual cross-reference, telling them a script can do it in 30 seconds can feel like it's diminishing what they contributed. The framing that works better: the tool handles the 85% that's mechanical, so the human can focus on the 15% that requires actual expertise — the judgment calls, the policy decisions, the institutional knowledge that no algorithm can replicate.
The Bevis Frond situation illustrated this perfectly. The report correctly matched the release, but the music department head knows the album doesn't officially drop until April. A DJ is spinning advance promo tracks. The station doesn't chart singles. None of that is in the data — it's in someone's head. The report gives her the starting point; her knowledge makes it accurate.
What we're continuing to learn
This isn't a finished product. It's a tool that gets incrementally better each time someone uses it and reports what it got wrong.
We haven't validated it against a manually produced report to measure true accuracy. We don't know the false negative rate — how many matches the human reviewer would have caught that the script missed entirely, beyond the ones we happened to spot. The compilation matching still produces occasional false positives. The label-anchored candidates tab is intentionally generous, surfacing possibilities for human review rather than making confident matches.
We also haven't built a proper test suite. The skill was developed through conversation, with a domain expert providing real-time feedback. That worked because the expert was present. If someone else needs to modify the skill later, they'll have no automated way to verify their changes didn't break something.
These are real gaps. They're also the kind of gaps that close naturally through use. Each weekly run is a de facto test case. Each "hey, this is wrong" from the music department head is a bug report. The single overrides sheet is a growing corpus of ground truth. The tool learns from the organization's accumulated knowledge, one correction at a time.
The takeaway
The most useful AI tools aren't the ones that replace human judgment. They're the ones that relocate human attention from drudgery to the places where it matters. A volunteer at a community radio station shouldn't spend her Monday night matching text strings across spreadsheets. She should spend it browsing playlists out of curiosity, building the implicit understanding of what's going out over the airwaves.
We built this in a weekend. The script is about 800 lines of Python. The documentation is markdown. It runs anywhere that can handle an AI skill. And it turned a multi-hour manual process into a 30-second report followed by five minutes of informed human review.
That's not a revolution. It's a weekend project. But it's a real one, with real data, real users, and real edge cases that no tutorial would have predicted. And that's probably worth more than another enterprise case study with redacted screenshots.