The Cost of Being the Guinea Pig
AI tooling moves fast. Unreasonably fast.
Claude Code alone ships updates several times a week. New flags, new hooks, new MCP servers. Open X and someone is calling a config “the strongest setup yet.” The next day, someone else is saying “no, this one is better.”
The catch: someone has to validate all of it.
AI dramatically accelerated the work. A corporate site in a day, a slide deck from a single command, strategy sessions and execution running side by side with AI. Hands should have been freed up.
The freed hands got swallowed by AI maintenance.
Test new prompt techniques. Adjust config files. Doesn’t work. Investigate. Fix. Works. Next week it changes again. One to two hours a day disappears into this infinite loop. The time AI saved is now spent caring for AI.
The Thing I Actually Wanted to Automate
While brainstorming with the AI agent team, something clicked.
The work itself is trigger-based. Client requests come in, work happens. There’s almost no recurring routine. There was nothing to automate — or so it seemed.
What I actually wanted to automate was the evolution of AI itself.
Someone posts a new Claude Code feature on X. Find it. Evaluate it. Decide if it’s worth trying. Implement. Test. Keep what’s useful, drop what isn’t.
That entire flow can run on AI. Automate the guinea pig.
Design: Watch → Score → Propose → Approve
Here’s the system.
X API v2 handles the monitoring. Posts containing keywords like Claude Code, Openclaw, CLAUDE.md, and MCP are collected hourly. Search queries live in a JSON config file, so adding new channels doesn’t require touching code.
Scoring assigns priority. A base score of 0.3 stacks with keyword matches (+0.1 each), engagement (likes and reposts), and noise filters (job ads and crypto get penalized) to produce a 0–1 score. Above 0.7 qualifies as a candidate.
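A minimal sketch of what that scoring step could look like. The base score, the per-keyword bonus, and the 0.7 threshold come from the description above. The engagement curve, the size of the noise penalty, and every name here (Post, scorePost, isCandidate) are assumptions for illustration, not the actual implementation.

```typescript
// Sketch only: engagement weighting and noise penalty are assumed values.
interface Post {
  text: string;
  likes: number;
  reposts: number;
}

const BASE_SCORE = 0.3;           // from the article
const KEYWORD_BONUS = 0.1;        // from the article, per matched keyword
const CANDIDATE_THRESHOLD = 0.7;  // from the article
const MAX_ENGAGEMENT_BONUS = 0.3; // assumption
const NOISE_PENALTY = 0.3;        // assumption

// Case-insensitive substring match against a list of terms.
function matchedTerms(text: string, terms: string[]): string[] {
  const lower = text.toLowerCase();
  return terms.filter((t) => lower.includes(t.toLowerCase()));
}

// Assumption: log-scaled so one viral post cannot dominate.
// The bonus reaches its cap at roughly 1,000 likes plus reposts.
function engagementBonus(post: Post): number {
  const interactions = post.likes + post.reposts;
  const scaled = (Math.log10(1 + interactions) / 3) * MAX_ENGAGEMENT_BONUS;
  return Math.min(MAX_ENGAGEMENT_BONUS, scaled);
}

function scorePost(post: Post, keywords: string[], noiseTerms: string[]): number {
  let score = BASE_SCORE;
  score += KEYWORD_BONUS * matchedTerms(post.text, keywords).length;
  score += engagementBonus(post);
  if (matchedTerms(post.text, noiseTerms).length > 0) score -= NOISE_PENALTY;
  return Math.min(1, Math.max(0, score)); // clamp to the 0–1 range
}

function isCandidate(score: number): boolean {
  return score > CANDIDATE_THRESHOLD;
}
```

With these assumed weights, a post matching three keywords and no engagement lands around 0.6 and stays below the threshold. Engagement or a fourth keyword is what pushes it over.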
Clustering detects topic concentration. When different accounts post about the same keyword, that’s a trend. Three or more posts add +0.2, five or more add +0.3, ten or more add +0.4. This is group signal that individual posts can’t carry on their own.
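The cluster bonus can be sketched as a two-pass step. The three tiers (+0.2, +0.3, +0.4) come from the text. Counting one post per distinct account, applying only the largest bonus a post qualifies for, and clamping the result to 1 are assumptions, as are the names ScoredPost and applyClusterBonus.

```typescript
// Sketch only: cluster size is measured in distinct accounts per keyword (assumption).
interface ScoredPost {
  authorId: string;
  keywords: string[];
  score: number;
}

// Tiers from the article: 3+ posts, 5+ posts, 10+ posts.
function clusterBonus(clusterSize: number): number {
  if (clusterSize >= 10) return 0.4;
  if (clusterSize >= 5) return 0.3;
  if (clusterSize >= 3) return 0.2;
  return 0;
}

function applyClusterBonus(posts: ScoredPost[]): ScoredPost[] {
  // Pass 1: collect the distinct accounts behind each keyword.
  const authorsByKeyword = new Map<string, Set<string>>();
  for (const post of posts) {
    for (const keyword of post.keywords) {
      const authors = authorsByKeyword.get(keyword) ?? new Set<string>();
      authors.add(post.authorId);
      authorsByKeyword.set(keyword, authors);
    }
  }
  // Pass 2: give each post the bonus of its largest cluster.
  return posts.map((post) => {
    const sizes = post.keywords.map((k) => authorsByKeyword.get(k)?.size ?? 0);
    const bonus = clusterBonus(Math.max(0, ...sizes));
    return { ...post, score: Math.min(1, post.score + bonus) };
  });
}
```

Counting accounts rather than raw posts means one person posting the same keyword ten times does not register as a trend.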
Discord receives the proposals. Posts above the threshold arrive as Embed messages with approve and reject buttons. English posts come with an automatic Japanese translation attached.
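For a sense of what such a proposal message contains, here is a sketch that builds the raw Discord message payload (one embed plus an action row with two buttons) as a plain object. The real system uses Discord.js; this version skips the library so it stays self-contained. The Proposal shape, the embed layout, and the `approve:<id>` / `reject:<id>` custom_id format are all hypothetical.

```typescript
// Sketch only: field layout and custom_id format are assumptions.
interface Proposal {
  id: string;
  channel: string;
  text: string;
  translation?: string; // present for English posts
  score: number;
  keywords: string[];
}

function buildProposalMessage(p: Proposal) {
  const description = p.translation ? `${p.text}\n\n${p.translation}` : p.text;
  return {
    embeds: [
      {
        title: `Proposal: ${p.channel}`,
        description: description.slice(0, 4096), // Discord caps embed descriptions at 4096 chars
        fields: [
          { name: "Score", value: p.score.toFixed(2), inline: true },
          { name: "Keywords", value: p.keywords.join(", ") || "none", inline: true },
        ],
      },
    ],
    components: [
      {
        type: 1, // action row
        components: [
          { type: 2, style: 3, label: "Approve", custom_id: `approve:${p.id}` }, // success button
          { type: 2, style: 4, label: "Reject", custom_id: `reject:${p.id}` },   // danger button
        ],
      },
    ],
  };
}
```

Encoding the proposal id in custom_id is one way to let the button handler look up the stored proposal when an interaction comes back.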
The work that’s left: pressing a button.
Built in One Session
From concept to running system: one session.
To be precise, the Discord button infrastructure and approval workflow existed from a previous session. That day’s work was X API integration, scoring engine, clustering, multi-channel support, Discord posting, auto-translation. All the core logic.
Stack: Bun + TypeScript. The X API v2 Recent Search endpoint runs as a periodic check via HeartbeatRunner. State persists to JSON files. Proposals save to SQLite, linked to Discord interactions.
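A rough sketch of the fetch side, assuming a bearer token and a list of keywords. The endpoint and the query, max_results, tweet.fields, and since_id parameters are part of the X API v2 Recent Search endpoint. The function names, the quoting and OR-joining of keywords, the retweet exclusion, and using since_id as the persisted state are assumptions. How HeartbeatRunner schedules the call is not shown.

```typescript
// Sketch only: query construction and state handling are assumptions.
const RECENT_SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent";

function buildRecentSearchUrl(keywords: string[], sinceId?: string): string {
  // Quote each keyword so multi-word phrases match exactly, then OR them together.
  const terms = keywords.map((k) => `"${k}"`).join(" OR ");
  const params = new URLSearchParams({
    query: `(${terms}) -is:retweet`,
    max_results: "100",
    "tweet.fields": "public_metrics,author_id,created_at,lang",
  });
  // Only ask for posts newer than the last one already seen.
  if (sinceId) params.set("since_id", sinceId);
  return `${RECENT_SEARCH_URL}?${params.toString()}`;
}

async function fetchRecentPosts(url: string, bearerToken: string): Promise<unknown> {
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${bearerToken}` },
  });
  if (!res.ok) throw new Error(`X API request failed: ${res.status}`);
  return res.json();
}
```

Requesting public_metrics is what makes like and repost counts available to the scoring step.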
Nothing exotic happens here. Hit an API, calculate a score, threshold, notify. The pieces are simple. What matters is the time from “let’s do this” to a running system.
Channels Expanded to Four
Started with Claude Code only. Watching it run, I got more ambitious.
The same machinery covers every domain of interest.
Final config: four channels.
- Claude Code / AI development tools: Claude Code, Openclaw, CLAUDE.md, hooks, MCP
- Tech stack: Astro, Bun, Vercel, Tailwind CSS
- Advertising and marketing: agency disintermediation, in-house transitions, creative AI
- Design / UI/UX: UI trends, rebrands, design systems
Each channel has dedicated search queries, keyword dictionaries, and noise filters. Adding or modifying a channel only requires editing config/x-trends-channels.json. No code changes.
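To make the config-only claim concrete, here is one possible shape for a channel entry plus a small validator. The actual schema of config/x-trends-channels.json is not shown in this post, so the field names (name, queries, keywords, noiseTerms) and the example values are hypothetical.

```typescript
// Sketch only: the real file's schema is unknown; these field names are assumed.
interface ChannelConfig {
  name: string;
  queries: string[];    // search queries sent to the X API
  keywords: string[];   // keyword dictionary used for scoring
  noiseTerms: string[]; // terms that trigger the noise penalty
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Parse and validate the JSON text so a typo in the config fails loudly at startup.
function parseChannels(json: string): ChannelConfig[] {
  const raw: unknown = JSON.parse(json);
  if (!Array.isArray(raw)) throw new Error("expected an array of channels");
  return raw.map((entry, index) => {
    if (typeof entry !== "object" || entry === null) {
      throw new Error(`channel ${index} is not an object`);
    }
    const { name, queries, keywords, noiseTerms } = entry as Record<string, unknown>;
    if (
      typeof name !== "string" ||
      !isStringArray(queries) ||
      !isStringArray(keywords) ||
      !isStringArray(noiseTerms)
    ) {
      throw new Error(`channel ${index} is malformed`);
    }
    return { name, queries, keywords, noiseTerms };
  });
}

const exampleConfig = `[
  {
    "name": "tech-stack",
    "queries": ["Astro OR Bun OR Vercel"],
    "keywords": ["Astro", "Bun", "Vercel", "Tailwind CSS"],
    "noiseTerms": ["hiring"]
  }
]`;

const channels = parseChannels(exampleConfig);
```

Under this shape, a fifth channel is one more object in the array.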
First fetch: 216 tweets collected. 91 high-scoring. 20 topic clusters detected. The thing runs.
Proposals Just Show Up
Discord notifications arrive. Embed cards display the post content, score, keywords, and channel name. Approve and reject buttons. English posts include automated translations from Claude Haiku.
Everything needed to decide is already on screen. Zero search time. Ten seconds to read. Three seconds to decide.
The first real proposal that came through: an article about Rakuten’s in-house design team and their AI workflow. The word “Adobe” appeared. Instant reject. I have never seen anyone use Adobe’s AI in production work. Three seconds.
This kind of filtering is exactly what humans should do. AI gathers and evaluates. Humans tell genuine from fake in an instant.
This Was the Real Wall to AI Adoption
A lot of people can’t get AI into their work. The tools are cheap. Documentation exists. Speed gains are obvious.
The reason adoption stalls is the guinea pig cost.
AI tools change weekly. Last week’s best practice is this week’s outdated approach. Trying new features takes time. Rolling back when they don’t fit takes more. This maintenance overhead eats the productivity benefit.
“AI turns ten hours of work into one hour” is true. “Keeping AI current costs three hours a week” is also true. Nine hours saved, three spent on upkeep: net six. Not bad, but anyone expecting nine hours feels cheated.
Automating the guinea pig takes those three hours close to zero. Not literally zero, since pressing approve buttons takes time. But three hours of patrolling X, reading articles, testing tools, and evaluating compress into a few ten-second button presses.
Don’t Stop the Evolution of Your Weapons
Automating work is everywhere. Auto-invoicing, auto-replies, auto-generated reports.
Automating the evolution of your weapons is almost nowhere.
What I built: a pipeline that collects, evaluates, and proposes the latest AI tooling — built by AI itself. The weapon hunts for its own evolutionary seeds and asks the wielder “how about this one?” The wielder issues Go or No-Go.
This isn’t just efficiency. It might be one viable answer to the AI-human division of labor.
AI searches wide, fast, without rest. Humans tell genuine from fake in an instant. As long as that loop runs, the weapon evolves on its own.
Concept to production: one session. Stack: X API v2, Bun, TypeScript, Discord.js, Claude Haiku. Nothing technically special.
What was needed: the idea of automating the guinea pig itself.