Content Pipeline
A seven-stage AI pipeline that takes a topic and produces a publication-ready article — SEO metadata, generated images with deterministic post-processing, and direct CMS publishing — while enforcing the client's specific voice, framework, and quality rules at every stage.
A niche content publisher was producing domain-specific long-form articles at volume and needed every piece to be indistinguishable from their best human writing: not just in style, but in strict adherence to a specific intellectual framework, a controlled vocabulary, and quality standards that had previously required manual review on every piece. The ask was a tool their team could actually use, not a proof of concept that needed a developer in the room to run.
Decompose the generation task into seven discrete stages, each with its own model, its own prompt, and its own job. Haiku handles the fast, structured work: research analysis, title refinement, image scene descriptions, HTML conversion. Sonnet handles the reasoning-heavy work: drafting, outline construction, voice-sensitive rewrites, and adversarial evaluation. Each stage accumulates context for the next: by the time the draft stage runs, it already has real search data, retrieved source passages, a structured outline, and key takeaways in context. A Server-Sent Events stream surfaces progress to the client in real time across a multi-minute generation run.
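A minimal sketch of that decomposition, assuming a simple stage registry; the stage names, prompt keys, and `call_model` stub are illustrative, not the production identifiers:

```python
# Hypothetical stage registry: each stage pins its own model and prompt.
STAGES = [
    # (stage, model, prompt template key) -- names are illustrative
    ("serp_research",    "claude-haiku",  "prompts/serp_analysis"),
    ("title_refinement", "claude-haiku",  "prompts/title"),
    ("key_takeaways",    "claude-sonnet", "prompts/takeaways"),
    ("outline",          "claude-sonnet", "prompts/outline"),
    ("draft",            "claude-sonnet", "prompts/draft"),
    ("evaluation",       "claude-sonnet", "prompts/critic"),
    ("image_scenes",     "claude-haiku",  "prompts/scenes"),
]

def call_model(model: str, prompt_key: str, context: dict) -> str:
    """Stub for the actual Anthropic API call."""
    return f"<{model} output for {prompt_key}>"

def run_pipeline(topic: str) -> dict:
    """Each stage sees everything the earlier stages produced."""
    context = {"topic": topic}
    for stage, model, prompt_key in STAGES:
        context[stage] = call_model(model, prompt_key, context)
    return context
```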
- SERP research stage — live search results fed to Claude for structured intent analysis: keyword clusters, audience targeting, competitor angles, writing style signals
- RAG-grounded generation — curated source library indexed in a local vector store; top passages retrieved at the article level and per section heading, injected into the draft prompt as an organized block, not as appended noise
- Seven-stage pipeline — topic → SERP research → title refinement → key takeaways → outline → full draft → evaluation → image generation; each stage feeds the next, with CMS publish as the final step after the seven stages
- Adversarial evaluation stage — a second Sonnet call with a critic prompt audits the draft against the client's framework rules, banned vocabulary, and factual risk categories; returns a structured PASS / FLAG / FAIL verdict with flagged line references
- Image generation with deterministic post-processing — AI-generated thumbnails and chapter headers run through a multi-step PIL pipeline that applies texture, grain, vignette, and text compositing; output looks designed, not generated
- Direct CMS publishing — images uploaded to cloud storage, HTML rewritten with final URLs, posts record upserted in the client's Supabase backend; one click from draft to live
- Real-time SSE streaming — every pipeline stage streams progress to the browser; blocking AI calls run in async threads so the event loop stays clear across the full generation window
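A minimal sketch of that streaming shape, assuming FastAPI's `StreamingResponse` and `asyncio.to_thread`; the stage names and payloads are placeholders:

```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def blocking_stage(name: str, payload: dict) -> dict:
    """Placeholder for a synchronous model call; runs off the event loop."""
    return {"stage": name, "ok": True}

@app.get("/generate")
async def generate(topic: str):
    async def events():
        payload = {"topic": topic}
        for stage in ("serp_research", "outline", "draft", "evaluation"):
            # Run the blocking AI call in a worker thread so the event
            # loop keeps streaming to the browser across the full run.
            result = await asyncio.to_thread(blocking_stage, stage, payload)
            yield f"event: progress\ndata: {json.dumps(result)}\n\n"
        yield "event: done\ndata: {}\n\n"
    return StreamingResponse(events(), media_type="text/event-stream")
```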
The pipeline runs end to end without a developer present — topic in, publication-ready article out, with generated images, SEO metadata, and a direct CMS publish in a single session.
FastAPI + Uvicorn — async-native, SSE streaming
Claude Sonnet — outline, draft, evaluation, rewrites, TikTok scripts
Claude Haiku — SERP analysis, title, scene descriptions, HTML conversion
SerpAPI — live Google results for research grounding
ChromaDB (local vector store) + OpenAI embeddings
fal.ai / DALL-E 3 — with PIL post-processing pipeline
ElevenLabs TTS — voice-tagged script to audio
Supabase — Postgres posts table + Storage for images
Railway — Nixpacks build, zero-config
Four problems define the difference between a demo and a production content tool:
- Voice enforcement at scale — the system encodes the client's voice as a structured prompt (specific rhetorical patterns, a banned vocabulary list, naming conventions, and a required opening arc), then audits the output against a separate adversarial prompt that acts as a critic rather than a creator. One generation pass isn't enough; the evaluation stage is the quality gate.
- RAG that changes the output — source passages are retrieved at two levels (article and per-section heading) and organized into a structured block in the draft prompt, not appended as a context dump, so the articles cite real arguments from real sources without hallucinated attribution. A retrieval sketch follows this list.
- Image post-processing as a quality layer — AI-generated images look AI-generated; a deterministic PIL pipeline runs on every image to apply aging, texture, and composited text so the output looks like a human designer worked it (sketched below).
- Content filter handling — different image generation providers filter domain-specific imagery differently; the system detects a rejection, retries with a softened prompt, and falls back across providers, behavior discovered through production testing, not documented anywhere (sketched below).
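A minimal sketch of the two-level retrieval, assuming a pre-indexed ChromaDB collection; the collection name, `n_results` values, and block labels are assumptions:

```python
import chromadb
from chromadb.utils import embedding_functions

# Assumes the source library is already chunked and indexed locally.
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="...", model_name="text-embedding-3-small")
client = chromadb.PersistentClient(path="./index")
sources = client.get_or_create_collection("sources", embedding_function=openai_ef)

def retrieve_context(topic: str, headings: list[str]) -> str:
    """Two-level retrieval: article-wide passages plus per-heading passages,
    assembled into one labeled block for the draft prompt."""
    blocks = []
    article = sources.query(query_texts=[topic], n_results=5)
    blocks.append("## Article-level sources\n" + "\n".join(article["documents"][0]))
    for heading in headings:
        hits = sources.query(query_texts=[heading], n_results=2)
        blocks.append(f"## Sources for: {heading}\n" + "\n".join(hits["documents"][0]))
    return "\n\n".join(blocks)
```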
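And a sketch of the image finishing pass, assuming Pillow; the grain, vignette, and text parameters are illustrative stand-ins for the production steps:

```python
from PIL import Image, ImageDraw, ImageEnhance, ImageFont

def finish_image(src: Image.Image, title: str) -> Image.Image:
    """Grain, vignette, and title compositing; values are illustrative."""
    img = src.convert("RGB")
    w, h = img.size
    # Grain: blend in monochrome noise (the real pass seeds its noise).
    noise = Image.effect_noise((w, h), 24)
    img = Image.blend(img, Image.merge("RGB", (noise, noise, noise)), 0.06)
    # Vignette: darken toward the edges through an elliptical mask.
    mask = Image.new("L", (w, h), 0)
    ImageDraw.Draw(mask).ellipse((-w // 3, -h // 3, w + w // 3, h + h // 3), fill=255)
    img = Image.composite(img, ImageEnhance.Brightness(img).enhance(0.55), mask)
    # Title compositing over the finished plate.
    ImageDraw.Draw(img).text((w // 16, h - h // 6), title, fill="white",
                             font=ImageFont.load_default())
    return img
```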
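The filter-fallback logic, sketched with injected provider callables; the exception type and `soften` hook are assumptions about how rejection detection is wired:

```python
from typing import Callable

class ContentFilterRejection(Exception):
    """Raised by a provider wrapper when it detects a filter refusal."""

def generate_image(prompt: str,
                   providers: list[Callable[[str], bytes]],
                   soften: Callable[[str], str]) -> bytes:
    """Try each provider; on rejection, retry once with a softened prompt,
    then fall through to the next provider."""
    last = None
    for call in providers:
        for attempt in (prompt, soften(prompt)):
            try:
                return call(attempt)
            except ContentFilterRejection as exc:
                last = exc
    raise RuntimeError("all providers rejected the prompt") from last
```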
The audio integration treats the short-form script as a production document, not a transcript to be cleaned up later. Voice tags are written directly into the script at generation time — tone directions, emphasis markers, pause annotations — so the script is authored with audio production in mind from the start. Before the text reaches the TTS API, a stripping pass removes visual stage directions and anything the voice model shouldn't read aloud, while preserving the inline voice tags that are native to the API. The voice selection pulls the client's actual account voices live, not a hardcoded list. Output is saved and served via the same API, completing the path from article to voiced audio without leaving the tool.
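A minimal sketch of that stripping pass, assuming visual directions sit in parentheses or ALL-CAPS cue lines while voice tags use square brackets; the real script conventions may differ:

```python
import re

# Assumed conventions: visual directions in (parentheses) or ALL-CAPS cue
# lines; inline voice tags in [square brackets] are kept for the TTS API.
VISUAL_DIRECTION = re.compile(r"\([^)]*\)")
CUE_LINE = re.compile(r"^[A-Z0-9 :\-]{6,}$", re.MULTILINE)

def prepare_for_tts(script: str) -> str:
    """Strip anything the voice model shouldn't read aloud; keep voice tags."""
    text = VISUAL_DIRECTION.sub("", script)
    text = CUE_LINE.sub("", text)
    return re.sub(r"[ \t]{2,}", " ", text).strip()
```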
The short-form script generator produces a separate content format built on top of the article, not a mechanical derivation of it. The same voice constraints that govern the long-form draft apply here: same intellectual framework, same banned vocabulary, same naming conventions, compressed into 80 to 150 words written for spoken delivery. Short declarative sentences, momentum-building structure, a hook in the first ten words. The script optionally accepts a persona parameter to adjust tone, and a bypass mode lets the client paste a hand-written script directly and route it straight to audio production. The result is a one-path flow from article to short-form script to voiced audio, all within the same tool, with the same quality rules applied throughout.
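A sketch of the persona and bypass routing, with hypothetical field names; `generate_short_form` stands in for the constrained Claude call:

```python
from pydantic import BaseModel

class ShortFormRequest(BaseModel):
    """Illustrative request shape, not the production schema."""
    article_id: str
    persona: str | None = None        # optional tone adjustment
    manual_script: str | None = None  # bypass: hand-written script to audio

def generate_short_form(article_id: str, persona: str | None) -> str:
    """Stub for the constrained Claude call (80-150 words, hook up front)."""
    return f"<script for {article_id}, persona={persona}>"

def resolve_script(req: ShortFormRequest) -> str:
    if req.manual_script:  # bypass mode skips generation entirely
        return req.manual_script
    return generate_short_form(req.article_id, persona=req.persona)
```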
The evaluation stage is designed as a handoff point to a human editor, not a gate that blocks publishing. After the draft, a second Sonnet call with an adversarial critic prompt audits against eight categories (framework adherence, vocabulary rules, naming conventions, factual risk, and style violations among them) and returns a structured verdict with flagged line references, not vague summaries. Where the human comes in: every flagged paragraph is clickable in the preview. The editor opens the paragraph inline, adds an optional instruction, and triggers a targeted Claude rewrite that replaces only that paragraph. The rest of the article is untouched. The design principle: the AI flags the problem, the human decides what to change, the AI executes the targeted fix. No full regeneration, no loss of surrounding context, no overwriting 2,500 words to correct two sentences.
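A minimal sketch of the verdict shape and the targeted-rewrite path, with illustrative field names; `targeted_rewrite` stands in for the Claude call that receives the critic's reason and the editor's instruction:

```python
from dataclasses import dataclass, field
from enum import Enum

class Verdict(str, Enum):
    PASS = "PASS"
    FLAG = "FLAG"
    FAIL = "FAIL"

@dataclass
class Flag:
    category: str   # e.g. "vocabulary" or "framework" (illustrative labels)
    line_ref: int   # paragraph index the critic flagged
    reason: str

@dataclass
class Evaluation:
    verdict: Verdict
    flags: list[Flag] = field(default_factory=list)

def targeted_rewrite(paragraph: str, reason: str, instruction: str) -> str:
    """Stub for the Claude call; sees only the flagged paragraph plus context."""
    return f"<rewritten: {paragraph[:30]}...>"

def apply_fix(paragraphs: list[str], flag: Flag, instruction: str) -> list[str]:
    """Replace only the flagged paragraph; the rest of the article is untouched."""
    updated = paragraphs.copy()
    updated[flag.line_ref] = targeted_rewrite(
        paragraphs[flag.line_ref], flag.reason, instruction)
    return updated
```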
The design pattern here generalizes to any domain where a client has a specific voice, a specific knowledge base, and a specific quality bar. Generate with the right model for each stage. Ground in a curated source corpus. Audit with an adversarial prompt before anything ships. Apply deterministic post-processing where AI output isn't controllable enough. That architecture — not any particular prompt — is what produces output that clients trust enough to publish without a human rewrite.