The Stack Behind These Articles

MDX, phoneme annotations, OmniVoice on CUDA, and a static FLAC — here's the full pipeline behind every article on this site.

Each article on snazzie.space is an MDX file. When you hit the play button, you’re listening to audio generated by a local Python script before the article was ever committed. Here’s what the stack looks like end to end.

Step 1: Write MDX article. Prose, frontmatter, and content, with no phoneme annotations yet.
Step 2: Enrich with LLM. A custom skill annotates acronyms and unusual terms with CMU phonemes from a curated list.
Step 3: Python TTS model. OmniVoice generates FLAC audio and a waveform JSON, committed with the article.

The content layer

Articles live in the content directory as MDX files. Astro’s content collections type-check the frontmatter at build time — title, date, excerpt, tags, draft flag. The slug comes from the filename.

MDX gets rendered at build time. No client-side markdown parsing, no JavaScript fetching content on load. The page that reaches you is static HTML. Astro ships zero JavaScript by default, so the only scripts on an article page are the audio player and a small toggle for the raw TTS script view. Hit “see raw script” at the top of any article to see exactly what text gets fed to the voice model.

Phoneme annotations

The TTS model, OmniVoice, uses CMU ARPAbet phonemes. Acronyms and unusual terms get annotated inline in the MDX with their phoneme sequence. The display layer strips the bracket content so the reader sees the term as written, while the TTS receives the correct pronunciation. A maintained table of annotated terms lives in the article skill, so new articles get consistent pronunciation without re-solving the same phoneme lookups.

| Term | Without marker | With marker |
| --- | --- | --- |
| SQL | “s-q-l” | [S IY1 K W AH0 L] “sequel” |
| KV | (silent) | [K EY1] [V IY1] “K-V” |
| SSR | “sarr” | [EH0 S, EH0 S, AA1 R] “S-S-R” |
| LLM | (silent) | [EH1 L, EH1 L, EH1 M] “EL-EL-EM” |
| UTC | “you-tee” | [Y UW1] [T IY1] [S IY1] “U-T-C” |
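
As a rough illustration of the display/TTS split described above, here is a minimal Python sketch. The inline syntax it parses (the written term followed directly by bracketed ARPAbet groups, as in `SQL[S IY1 K W AH0 L]`) is an assumption made for the example, as are the function names; the site's actual annotation format and implementation may differ.

```python
import re

# Assumed (hypothetical) syntax: a written term immediately followed by one or
# more bracketed ARPAbet groups, e.g. "SQL[S IY1 K W AH0 L]" or "KV[K EY1] [V IY1]".
ANNOTATION = re.compile(
    r"(?P<term>\b[A-Za-z][\w-]*)"
    r"(?P<phonemes>\[[A-Z0-9 ,]+\](?: \[[A-Z0-9 ,]+\])*)"
)


def display_text(source: str) -> str:
    """Drop the bracketed phonemes so the reader sees only the written term."""
    return ANNOTATION.sub(lambda m: m.group("term"), source)


def tts_text(source: str) -> str:
    """Drop the written term so the voice model receives only the phonemes."""
    return ANNOTATION.sub(lambda m: m.group("phonemes"), source)
```

Both views come from the same annotated source, so the written term and its pronunciation cannot drift apart.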

The same annotation system also handles paralinguistic markers — pauses, soft questions, mild surprise — placed at the start of sentences to shape delivery. These are stripped from the display entirely.

Audio generation

The Astro build processes each MDX file and writes a plain text TTS script. In producing it, the build strips MDX syntax, converts em dashes to pauses, swaps each phoneme annotation’s written form for its phoneme sequence, and excludes any HTML block marked with a skip attribute. The comparison table on this page uses that attribute to stay out of the narration. That file is the single source of truth for the audio: the “see raw script” toggle on each article shows exactly this output.
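
Two of those transformations, dropping skip-marked blocks and turning em dashes into pauses, can be sketched in a few lines. The real step runs inside the Astro build, so this Python version is only an illustration. The attribute name `data-tts-skip`, the use of a comma as the pause, and the function name are all assumptions, since the article does not spell them out.

```python
import re

# Hypothetical attribute name; the article only says blocks carry "a skip attribute".
# Non-greedy matching means nested blocks with the same tag name are not handled.
SKIP_BLOCK = re.compile(
    r"<(?P<tag>[A-Za-z][\w-]*)\b[^>]*\bdata-tts-skip\b[^>]*>.*?</(?P=tag)>",
    re.DOTALL,
)
# U+2014 is the em dash, together with any whitespace around it.
EM_DASH = re.compile(r"\s*\u2014\s*")


def to_tts_script(body: str) -> str:
    """Remove skip-marked HTML blocks and replace em dashes with a pause."""
    text = SKIP_BLOCK.sub("", body)
    text = EM_DASH.sub(", ", text)          # assumption: a comma reads as a pause
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)  # close gaps left by removed blocks
    return text.strip()
```

A real pass would also need to strip imports, JSX components, and the remaining MDX syntax, which this sketch leaves out.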

A Python script then reads that file, seeds the RNG for reproducibility, and passes the text to OmniVoice running on CUDA with a fixed reference voice sample. Every article uses the same reference wav, so the voice is consistent. Speed is set to 1.15x. Output is a FLAC and a waveform JSON — 200 normalised peak amplitude values used to render the player visualisation.
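
The waveform JSON is the easiest part to illustrate. Below is a minimal sketch of reducing decoded audio samples to 200 normalised peaks, using only the standard library. It assumes "normalised" means each bucket's peak divided by the loudest peak in the file, and the function name and bucketing scheme are hypothetical rather than taken from the actual script. The OmniVoice call and RNG seeding are left out because their APIs are not shown in the article.

```python
def waveform_peaks(samples, buckets=200):
    """Reduce mono samples to `buckets` peak amplitudes scaled to the range 0..1."""
    n = len(samples)
    if n == 0:
        return [0.0] * buckets

    peaks = []
    for i in range(buckets):
        # Integer bucket boundaries; every bucket covers at least one sample,
        # so short clips simply reuse samples across neighbouring buckets.
        start = i * n // buckets
        end = max(start + 1, (i + 1) * n // buckets)
        peaks.append(max(abs(s) for s in samples[start:end]))

    loudest = max(peaks)
    if loudest == 0:
        return [0.0] * buckets  # pure silence: avoid dividing by zero
    return [round(p / loudest, 4) for p in peaks]
```

The result is a plain list of floats, so `json.dumps(waveform_peaks(samples))` is enough to produce a file the player can read at load time.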

Both files are committed with the article and served statically from Cloudflare’s CDN. No API calls at read time. The TTS inference runs once on my machine before commit. After that it’s just a file.