Update TTS providers and story markup

2026-05-20 22:13:31 +02:00
parent b911c40d89
commit 8258ea2321
36 changed files with 1482 additions and 197 deletions
@@ -209,6 +209,9 @@ Supported story tags include:
 - `#sfx[file](max=8 fade fade-duration=2)`
 - `#music[file](crossfade loop lead=4)`
 - `#gloss[term](definition)`
+- `#tts[instruction]`
+- `#tts(instruction)`
+- `#tts[provider](instruction)` / `#tts-openai[instruction]`
 - `#score[...]`
 - `#error[...]`
 - `#achievement[...]`
@@ -222,6 +225,9 @@ Choice tags:
 - `#action[name]`

 The active choice UI is one list. Explicit keys are reserved first, then remaining choices receive `1` through `0`, then `A` through `Z`.
+Before key assignment, choices are ordered by invisible `#action` groups. The first appearance of each action group in the authored list determines group order. Choices inside each group are randomized for presentation. Choices without an action group form one final group shown last. Group labels are not displayed.
+
+TTS instruction tags are paragraph/block metadata. They are ignored by renderers and by providers that do not support per-request reading instructions. Providerless `#tts[...]` and `#tts(...)` are the default authoring forms; provider-specific forms are optional filters for provider overrides. OpenAI consumes matching instructions only for `gpt-4o-mini-tts`, where they are sent as the Speech API `instructions` field. Instructions should describe delivery, such as tone, emotion, intonation, pace, accent, whispering, humming, or singing style.

 Markdown emphasis:

@@ -233,7 +239,7 @@ Markdown emphasis:

 ## Audio, TTS, And Media

-TTS providers currently include `none`, Browser Speech, Kokoro, ElevenLabs, and OpenAI. Provider modules exist, but Browser Speech and Kokoro need focused validation before being considered production-ready.
+TTS providers currently include `none`, Browser Speech, Kokoro, ElevenLabs, OpenAI, and local OpenAI-compatible servers. Provider modules exist, but Browser Speech and Kokoro need focused validation before being considered production-ready.

 TTS cache keys include provider, voice, provider speed value, language, and exact normalized TTS string. Fast-forward must accelerate visible animation and fade/stop active TTS without cancelling background generations unless the foreground block has been waiting long enough.
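
The cache key composition described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the `TtsCacheKeyParts` interface, its field names, and the `ttsCacheKey` function are all hypothetical, and the sketch assumes the TTS string has already been normalized by the caller.

```typescript
// Hypothetical shape: field names are assumptions for illustration only.
interface TtsCacheKeyParts {
  provider: string;       // e.g. "openai", "kokoro"
  voice: string;          // provider-specific voice identifier
  speed: number;          // the speed value as sent to the provider
  language: string;       // e.g. "en"
  normalizedText: string; // the exact normalized TTS string, used as-is
}

// Build a key from the five documented components.
// Serializing an array as JSON keeps the fields unambiguous even when a
// value happens to contain a character that a plain separator would clash with.
function ttsCacheKey(parts: TtsCacheKeyParts): string {
  return JSON.stringify([
    parts.provider,
    parts.voice,
    parts.speed,
    parts.language,
    parts.normalizedText,
  ]);
}
```

Under this sketch, changing any single component (for example the provider speed value) yields a different key, so audio generated at one speed would not be reused for another. Whether per-request reading instructions also belong in the key is not stated in the text above, so they are left out here.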