Update TTS providers and story markup

2026-05-20 22:13:31 +02:00
parent b911c40d89
commit 8258ea2321
36 changed files with 1482 additions and 197 deletions
+30 -2
@@ -42,6 +42,34 @@ The conductor points toward Eibenreith.
The bracket value is the visible term to find. The parenthesized value is the note shown on hover/focus. The renderer marks every matching instance of the term in the same right-page block. The tag is not displayed and is not sent to TTS. Avoid raw Ink control characters in the explanation; `|`, `{`, and `}` must be escaped in Ink as `\|`, `\{`, and `\}` if they are needed literally.
## TTS Reading Instructions
TTS instruction tags are story tags scoped to the paragraph/block they belong to. They are not rendered, and they are only sent to TTS providers that support per-request reading instructions. Currently this means OpenAI with `gpt-4o-mini-tts`.
```ink
„Ich habe nichts gesehen“, sagt Viktor.
#tts[Read softly, with controlled unease.]
```
The default form omits a provider and is the preferred authoring style. Providers that support instructions may consume it; providers that do not support instructions silently ignore it. Provider-specific instructions are only needed when two providers should receive different direction, or when an instruction must be hidden from all but one provider. In that form the provider name goes in the brackets and the instruction moves into the parentheses:
```ink
„Ich habe nichts gesehen“, sagt Viktor.
#tts[openai](Read softly, with controlled unease.)
```
The shorthand `#tts-openai[...]` is also accepted. `#tts(...)` is equivalent to providerless `#tts[...]` if parentheses read better in a local context. `tts-1` and `tts-1-hd` ignore these instructions because the OpenAI speech endpoint only supports the `instructions` request parameter for `gpt-4o-mini-tts`.
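To illustrate, the four accepted spellings could be normalized by a small parser like the one below. This is a minimal sketch, not the project's actual implementation: `parseTtsTag` and `TtsInstruction` are hypothetical names, and the sketch assumes provider names consist only of letters, digits, `_`, and `-`.

```typescript
// Hypothetical result type; the renderer's real types are not shown in this document.
interface TtsInstruction {
  provider: string | null; // null: any provider that supports instructions
  text: string;
}

// Parses one story tag into a TTS instruction, or returns null for other tags.
// Accepts an optional leading "#", since some Ink runtimes strip it from tags.
function parseTtsTag(rawTag: string): TtsInstruction | null {
  const tag = rawTag.trim().replace(/^#\s*/, "");

  // Provider-specific form: tts[openai](Read softly.)
  let m = /^tts\[([a-z0-9_-]+)\]\((.+)\)$/i.exec(tag);
  if (m) return { provider: m[1].toLowerCase(), text: m[2].trim() };

  // Shorthand form: tts-openai[Read softly.]
  m = /^tts-([a-z0-9_-]+)\[(.+)\]$/i.exec(tag);
  if (m) return { provider: m[1].toLowerCase(), text: m[2].trim() };

  // Providerless forms: tts[Read softly.] or tts(Read softly.)
  m = /^tts(?:\[(.+)\]|\((.+)\))$/i.exec(tag);
  if (m) return { provider: null, text: (m[1] ?? m[2]).trim() };

  return null;
}
```

The provider-specific pattern is tried first so that `#tts[openai](...)` is not mistaken for a providerless bracket instruction.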
Keep instructions short and describe performance rather than content. OpenAI's TTS guide recommends using `gpt-4o-mini-tts` when you need controllable delivery; useful instruction targets include tone, emotional range, intonation, speaking speed, accent, impressions, and whispering. Good examples:
```ink
#tts[Speak with restrained concern and a slower pace.]
#tts[Whisper the line with controlled urgency.]
#tts-openai[Use a dry, formal tone; avoid melodrama.]
```
Avoid repeating the full dialogue in the instruction. Put the words to be spoken in the story text, and use `#tts` only to describe how the provider should read that block.
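The split between spoken text and reading direction maps onto the request sent to the provider. The sketch below shows one way a request body for OpenAI's speech endpoint could be assembled: the block text goes into `input`, and the `#tts` instruction goes into `instructions` only for a model that supports it. `buildSpeechRequest`, `SpeechRequest`, and `INSTRUCTION_MODELS` are hypothetical names, and the voice `alloy` in the usage line is an arbitrary choice for illustration.

```typescript
// Fields of the JSON body for OpenAI's speech endpoint that this sketch uses.
interface SpeechRequest {
  model: string;
  voice: string;
  input: string;         // the words to be spoken (story text)
  instructions?: string; // how to read them (from the #tts tag)
}

// Models that accept `instructions`, per the section above.
const INSTRUCTION_MODELS = new Set<string>(["gpt-4o-mini-tts"]);

function buildSpeechRequest(
  model: string,
  voice: string,
  blockText: string,
  instruction: string | null,
): SpeechRequest {
  const request: SpeechRequest = { model, voice, input: blockText };
  // Drop the instruction for models such as tts-1 and tts-1-hd.
  if (instruction && INSTRUCTION_MODELS.has(model)) {
    request.instructions = instruction;
  }
  return request;
}

const example = buildSpeechRequest(
  "gpt-4o-mini-tts",
  "alloy",
  "„Ich habe nichts gesehen“, sagt Viktor.",
  "Read softly, with controlled unease.",
);
```

Keeping the instruction out of `input` means a provider that ignores instructions still speaks exactly the story text and nothing else.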
## Choice Metadata
Choice tags are placed on the Ink choice they belong to:
@@ -56,9 +84,9 @@ Implemented choice metadata:
- `#key:x`: reserves keyboard key `X` for the choice.
- `#letter[x]`: older equivalent for reserving keyboard key `X`.
- `#action:group` or `#action[group]`: stores a category/template hint.
- `#action:group` or `#action[group]`: assigns the choice to an invisible action group.
The current UI renders all choices in one list. Explicit keys are assigned first; choices without explicit keys receive `1` through `0`, then `A` through `Z` in visible order while skipping explicit keys. `#optional` choices are displayed italic. Grouping columns, stable shuffling, `#gated[...]`, and `#sort[...]` are documented authoring conventions or future metadata, not fully implemented UI behavior yet.
The current UI renders all choices in one visible list. Choices are first grouped by `#action` in the order each new action group appears in the authored choice list. Choices inside each group are randomized. Choices without `#action` form one final unlabelled group shown after all tagged groups. Explicit keys are assigned before automatic keys; choices without explicit keys receive `1` through `0`, then `A` through `Z` in final visible order while skipping explicit keys. `#optional` choices are displayed in italics. Grouping columns, `#gated[...]`, and `#sort[...]` are documented authoring conventions or future metadata, not yet fully implemented UI behavior.
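The ordering and key assignment described above can be sketched as follows. This is an illustrative reading of the rules, not the project's code: `Choice`, `VisibleChoice`, `shuffle`, and `layoutChoices` are hypothetical names. The sketch assumes that the final untagged group is shuffled like the tagged ones, and that a choice gets no key once all 36 automatic keys are used up.

```typescript
interface Choice {
  text: string;
  action?: string; // from #action:group or #action[group]
  key?: string;    // from #key:x or #letter[x]
}

interface VisibleChoice extends Choice {
  assignedKey: string | null;
}

// Automatic keys: 1 through 0, then A through Z.
const AUTO_KEYS = "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ".split("");

// Fisher-Yates shuffle on a copy; `random` is injectable for testing.
function shuffle<T>(items: T[], random: () => number): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

function layoutChoices(
  choices: Choice[],
  random: () => number = Math.random,
): VisibleChoice[] {
  // A Map keeps insertion order, so groups stay in first-appearance order.
  const groups = new Map<string, Choice[]>();
  const untagged: Choice[] = [];
  for (const choice of choices) {
    if (choice.action === undefined) {
      untagged.push(choice);
      continue;
    }
    const group = groups.get(choice.action);
    if (group) group.push(choice);
    else groups.set(choice.action, [choice]);
  }

  // Tagged groups first, each shuffled internally; untagged choices last.
  const ordered: Choice[] = [];
  groups.forEach((group) => ordered.push(...shuffle(group, random)));
  ordered.push(...shuffle(untagged, random));

  // Explicit keys are reserved before any automatic key is handed out.
  const reserved = new Set<string>();
  for (const c of ordered) if (c.key) reserved.add(c.key.toUpperCase());
  const free = AUTO_KEYS.filter((k) => !reserved.has(k));

  let next = 0;
  return ordered.map((c) => ({
    ...c,
    assignedKey: c.key ? c.key.toUpperCase() : (free[next++] ?? null),
  }));
}
```

Passing the random source as a parameter keeps the shuffle testable; a caller that needs a stable order across re-renders could supply a seeded generator instead of `Math.random`.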
## Popup And End-State Tags