# AI Interactive Fiction Specification
This is the single architecture and behavior specification for the project. Usage and changelog live in `README.md`; actionable work items live in `TODO.md`; authoring conventions live in `MARKUP_GUIDELINES.md`.
## Product Goal
AI Interactive Fiction is a shared book-style web client plus interchangeable game engine servers. The client renders interactive fiction as animated, carefully typeset illustrated prose with optional speech, music, sound effects, images, choices, and command input. Game engines own game state and emit a shared structured protocol.
The production client must tolerate speech being unavailable. The safe TTS provider default is `none`; a game or player preference may select another provider.
## Repository Layout
- `public/`: shared browser UI, assets, fonts, client modules, third-party browser libraries.
- `src/`: TypeScript servers, shared protocol types, engine implementations, YAML world model, CLI support.
- `config/engines/`: per-engine configuration files.
- `data/ink-src/`: Ink source files.
- `data/ink/`: compiled Ink JSON output.
- `data/worlds/`: YAML world files.
- `data/z-code/`: Z-machine story files such as `zork1.bin`.
- `data/zcode-prompts/`: prompt templates used by the current LLM-mediated Z-code narrator.
- `scripts/`: project utility scripts. Currently used: `check-node-version.js` and `run-engine.js`.
- `templates/`: not present in the current repository and not used.
## Engine Selection And Commands
`DEFAULT_GAME_ENGINE` in `.env` selects the engine used by:
```text
npm run dev
npm run start
```
Supported values are `ink`, `yaml`, and `zcode`.
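As a minimal sketch of how a launcher might validate this setting, the helper below accepts an environment-like object and returns one of the three supported names. `resolveEngine` and `SUPPORTED_ENGINES` are hypothetical names, and failing fast on a missing or unknown value is an assumption; the behavior of `scripts/run-engine.js` is not reproduced here.

```ts
// Illustrative only: resolveEngine and SUPPORTED_ENGINES are hypothetical names.
const SUPPORTED_ENGINES = ['ink', 'yaml', 'zcode'] as const;
type EngineName = (typeof SUPPORTED_ENGINES)[number];

function resolveEngine(env: Record<string, string | undefined>): EngineName {
  const raw = (env['DEFAULT_GAME_ENGINE'] ?? '').trim();
  for (const name of SUPPORTED_ENGINES) {
    if (name === raw) return name;
  }
  // Assumption: fail fast rather than silently fall back to some engine.
  throw new Error(
    `DEFAULT_GAME_ENGINE must be one of ${SUPPORTED_ENGINES.join(', ')}; got "${raw}"`,
  );
}
```

In a Node process this could be called as `resolveEngine(process.env)` after `.env` has been loaded.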
Engine-specific commands bypass the default:
```text
npm run dev:ink
npm run dev:yaml
npm run dev:zcode
npm run start:ink
npm run start:yaml
npm run start:zcode
```
`dev:*` runs TypeScript through `ts-node` and `nodemon`. `start:*` runs compiled JavaScript from `dist/` and builds first through `prestart:*`. `*:debug` enables the engine's debug environment flag. `*:inspect` starts the Node inspector and currently also enables debug for that engine.
The CLI path is YAML-only and uses `src/index.ts --cli`. It is useful for testing the YAML `GameRunner` without the browser UI. The old `test-server-yaml.ts` is a legacy static/YAML harness and should be removed once no workflow depends on it.
## Shared Server Protocol
All engines communicate with the browser through Socket.IO and the same game API:
```text
newGame()
loadGame(slot)
saveGame(slot)
hasSaveGame(slot)
getSaveGames()
isGameRunning()
chooseChoice(index)
```
The Ink engine additionally supports browser-owned session recovery:
```text
resumeGame(savedInkState)
exportGameState()
```
`exportGameState()` returns the current Ink state without creating a server-side save slot. The client stores that state with story history, choices, input mode, and media state in IndexedDB. `resumeGame(savedInkState)` rehydrates a fresh server-side InkEngine after a socket reconnect or browser reload without emitting duplicate narrative. This keeps durable player-specific state client-side for hosted multi-client Ink deployments.
Line-input engines also use `playerCommand` for free text.
Every engine emits `TurnResult` objects:
```ts
interface TurnResult {
  turnId: number;
  paragraphs: Array<{ text: string; tags?: StoryTag[] }>;
  choices: ChoiceResult[];
  inputMode: 'text' | 'choice' | 'end' | 'none';
  globalTags?: StoryTag[];
  gameState?: {
    score?: number;
    endState?: { type: 'intended' | 'error'; message?: string };
  };
  suggestions?: string[];
}
```
The browser consumes structured `TurnResult` data only. YAML and Z-code servers must parse or synthesize the same tag objects that Ink exposes through native tags.
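Because the client relies on this shape, a structural check at the socket boundary can catch malformed engine output early. The following is a sketch only: `looksLikeTurnResult` is a hypothetical helper, it validates just the required fields, and it treats `StoryTag` and `ChoiceResult` as opaque because their definitions are not given in this document.

```ts
// Hypothetical helper; not part of the documented protocol.
const INPUT_MODES = ['text', 'choice', 'end', 'none'];

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function looksLikeTurnResult(value: unknown): boolean {
  if (!isPlainObject(value)) return false;
  if (typeof value['turnId'] !== 'number') return false;

  // Every paragraph must carry a text string; tags stay unchecked here.
  const paragraphs = value['paragraphs'];
  if (!Array.isArray(paragraphs)) return false;
  for (const paragraph of paragraphs) {
    if (!isPlainObject(paragraph) || typeof paragraph['text'] !== 'string') return false;
  }

  if (!Array.isArray(value['choices'])) return false;

  const mode = value['inputMode'];
  return typeof mode === 'string' && INPUT_MODES.indexOf(mode) !== -1;
}

const sampleTurn = {
  turnId: 1,
  paragraphs: [{ text: 'You stand before a locked door.' }],
  choices: [],
  inputMode: 'text',
};
console.log(looksLikeTurnResult(sampleTurn)); // true
console.log(looksLikeTurnResult({ turnId: 1, paragraphs: [], choices: [], inputMode: 'menu' })); // false
```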
## Game Engines
### YAML Engine
- Config: `config/engines/yaml.json`
- Server: `src/server-yaml.ts`
- World model: `data/worlds/*.yml`
- CLI entry: `src/index.ts --cli`
The YAML engine is no longer the architectural default; it is one engine alongside Ink and Z-code. It uses `GameRunner`, `GameEngine`, and `YamlWorldParser`, emits `inputMode: 'text'`, and remains the best test bed for combining a deterministic world model with LLM command interpretation.
### Ink Engine
- Config: `config/engines/ink.json`
- Server: `src/server-ink.ts`
- Engine: `src/engine/ink-engine.ts`
- Source: `data/ink-src/eibenreith.ink` plus included chapter files.
- Compiled output: `data/ink/eibenreith.ink.json`
The Ink server compiles source at startup using `inkjs/full`, then runs the compiled story with `inkjs`. Ink choices become `ChoiceResult` objects. Ink tags become shared `StoryTag` objects. Choice preview tags support `#key`, `#letter`, `#optional`, `#action`, `#gated`, and `#sort`.
The server keeps only ephemeral per-socket InkEngine instances. Browser IndexedDB owns durable Ink saves and the current autosave. If the socket reconnects or the page reloads, the browser sends the autosaved Ink state to `resumeGame()` and restores rendered history locally.
Ink does not provide arbitrary string input as a native async primitive comparable to choices. Future text-input turns should be implemented through a tag such as `#input[name](prompt)`: the server returns `inputMode: 'text'`, the UI shows command input for one round, then the server stores the submitted string into an Ink variable and continues.
### Z-code Engine
- Config: `config/engines/zcode.json`
- Server: `src/server-zcode.ts`
- Engine: `src/engine/zcode-llm-engine.ts`
- Story file: `data/z-code/zork1.bin` by default.
- Prompt templates: `data/zcode-prompts/*.yml`
The engine name is Z-code. Zork I is only the current game file and prompt target. The current implementation runs a Z-machine story through `ifvms`, keeps Z-machine state authoritative, and uses an LLM to translate natural-language input into parser commands and rewrite raw Z-machine output into prose.
Future work should separate Z-code-generic logic from Zork-specific prompt content more clearly.
## Client Module System
The browser client uses native ES modules with no bundler. The loader imports modules, analyzes dependency declarations, initializes modules in dependency order, tracks state and progress, and hides the loading overlay only when initialization and the progress exit animations are complete.
Rules:
- Every app module extends `BaseModule`.
- Every app module registers with `moduleRegistry`.
- Required dependencies must be listed in `dependencies`.
- Modules should use authoritative dependencies instead of local fallbacks.
- Do not add fallback paths to hide bad dependency declarations or ordering bugs.
- `setTimeout` must not paper over initialization races. It is acceptable for animation, debounce, throttle, and browser rendering timing when locally justified.
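Initialization in dependency order, with no fallback for bad declarations, can be sketched as a depth-first topological sort that throws on an unregistered dependency or a cycle. This is an illustration under assumptions: `ModuleDeclaration` and `initializationOrder` are hypothetical names, and the real `BaseModule`, `moduleRegistry`, and loader are not reproduced here.

```ts
// Hypothetical shapes: the real BaseModule and loader are not reproduced here.
interface ModuleDeclaration {
  id: string;
  dependencies: string[];
}

function initializationOrder(modules: ModuleDeclaration[]): string[] {
  const byId = new Map<string, ModuleDeclaration>();
  for (const decl of modules) byId.set(decl.id, decl);

  const state = new Map<string, 'visiting' | 'done'>();
  const order: string[] = [];

  function visit(id: string, path: string[]): void {
    if (state.get(id) === 'done') return;
    const trail = path.concat(id);
    if (state.get(id) === 'visiting') {
      throw new Error(`Dependency cycle: ${trail.join(' -> ')}`);
    }
    const decl = byId.get(id);
    if (decl === undefined) {
      // No fallback: an unregistered dependency is a hard error.
      throw new Error(`Unregistered module in chain: ${trail.join(' -> ')}`);
    }
    state.set(id, 'visiting');
    for (const dep of decl.dependencies) visit(dep, trail);
    state.set(id, 'done');
    order.push(id); // Dependencies are pushed before their dependents.
  }

  for (const decl of modules) visit(decl.id, []);
  return order;
}

// The dependency lists below are invented for the example.
const exampleOrder = initializationOrder([
  { id: 'ui-controller', dependencies: ['text-processor', 'socket-client'] },
  { id: 'text-processor', dependencies: [] },
  { id: 'socket-client', dependencies: [] },
]);
console.log(exampleOrder.join(', ')); // text-processor, socket-client, ui-controller
```

Throwing with the full dependency trail keeps ordering bugs visible instead of hiding them behind a timer or a local fallback.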
Core modules:
- `loader.js`: module script loading, progress UI, dependency diagnostics.
- `module-registry.js`: registration and readiness promises.
- `base-module.js`: lifecycle, progress, state, event cleanup.
Primary client responsibilities:
- Text and typography: `text-processor`, `paragraph-layout`, `layout-renderer`.
- Markup: `markup-parser`.
- Queue/playback: `text-buffer`, `sentence-queue`, `playback-coordinator`, `animation-queue`.
- Audio/TTS: `audio-manager`, `tts-factory`, provider modules.
- UI: `ui-controller`, `ui-display-handler`, `ui-input-handler`, `choice-display`, `options-ui`, `ui-effects`.
- Persistence/history: `persistence-manager`, `story-history`.
- Networking: `socket-client`.
Known cleanup candidates: `debug-utils-module.js` is not loaded; `game-loop-module.js` still contains high-level glue from older architecture and should be audited before removal.
## Text Pipeline
Processing order:
1. Receive structured blocks and tags from a game engine.
2. Parse inline story markup and remove media markers from display/TTS text.
3. Apply Markdown emphasis.
4. Apply locale-aware SmartyPants typography.
5. Apply Hyphenopoly for the game metadata language.
6. Measure text using the exact page font settings.
7. Run Knuth-Plass line breaking.
8. Render absolutely positioned words into the page line-coordinate model.
9. Animate words in sync with measured TTS duration or estimated duration.
The external Knuth-Plass library should not be locally modified. Adaptation belongs in our modules.
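For step 9, one simple way to spread word animations over a known duration is to weight each word by its character count. This is an assumed heuristic for illustration only, not the project's actual timing model, and `scheduleWordStarts` is a hypothetical name.

```ts
// Assumed heuristic: each word's share of the duration is proportional to its length.
function scheduleWordStarts(words: string[], totalMs: number): number[] {
  const weights = words.map((word) => Math.max(1, word.length));
  const totalWeight = weights.reduce((sum, weight) => sum + weight, 0);

  const starts: number[] = [];
  let elapsedWeight = 0;
  for (const weight of weights) {
    // A word starts once the weight of all preceding words has elapsed.
    starts.push((elapsedWeight / totalWeight) * totalMs);
    elapsedWeight += weight;
  }
  return starts;
}

console.log(scheduleWordStarts(['ab', 'cd'], 1000)); // [ 0, 500 ]
```

The same function works whether `totalMs` comes from a measured TTS clip or from an estimate.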
## Right Page Layout And History
The right page is a virtual line-addressed content pane:
- `#page_right` does not use native scrolling.
- Page height is divided into `PAGE_LINE_COUNT = 25`.
- All block heights, margins, image spacing, and chapter/section spacing are exact line multiples.
- Stored block positions are line coordinates, not pixels.
- Window resize recalculates pixels from line coordinates.
- New content appends at the live bottom.
- Manual scrolling moves the active line and keeps a window of nearby blocks loaded.
- The custom scrollbar represents virtual line history, not DOM scroll state.
Portrait images may overlap line ranges with text next to them, but edges must still land on line boundaries.
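The line-coordinate model can be illustrated with a small conversion function. This is a sketch under assumptions: `BlockPosition`, `blockToPixels`, and the `activeLine` parameter (taken here to be the line shown at the top of the page) are hypothetical names, not the project's actual API.

```ts
const PAGE_LINE_COUNT = 25;

// Hypothetical stored position: whole lines only, never pixels.
interface BlockPosition {
  startLine: number; // absolute line index in the virtual history
  lineSpan: number;  // block height in lines
}

function blockToPixels(
  block: BlockPosition,
  activeLine: number,   // assumed: line currently at the top of the page
  pageHeightPx: number,
): { top: number; height: number } {
  const lineHeightPx = pageHeightPx / PAGE_LINE_COUNT;
  return {
    top: (block.startLine - activeLine) * lineHeightPx,
    height: block.lineSpan * lineHeightPx,
  };
}

const storedBlock: BlockPosition = { startLine: 30, lineSpan: 3 };
console.log(blockToPixels(storedBlock, 25, 1000)); // { top: 200, height: 120 }
console.log(blockToPixels(storedBlock, 25, 500));  // { top: 100, height: 60 }
```

Because only line coordinates are stored, a window resize needs nothing more than a second call with the new page height.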
## Markup And Tags
Canonical tag syntax:
```text
#key
#key[value]
#key[value](options)
#key:value
```
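A minimal parser for these four forms might look like the sketch below. `ParsedTag` and `parseTag` are hypothetical names, the `StoryTag` shape used by the servers is not specified here, and the options string is returned unparsed. The file name in the example is invented.

```ts
// Hypothetical result shape; the project's StoryTag type may differ.
interface ParsedTag {
  key: string;
  value?: string;
  options?: string;
}

// #key | #key[value] | #key[value](options) | #key:value
const TAG_PATTERN = /^#([A-Za-z][\w-]*)(?:\[([^\]]*)\](?:\(([^)]*)\))?|:(.+))?$/;

function parseTag(raw: string): ParsedTag | null {
  const match = TAG_PATTERN.exec(raw.trim());
  if (match === null) return null;

  const key = match[1] as string;
  const bracketValue = match[2] as string | undefined;
  const options = match[3] as string | undefined;
  const colonValue = match[4] as string | undefined;

  const tag: ParsedTag = { key };
  const value = bracketValue !== undefined ? bracketValue : colonValue;
  if (value !== undefined) tag.value = value;
  if (options !== undefined) tag.options = options;
  return tag;
}

console.log(parseTag('#image[castle.jpg](portrait pause=2)'));
console.log(parseTag('#key:x'));
console.log(parseTag('#optional'));
console.log(parseTag('not a tag')); // null
```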
Supported story tags include:
- `#chapter[Title]`
- `#section` / `#textblock`
- `#image[file](landscape|portrait|square pause=2)`
- `#sfx[file](max=8 fade fade-duration=2)`
- `#music[file](crossfade loop lead=4)`
- `#gloss[term](definition)`
- `#tts[instruction]`
- `#tts(instruction)`
- `#tts[provider](instruction)` / `#tts-openai[instruction]`
- `#score[...]`
- `#error[...]`
- `#achievement[...]`
- `#alert[...]`
Choice tags:
- `#key:x` or `#key[x]`
- `#letter[x]`
- `#optional`
- `#action[name]`
The active choice UI is one list. Explicit keys are reserved first; the remaining choices then receive the digits `1` through `9` followed by `0`, and after that the letters `A` through `Z`.
Before key assignment, choices are ordered by invisible `#action` groups. The first appearance of each action group in the authored list determines group order. Choices inside each group are randomized for presentation. Choices without an action group form one final group shown last. Group labels are not displayed.
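The ordering and key rules above can be sketched as two small functions. All names here are hypothetical. The sketch assumes that explicit keys are matched case-insensitively, that the final ungrouped set is shuffled like any other group, and that duplicate explicit keys are prevented elsewhere. The random source is injectable so the result can be made deterministic in tests.

```ts
// Hypothetical shapes; the project's ChoiceResult type may differ.
interface AuthoredChoice { text: string; key?: string; action?: string }
interface KeyedChoice extends AuthoredChoice { assignedKey: string }

const AUTO_KEYS = '1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ'.split('');

// Fisher-Yates shuffle on a copy, using the supplied random source.
function shuffled<T>(items: T[], random: () => number): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

function orderChoices(choices: AuthoredChoice[], random: () => number = Math.random): AuthoredChoice[] {
  const groupOrder: string[] = []; // action names in order of first appearance
  const groups = new Map<string, AuthoredChoice[]>();
  const ungrouped: AuthoredChoice[] = [];
  for (const choice of choices) {
    if (choice.action === undefined) { ungrouped.push(choice); continue; }
    let group = groups.get(choice.action);
    if (group === undefined) {
      group = [];
      groups.set(choice.action, group);
      groupOrder.push(choice.action);
    }
    group.push(choice);
  }
  const ordered: AuthoredChoice[] = [];
  for (const name of groupOrder) ordered.push(...shuffled(groups.get(name) ?? [], random));
  ordered.push(...shuffled(ungrouped, random)); // ungrouped choices come last
  return ordered;
}

function assignKeys(ordered: AuthoredChoice[]): KeyedChoice[] {
  const reserved = new Set<string>();
  for (const choice of ordered) {
    if (choice.key !== undefined) reserved.add(choice.key.toUpperCase());
  }
  const free = AUTO_KEYS.filter((key) => !reserved.has(key));
  let next = 0;
  return ordered.map((choice) => {
    if (choice.key !== undefined) return { ...choice, assignedKey: choice.key };
    const key = free[next++];
    if (key === undefined) throw new Error('Ran out of automatic choice keys');
    return { ...choice, assignedKey: key };
  });
}
```

A caller would typically run `assignKeys(orderChoices(choices))` once per turn, so the shuffled order and the visible keys stay stable while the choice list is on screen.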
TTS instruction tags are paragraph/block metadata. They are ignored by renderers and by providers that do not support per-request reading instructions. Providerless `#tts[...]` and `#tts(...)` are the default authoring forms; provider-specific forms are optional filters for provider overrides. OpenAI consumes matching instructions only for `gpt-4o-mini-tts`, where they are sent as the Speech API `instructions` field. Instructions should describe delivery, such as tone, emotion, intonation, pace, accent, whispering, humming, or singing style.
Markdown emphasis:
```text
*italic* or _italic_
**bold** or __bold__
***bold italic*** or ___bold italic___
```
## Audio, TTS, And Media
TTS providers currently include `none`, Browser Speech, Kokoro, ElevenLabs, OpenAI, and local OpenAI-compatible servers. Provider modules exist, but Browser Speech and Kokoro need focused validation before being considered production-ready.
TTS cache keys include provider, voice, provider speed value, language, and exact normalized TTS string. Fast-forward must accelerate visible animation and fade/stop active TTS without cancelling background generations unless the foreground block has been waiting long enough.
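A cache key built from those five fields could be derived as in this sketch. `TtsRequest` and `ttsCacheKey` are hypothetical names, the text is assumed to be normalized before it reaches this function, and a real implementation might hash the result to keep keys short.

```ts
// Hypothetical request shape covering the five documented key components.
interface TtsRequest {
  provider: string;
  voice: string;
  speed: number;    // provider-specific speed value
  language: string;
  text: string;     // assumed to be the already-normalized TTS string
}

function ttsCacheKey(request: TtsRequest): string {
  // A JSON array keeps field boundaries unambiguous, unlike plain concatenation.
  return JSON.stringify([
    request.provider,
    request.voice,
    request.speed,
    request.language,
    request.text,
  ]);
}

const exampleKey = ttsCacheKey({
  provider: 'openai',
  voice: 'alloy',
  speed: 1,
  language: 'en',
  text: 'Hello.',
});
console.log(exampleKey); // ["openai","alloy",1,"en","Hello."]
```

Changing any single field, including the speed value, yields a different key, so a cached clip is never reused for a request it does not match.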
Music and sound effects are preloaded when requested. Music can queue, crossfade, cut, loop, play once, and lead into following text. Music ducks by a persisted percentage during TTS playback.
## Documentation Source Of Truth
- `README.md`: usage, commands, changelog, concise feature summary.
- `SPECIFICATION.md`: architecture and behavior.
- `TODO.md`: active status and backlog.
- `MARKUP_GUIDELINES.md`: writing/authoring rules for story files.
- `THIRD_PARTY_NOTICES.md` and `public/THIRD_PARTY_NOTICES.md`: license/credits material.