# AI Interactive Fiction Specification
This is the single architecture and behavior specification for the project. Usage and changelog live in `README.md`; actionable work items live in `TODO.md`; authoring conventions live in `MARKUP_GUIDELINES.md`.
## Product Goal
AI Interactive Fiction is a shared book-style web client plus interchangeable game engine servers. The client renders interactive fiction as animated, carefully typeset illustrated prose with optional speech, music, sound effects, images, choices, and command input. Game engines own game state and emit a shared structured protocol.
The production client must tolerate speech being unavailable. The safe TTS provider default is `none`; a game or player preference may select another provider.
## Repository Layout
- `public/`: shared browser UI, assets, fonts, client modules, third-party browser libraries.
- `src/`: TypeScript servers, shared protocol types, engine implementations, YAML world model, CLI support.
- `config/engines/`: per-engine configuration files.
- `data/ink-src/`: Ink source files.
- `data/ink/`: compiled Ink JSON output.
- `data/worlds/`: YAML world files.
- `data/z-code/`: Z-machine story files such as `zork1.bin`.
- `data/zcode-prompts/`: prompt templates used by the current LLM-mediated Z-code narrator.
- `scripts/`: project utility scripts. Currently used: `check-node-version.js` and `run-engine.js`.
- `templates/`: not present in the current repository and not used.
## Text Encoding
Ink source files and game UI localization files must be saved as UTF-8 and must contain the real written characters. German text uses full umlauts and special characters directly, for example `ä`, `ö`, `ü`, `Ä`, `Ö`, `Ü`, `ß`, and German quotation marks `„…“`. Do not transliterate German into `ae`, `oe`, `ue`, or `ss` as an encoding workaround.
## Agent Editing Safety
These rules are mandatory for AI/Codex work on authored text:
- Do not alter authored prose, Ink text, generated character text, documentation prose, or localization text through regex, bulk replacement, or scripted text rewrites.
- Make text edits only through `apply_patch`.
- For PowerShell commands that read or display text, set `$OutputEncoding` and `[Console]::OutputEncoding` to UTF-8 first.
- Before large text work, create a git commit as a checkpoint.
- Edit entry by entry, inspect each edited entry after changing it, and proceed sequentially. No generated shortcuts.
## Ink Authoring State
Use Ink's built-in visit state for simple facts such as "this knot has been shown". Do not create parallel boolean flags for knot visits.
Use a separate `LIST` with the `state_*` helpers whenever a tracker expresses a linear process, even if it has only two states such as "begun" and "completed". A later state in such a list is a high-watermark and implies the earlier states. Prefer several small parallel progress lists over one overpacked encounter state when that is cleaner for authoring, knowledge modelling, or NPC reasoning. This matches the Inkle-style knowledge-base pattern: independent lines of knowledge and progress advance separately, then content queries the combination.
Use `state_reach(first_state)` to begin a progress chain. Use `state_reach_if_started(later_state)` when a normal action can advance or complete a chain only if that chain is already active. This prevents generic actions such as washing hands or inspecting an object from retroactively starting a task they merely could have fulfilled.
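The high-watermark behavior described above can be illustrated outside Ink. The following is a minimal TypeScript model, not the project's implementation: the class `ProgressChain` and its method names are hypothetical stand-ins for a progress `LIST` and the `state_reach` / `state_reach_if_started` helpers.

```typescript
// Hypothetical TypeScript model of one Ink progress LIST.
// The real state_* helpers are Ink functions; only their described behavior is mirrored here.
class ProgressChain<S extends string> {
  private readonly order: readonly S[];
  private reached = -1; // index of the highest state reached; -1 means not started

  constructor(order: readonly S[]) {
    this.order = order;
  }

  // Models state_reach(state): raise the high-watermark, never lower it.
  reach(state: S): void {
    this.reached = Math.max(this.reached, this.position(state));
  }

  // Models state_reach_if_started(state): advance only if the chain is already active.
  reachIfStarted(state: S): boolean {
    if (this.reached < 0) return false;
    this.reach(state);
    return true;
  }

  // A later state implies every earlier state in the same list.
  hasReached(state: S): boolean {
    return this.reached >= this.position(state);
  }

  private position(state: S): number {
    const index = this.order.indexOf(state);
    if (index < 0) throw new Error(`unknown state: ${state}`);
    return index;
  }
}

const handWashing = new ProgressChain(["begun", "completed"] as const);
handWashing.reachIfStarted("completed"); // ignored: the chain was never begun
handWashing.reach("begun");
handWashing.reachIfStarted("completed"); // now advances the chain
console.log(handWashing.hasReached("begun")); // true
```

The first `reachIfStarted` call shows why a generic action cannot retroactively start a task: it returns `false` and leaves the chain untouched.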
Use `mark`, `has`, and `lacks` only for a coherent group of independent facts that can be true separately and do not imply one another.
Eibenreith authored content uses a mandatory bucket architecture. Rooms are installed through `enter_room(location, entry, look, exits, bucket)`. The active choice surface collects choices in this order: moment, room entry/look, exits, episode, game. Chosen atomic content ends with `-> TURN`; bucket/provider knots end with `-> DONE`. Authored chapter files must not call the internal `provide_choices` implementation directly.
The canonical format for conditioned atomic bucket weaves is a multi-line choice header: choice marker, one precondition per following header line, then the choice text on its own header line. Example: `* {condition_one}` / ` {condition_two}` / ` [__Verb Charakter__: "..."]`. Naked condition lines before a choice do not gate that choice and are forbidden for bucket atoms. Visibility conditions belong in the choice header, not in the branch body.
`helpers.ink` owns global helper variables, helper functions, `TURN`, and active choice-surface dispatch. `buckets.ink` owns the game-wide bucket. Even when empty, `game_bucket` remains a real content bucket and must stay available for cross-episode game material.
Companion-aware dialogue and reunion reactions must use the central Ink contact manager instead of room-specific flags. `present(character)` checks whether an NPC is in the current room. `first_meeting(character)`, `reunion(character)`, and `parting(character)` are one-choice-surface transition helpers created by traversal and cleared automatically by the next turn. Authored content may query these helpers but must never clear or consume contact transitions manually. `alone()` is true when no tracked NPC is present. `alone_with(character)` is true when exactly that tracked NPC is present, and is intended for private dialogue options.
`loc_move_to(location)` updates Valerie's current location, records traversal origin and destination, moves active companions, and refreshes contact state. `companion_join(character)` and `companion_leave(character)` only control whether a character follows Valerie through traversal; they are not story-memory flags. Episode setup is responsible for initial character locations and companion state. After manual multi-character setup, call `contact_sync()` to establish current contact without firing first-meeting or reunion transitions.
Episodes may install an optional companion transition bucket through `enter_episode(value, slot, start_bucket, end_bucket, episode_bucket, companion_transition_bucket)`. The room engine plays that bucket centrally on every `enter_room(...)` after movement. Companion traversal prose belongs there and should use `accompanied_by(character)` together with `traversal_from(...)`, `traversal_to(...)`, or `traversal_between(origin, destination)`. Do not duplicate companion-following narration inside individual exits unless the exit has genuinely unique story action.
Room look commands are lifecycle-managed by the Ink room engine. The look bucket passed to `enter_room(...)` is offered only after Valerie re-enters a room she has previously left. Once the current look bucket has been selected during this room visit, shared Ink visit tracking hides it until the next leave-and-reenter cycle. Authored room look content must keep the `#key:l` convention and must not add custom seen/look flags.
Player-choice impact uses three distinct mechanisms. Cascades use semantic state chains when a choice changes the route, episode outcome, or later structure. Callbacks use named facts for exact remembered choices. Heuristics use route counters and relationship-matrix queries to color tone or summarize repeated patterns. Do not use a route heuristic when the later text needs to remember one specific earlier line.
When multiple choices from one prioritized family can appear on the same choice surface, use `claim_choice_gate_if(gate, available)` to allow only the first valid item in source order. This is mainly for `#auto` families such as Viktor return comments, which should also include a contact transition such as `reunion(viktor)` in their availability expression. The helper is transient and resets at the start of every `provide_choices`; it must not be used as story memory.
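As an illustration of the gate behavior only, here is a small TypeScript model. The class `ChoiceGates` and the gate name `viktor_return` are hypothetical; the real helper is an Ink function.

```typescript
// Hypothetical model of claim_choice_gate_if(gate, available).
class ChoiceGates {
  private claimed = new Set<string>();

  // Called at the start of every choice-surface build (models the provide_choices reset).
  reset(): void {
    this.claimed.clear();
  }

  // True only for the first available item of a gate family, in call (source) order.
  claimIf(gate: string, available: boolean): boolean {
    if (!available || this.claimed.has(gate)) return false;
    this.claimed.add(gate);
    return true;
  }
}

const gates = new ChoiceGates();
gates.reset();
console.log(gates.claimIf("viktor_return", false)); // false: unavailable items never claim
console.log(gates.claimIf("viktor_return", true));  // true: first valid item wins
console.log(gates.claimIf("viktor_return", true));  // false: the family is already claimed
```

Because `reset()` runs for every choice surface, nothing in this structure survives a turn, which matches the rule that the gate must not serve as story memory.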
Delayed events that should advance while arbitrary bucket content is chosen use named turn timers. Timer IDs are LIST values. `timer_start(timer_id, turns)` restarts that named timer after removing it from all timer buckets. Expired timers remain ready until `timer_due(...)` or `timer_due_if(...)` claims them; claimed timers remain visible for the current turn and are cleared centrally by the next `TURN`. Content must not implement private countdown variables for reusable timed behavior.
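The timer lifecycle (running, ready, claimed, cleared) can be sketched as a small state machine. This TypeScript model is illustrative only: the class `TurnTimers` is hypothetical, string IDs stand in for LIST values, and it assumes a timer started with `n` turns becomes ready on the `n`-th following `TURN`.

```typescript
// Hypothetical model of the named turn timers; the real helpers are Ink functions.
class TurnTimers {
  private remaining = new Map<string, number>(); // running timers: turns left
  private ready = new Set<string>();             // expired, waiting to be claimed
  private claimed = new Set<string>();           // claimed during the current turn

  // Models timer_start: remove the timer from every bucket, then restart it.
  start(id: string, turns: number): void {
    this.ready.delete(id);
    this.claimed.delete(id);
    this.remaining.set(id, turns);
  }

  // Models the central TURN step: clear claimed timers, then tick running ones.
  turn(): void {
    this.claimed.clear();
    this.remaining.forEach((left, id) => {
      if (left <= 1) {
        this.remaining.delete(id);
        this.ready.add(id);
      } else {
        this.remaining.set(id, left - 1);
      }
    });
  }

  // Models timer_due: claim a ready timer; it stays visible until the next TURN.
  due(id: string): boolean {
    if (this.claimed.has(id)) return true;
    if (!this.ready.has(id)) return false;
    this.ready.delete(id);
    this.claimed.add(id);
    return true;
  }
}

const timers = new TurnTimers();
timers.start("church_bell", 2); // hypothetical timer ID
timers.turn();
console.log(timers.due("church_bell")); // false: one turn left
timers.turn();
console.log(timers.due("church_bell")); // true: expired and now claimed
```

Keeping expiry (`ready`) separate from consumption (`claimed`) is what lets an expired timer wait across several turns until some content actually asks for it.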
## Choice Text Perspective
Choice text must describe the player character's intention before the action is taken. Do not write choices from a post-hoc author perspective that reveals what the branch will discover. For example, use "try the door" before the destination is known, not "go to the second-class cars"; use automatic or hidden events for things the player character cannot control, such as the train entering a tunnel.
## Engine Selection And Commands
`DEFAULT_GAME_ENGINE` in `.env` selects the engine used by:
```text
npm run dev
npm run start
```
Supported values are `ink`, `yaml`, and `zcode`.
Engine-specific commands bypass the default:
```text
npm run dev:ink
npm run dev:yaml
npm run dev:zcode
npm run start:ink
npm run start:yaml
npm run start:zcode
```
`dev:*` runs TypeScript through `ts-node` and `nodemon`. `start:*` runs compiled JavaScript from `dist/` and builds first through `prestart:*`. `*:debug` enables the engine's debug environment flag. `*:inspect` starts Node inspector and currently also enables debug for that engine.
The CLI path is YAML-only and uses `src/index.ts --cli`. It is useful for testing the YAML `GameRunner` without the browser UI. The old `test-server-yaml.ts` is a legacy static/YAML harness and should be removed once no workflow depends on it.
## Shared Server Protocol
All engines communicate with the browser through Socket.IO and the same game API:
```text
newGame()
loadGame(slot)
saveGame(slot)
hasSaveGame(slot)
getSaveGames()
isGameRunning()
chooseChoice(index)
```
The Ink engine additionally supports browser-owned session recovery:
```text
resumeGame(savedInkState)
exportGameState()
```
`exportGameState()` returns the current Ink state without creating a server-side save slot. The client stores that state with story history, choices, input mode, and media state in IndexedDB. `resumeGame(savedInkState)` rehydrates a fresh server-side InkEngine after a socket reconnect or browser reload without emitting duplicate narrative. This keeps durable player-specific state client-side for hosted multi-client Ink deployments.
Line-input engines also use `playerCommand` for free text.
Every engine emits `TurnResult` objects:
```ts
interface TurnResult {
turnId: number;
paragraphs: Array<{ text: string; tags?: StoryTag[] }>;
choices: ChoiceResult[];
inputMode: 'text' | 'choice' | 'end' | 'none';
globalTags?: StoryTag[];
gameState?: {
score?: number;
endState?: { type: 'intended' | 'error'; message?: string };
};
suggestions?: string[];
}
```
The browser consumes structured `TurnResult` data only. YAML and Z-code servers must parse or synthesize the same tag objects that Ink exposes through native tags.
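For orientation, here is a sketch of how a consumer might branch on `inputMode`. The shapes of `StoryTag` and `ChoiceResult` are not given in this document, so the minimal versions below are assumptions, as are the function `describeNextInput` and the reading of `'none'` as "no input expected yet".

```typescript
// Assumed minimal shapes; the real definitions live in the shared protocol types.
type StoryTag = { key: string; value?: string; options?: string };
type ChoiceResult = { index: number; text: string; tags?: StoryTag[] };

interface TurnResult {
  turnId: number;
  paragraphs: Array<{ text: string; tags?: StoryTag[] }>;
  choices: ChoiceResult[];
  inputMode: 'text' | 'choice' | 'end' | 'none';
  globalTags?: StoryTag[];
  gameState?: {
    score?: number;
    endState?: { type: 'intended' | 'error'; message?: string };
  };
  suggestions?: string[];
}

// Hypothetical helper: summarize what the UI should do after rendering a turn.
function describeNextInput(result: TurnResult): string {
  switch (result.inputMode) {
    case 'choice':
      return `choices offered: ${result.choices.length}`;
    case 'text':
      return 'show command input';
    case 'end': {
      const end = result.gameState?.endState;
      return end?.type === 'error'
        ? `stopped with error: ${end.message ?? 'unknown'}`
        : 'story finished';
    }
    case 'none':
      return 'no input expected'; // assumption about the meaning of 'none'
  }
}

const sampleTurn: TurnResult = {
  turnId: 1,
  paragraphs: [{ text: 'The train slows.', tags: [{ key: 'section' }] }],
  choices: [{ index: 0, text: 'Try the door' }],
  inputMode: 'choice',
};
console.log(describeNextInput(sampleTurn)); // "choices offered: 1"
```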
## Game Engines
### YAML Engine
- Config: `config/engines/yaml.json`
- Server: `src/server-yaml.ts`
- World model: `data/worlds/*.yml`
- CLI entry: `src/index.ts --cli`
The YAML engine is no longer the architectural default; it is one engine beside Ink and Z-code. It uses `GameRunner`, `GameEngine`, and `YamlWorldParser`, emits `inputMode: 'text'`, and remains the best test bed for deterministic world-model plus LLM command interpretation.
### Ink Engine
- Config: `config/engines/ink.json`
- Server: `src/server-ink.ts`
- Engine: `src/engine/ink-engine.ts`
- Source: `data/ink-src/eibenreith/main.ink` plus included chapter files.
- Compiled output: `data/ink/eibenreith.ink.json`
The Ink server compiles source at startup using `inkjs/full`, then runs the compiled story with `inkjs`. Ink choices become `ChoiceResult` objects. Ink tags become shared `StoryTag` objects. Choice preview tags support `#key`, `#letter`, `#optional`, `#action`, `#gated`, `#sort`, and `#auto`.
The server keeps only ephemeral per-socket InkEngine instances. Browser IndexedDB owns durable Ink saves and the current autosave. If the socket reconnects or the page reloads, the browser sends the autosaved Ink state to `resumeGame()` and restores rendered history locally.
Ink does not provide arbitrary string input as a native async primitive comparable to choices. Future text-input turns should be implemented through a tag such as `#input[name](prompt)`: the server returns `inputMode: 'text'`, the UI shows command input for one round, then the server stores the submitted string into an Ink variable and continues.
### Z-code Engine
- Config: `config/engines/zcode.json`
- Server: `src/server-zcode.ts`
- Engine: `src/engine/zcode-llm-engine.ts`
- Story file: `data/z-code/zork1.bin` by default.
- Prompt templates: `data/zcode-prompts/*.yml`
The engine name is Z-code. Zork I is only the current game file and prompt target. The current implementation runs a Z-machine story through `ifvms`, keeps Z-machine state authoritative, and uses an LLM to translate natural-language input into parser commands and rewrite raw Z-machine output into prose.
Future work should separate Z-code-generic logic from Zork-specific prompt content more clearly.
## Client Module System
The browser client uses native ES modules with no bundler. The loader imports modules, analyzes dependency declarations, initializes modules in dependency order, tracks state/progress, and hides the loading overlay only when initialization and progress exit animations are complete.
Rules:
- Every app module extends `BaseModule`.
- Every app module registers with `moduleRegistry`.
- Required dependencies must be listed in `dependencies`.
- Modules should use authoritative dependencies instead of local fallbacks.
- Do not add fallback paths to hide bad dependency declarations or ordering bugs.
- `setTimeout` must not paper over initialization races. It is acceptable for animation, debounce, throttle, and browser rendering timing when locally justified.
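Initializing modules in dependency order amounts to a topological sort over the declared `dependencies`. The sketch below is one way to do that, not the loader's actual code: `ModuleSpec` and `initOrder` are hypothetical names, and the dependency edges in the example are invented for illustration. In keeping with the rules above, it fails loudly on missing or cyclic declarations instead of falling back.

```typescript
// Hypothetical description of a registered module.
interface ModuleSpec {
  id: string;
  dependencies: string[];
}

// Depth-first topological sort; throws on undeclared modules and on cycles.
function initOrder(modules: ModuleSpec[]): string[] {
  const byId = new Map<string, ModuleSpec>();
  modules.forEach((m) => byId.set(m.id, m));

  const state = new Map<string, 'visiting' | 'done'>();
  const order: string[] = [];

  const visit = (id: string, chain: string[]): void => {
    if (state.get(id) === 'done') return;
    if (state.get(id) === 'visiting') {
      throw new Error(`dependency cycle: ${chain.concat(id).join(' -> ')}`);
    }
    const spec = byId.get(id);
    if (!spec) throw new Error(`undeclared module: ${id}`);
    state.set(id, 'visiting');
    spec.dependencies.forEach((dep) => visit(dep, chain.concat(id)));
    state.set(id, 'done');
    order.push(id); // every dependency has already been pushed
  };

  modules.forEach((m) => visit(m.id, []));
  return order;
}

// Example edges are assumptions, not the project's real dependency graph.
console.log(
  initOrder([
    { id: 'ui-controller', dependencies: ['socket-client', 'text-processor'] },
    { id: 'socket-client', dependencies: [] },
    { id: 'text-processor', dependencies: [] },
  ]).join(', ')
); // socket-client, text-processor, ui-controller
```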
Core modules:
- `loader.js`: module script loading, progress UI, dependency diagnostics.
- `module-registry.js`: registration and readiness promises.
- `base-module.js`: lifecycle, progress, state, event cleanup.
Primary client responsibilities:
- Text and typography: `text-processor`, `paragraph-layout`, `layout-renderer`.
- Markup: `markup-parser`.
- Queue/playback: `text-buffer`, `sentence-queue`, `playback-coordinator`, `animation-queue`.
- Audio/TTS: `audio-manager`, `tts-factory`, provider modules.
- UI: `ui-controller`, `ui-display-handler`, `ui-input-handler`, `choice-display`, `options-ui`, `ui-effects`.
- Persistence/history: `persistence-manager`, `story-history`.
- Networking: `socket-client`.
Known cleanup candidates: `debug-utils-module.js` is not loaded; `game-loop-module.js` still contains high-level glue from older architecture and should be audited before removal.
## Text Pipeline
Processing order:
1. Receive structured blocks and tags from a game engine.
2. Parse inline story markup and remove media markers from display/TTS text.
3. Apply Markdown emphasis.
4. Apply locale-aware SmartyPants typography.
5. Apply Hyphenopoly for the game metadata language.
6. Measure text using the exact page font settings.
7. Run Knuth-Plass line breaking.
8. Render absolutely positioned words into the page line-coordinate model.
9. Animate words in sync with measured TTS duration or estimated duration.
The external Knuth-Plass library should not be locally modified. Adaptation belongs in our modules.
## Right Page Layout And History
The right page is a virtual line-addressed content pane:
- `#page_right` does not use native scrolling.
- Page height is divided into `PAGE_LINE_COUNT = 25`.
- All block heights, margins, image spacing, and chapter/section spacing are exact line multiples.
- Stored block positions are line coordinates, not pixels.
- Window resize recalculates pixels from line coordinates.
- New content appends at the live bottom.
- Manual scrolling moves the active line and keeps a window of nearby blocks loaded.
- The custom scrollbar represents virtual line history, not DOM scroll state.
Portrait images may overlap line ranges with text next to them, but edges must still land on line boundaries.
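The line-coordinate model can be made concrete with a small conversion sketch. Only `PAGE_LINE_COUNT = 25` comes from this specification; `LineBlock`, `blockRectPx`, and the reading of the active line as the line shown at the top of the page are assumptions.

```typescript
const PAGE_LINE_COUNT = 25;

// Hypothetical stored block: position and size in whole lines, never pixels.
interface LineBlock {
  startLine: number;
  lineCount: number;
}

// Derive pixel geometry from line coordinates for the current page height.
function blockRectPx(
  block: LineBlock,
  activeLine: number, // assumed: the virtual line shown at the top of the page
  pageHeightPx: number
): { topPx: number; heightPx: number } {
  const lineHeightPx = pageHeightPx / PAGE_LINE_COUNT;
  return {
    topPx: (block.startLine - activeLine) * lineHeightPx,
    heightPx: block.lineCount * lineHeightPx,
  };
}

const paragraph: LineBlock = { startLine: 30, lineCount: 3 };
console.log(blockRectPx(paragraph, 25, 1000)); // { topPx: 200, heightPx: 120 }
console.log(blockRectPx(paragraph, 25, 500));  // { topPx: 100, heightPx: 60 }
```

The two calls use the same stored block; only the page height differs, which is why a window resize needs no change to stored positions.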
## Markup And Tags
Canonical tag syntax:
```text
#key
#key[value]
#key[value](options)
#key:value
```
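A parser for these forms can be quite small. The following is a hedged sketch rather than the project's `markup-parser`: `ParsedTag`, `TAG_PATTERN`, and `parseTag` are hypothetical names, the file name in the example is invented, and the pattern also accepts the `#key(options)` and `#key:value(options)` combinations that appear in the tag lists of this section.

```typescript
// Hypothetical parsed form of one tag.
interface ParsedTag {
  key: string;
  value?: string;
  options?: string;
}

// #key, then an optional [value] or :value, then an optional (options).
const TAG_PATTERN = /^#([A-Za-z][\w-]*)(?:\[([^\]]*)\]|:([^\s(]+))?(?:\(([^)]*)\))?$/;

function parseTag(raw: string): ParsedTag | null {
  const match = TAG_PATTERN.exec(raw.trim());
  if (!match) return null;
  const tag: ParsedTag = { key: match[1] };
  const value = match[2] !== undefined ? match[2] : match[3];
  if (value !== undefined) tag.value = value;
  if (match[4] !== undefined) tag.options = match[4];
  return tag;
}

console.log(parseTag('#image[castle.jpg](portrait pause=2)'));
// { key: 'image', value: 'castle.jpg', options: 'portrait pause=2' }
console.log(parseTag('#key:l')); // { key: 'key', value: 'l' }
console.log(parseTag('plain text')); // null
```

This sketch leaves `options` as a raw string; splitting it into flags and `name=value` pairs would be a separate step.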
Supported story tags include:
- `#chapter[Title]`
- `#section` / `#textblock`
- `#image[file](landscape|portrait|square pause=2)`
- `#sfx[file](max=8 fade fade-duration=2)`
- `#music[file](crossfade loop lead=4)`
- `#gloss[term](definition)`
- `#tts[instruction]`
- `#tts(instruction)`
- `#tts[provider](instruction)` / `#tts-openai[instruction]`
- `#score[...]`
- `#error[...]`
- `#achievement[...]`
- `#alert[...]`
Choice tags:
- `#key:x` or `#key[x]`
- `#letter[x]`
- `#optional`
- `#action[name]`
- `#auto`, `#auto(2)`, `#auto:keyword`, `#auto:keyword(2)`
The active choice UI is one list. Explicit keys are reserved first; the remaining choices then receive the digit keys `1` through `9` and `0`, followed by the letters `A` through `Z`.
Before key assignment, choices are ordered by invisible `#action` groups. The first appearance of each action group in the authored list determines group order. Choices inside each group are randomized for presentation. Choices without an action group form one final group shown last. Group labels are not displayed.
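The ordering and key-assignment rules can be sketched as two small functions. This is an illustrative model under stated assumptions, not the `choice-display` implementation: `UiChoice`, `KeyedChoice`, `orderByActionGroup`, `assignKeys`, and `shuffled` are hypothetical names, explicit keys are compared case-insensitively, and the final ungrouped group is shuffled like any other group.

```typescript
interface UiChoice {
  text: string;
  key?: string;    // explicit key from #key
  action?: string; // invisible group from #action
}

interface KeyedChoice extends UiChoice {
  key: string;
}

// Fisher-Yates shuffle on a copy.
function shuffled<T>(items: T[]): T[] {
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    const tmp = copy[i];
    copy[i] = copy[j];
    copy[j] = tmp;
  }
  return copy;
}

// Groups appear in order of first appearance; ungrouped choices come last.
function orderByActionGroup(
  choices: UiChoice[],
  shuffle: (group: UiChoice[]) => UiChoice[]
): UiChoice[] {
  const groups = new Map<string, UiChoice[]>(); // Map keeps insertion order
  const ungrouped: UiChoice[] = [];
  choices.forEach((choice) => {
    if (choice.action === undefined) {
      ungrouped.push(choice);
      return;
    }
    const group = groups.get(choice.action);
    if (group) group.push(choice);
    else groups.set(choice.action, [choice]);
  });
  const ordered: UiChoice[] = [];
  groups.forEach((group) => ordered.push(...shuffle(group)));
  return ordered.concat(shuffle(ungrouped)); // assumption: shuffled like other groups
}

const AUTO_KEYS = '1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ'.split('');

// Reserve explicit keys first, then hand out the remaining keys in order.
function assignKeys(choices: UiChoice[]): KeyedChoice[] {
  const reserved = new Set<string>();
  choices.forEach((c) => {
    if (c.key) reserved.add(c.key.toUpperCase());
  });
  const free = AUTO_KEYS.filter((k) => !reserved.has(k));
  return choices.map((c) => {
    if (c.key) return { ...c, key: c.key };
    const next = free.shift();
    if (next === undefined) throw new Error('more choices than available keys');
    return { ...c, key: next };
  });
}

const surface = assignKeys(
  orderByActionGroup(
    [{ text: 'Ask Viktor', action: 'talk' }, { text: 'Wait' }, { text: 'Look', key: 'l' }],
    shuffled
  )
);
console.log(surface.map((c) => `${c.key}: ${c.text}`));
```

Passing the shuffle function as a parameter keeps the grouping logic deterministic under test: an identity function can replace `shuffled` there.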
`#auto` marks an ordinary Ink choice that should not be rendered as a visible button. Auto choices still need a developer-facing bracket choice text, for example `[AUTO: Tunnelspiegelung]`, so the Ink remains testable in Inky. The browser selects the first ready auto choice when the choice surface becomes ready. Ink still owns availability and once-only behavior through normal choice syntax and conditions. A numeric parameter delays the trigger by UI choice turns since the last matching auto trigger. Without a keyword the delay is global; with a keyword it applies only to that keyword. Use global `#auto(n)` when different auto events must not happen back-to-back, and keyworded `#auto:name(n)` when only repeated events of the same class should be spaced out. Use the colon form for keyed auto tags on choice lines.
TTS instruction tags are paragraph/block metadata. They are ignored by renderers and by providers that do not support per-request reading instructions. Providerless `#tts[...]` and `#tts(...)` are the default authoring forms; provider-specific forms are optional filters for provider overrides. OpenAI consumes matching instructions only for `gpt-4o-mini-tts`, where they are sent as the Speech API `instructions` field. Instructions should describe delivery, such as tone, emotion, intonation, pace, accent, whispering, humming, or singing style.
Markdown emphasis:
```text
*italic* or _italic_
**bold** or __bold__
***bold italic*** or ___bold italic___
```
## Audio, TTS, And Media
TTS providers currently include `none`, Browser Speech, Kokoro, ElevenLabs, OpenAI, and local OpenAI-compatible servers. Provider modules exist, but Browser Speech and Kokoro need focused validation before being considered production-ready.
TTS cache keys include provider, voice, provider speed value, language, and exact normalized TTS string. Fast-forward must accelerate visible animation and fade/stop active TTS without cancelling background generations unless the foreground block has been waiting long enough.
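A cache key built from those five components might look like the sketch below. The names `TtsRequest`, `normalizeTtsText`, and `ttsCacheKey` are hypothetical, and the normalization shown (Unicode NFC plus whitespace collapsing) is an assumption; the document does not define what "normalized" means.

```typescript
// Hypothetical request shape covering the five key components.
interface TtsRequest {
  provider: string;
  voice: string;
  speed: number; // provider speed value
  language: string;
  text: string;
}

// Assumed normalization: NFC, collapsed whitespace, trimmed ends.
function normalizeTtsText(text: string): string {
  return text.normalize('NFC').replace(/\s+/g, ' ').trim();
}

// JSON array encoding avoids ambiguity between adjacent fields.
function ttsCacheKey(request: TtsRequest): string {
  return JSON.stringify([
    request.provider,
    request.voice,
    request.speed,
    request.language,
    normalizeTtsText(request.text),
  ]);
}

console.log(
  ttsCacheKey({
    provider: 'openai',
    voice: 'example-voice', // placeholder voice name
    speed: 1,
    language: 'de',
    text: '  Guten   Abend. ',
  })
); // ["openai","example-voice",1,"de","Guten Abend."]
```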
Music and sound effects are preloaded when requested. Music can queue, crossfade, cut, loop, play once, and lead into following text. Music ducks by a persisted percentage during TTS playback.
## Documentation Source Of Truth
- `README.md`: usage, commands, changelog, concise feature summary.
- `SPECIFICATION.md`: architecture and behavior.
- `TODO.md`: active status and backlog.
- `MARKUP_GUIDELINES.md`: writing/authoring rules for story files.
- `THIRD_PARTY_NOTICES.md` and `public/THIRD_PARTY_NOTICES.md`: license/credits material.