AI Interactive Fiction Specification

This is the single architecture and behavior specification for the project. Usage and changelog live in README.md; actionable work items live in TODO.md; authoring conventions live in MARKUP_GUIDELINES.md.

Product Goal

AI Interactive Fiction is a shared book-style web client plus interchangeable game engine servers. The client renders interactive fiction as animated, carefully typeset illustrated prose with optional speech, music, sound effects, images, choices, and command input. Game engines own game state and emit a shared structured protocol.

The production client must tolerate speech being unavailable. The safe TTS provider default is none; a game or player preference may select another provider.

Repository Layout

  • public/: shared browser UI, assets, fonts, client modules, third-party browser libraries.
  • src/: TypeScript servers, shared protocol types, engine implementations, YAML world model, CLI support.
  • config/engines/: per-engine configuration files.
  • data/ink-src/: Ink source files.
  • data/ink/: compiled Ink JSON output.
  • data/worlds/: YAML world files.
  • data/z-code/: Z-machine story files such as zork1.bin.
  • data/zcode-prompts/: prompt templates used by the current LLM-mediated Z-code narrator.
  • scripts/: project utility scripts. Currently used: check-node-version.js and run-engine.js.
  • templates/: not present in the current repository and not used.

Text Encoding

Ink source files and game UI localization files must be saved as UTF-8 and must contain the real written characters. German text uses full umlauts and special characters directly, for example ä, ö, ü, Ä, Ö, Ü, ß, and German quotation marks „…“. Do not transliterate German into ae, oe, ue, or ss as an encoding workaround.

Ink Authoring State

Use Ink's built-in visit state for simple facts such as "this knot has been shown". Do not create parallel boolean flags for knot visits. Use explicit LIST-based state chains only when the state has semantic order or knowledge progression, such as character-generation dependencies, relationship frames, evidence chains, or episode progress.

Choice Text Perspective

Choice text must describe the player character's intention before the action is taken. Do not write choices from a post-hoc author perspective that reveals what the branch will discover. For example, use "try the door" before the destination is known, not "go to the second-class cars"; use automatic or hidden events for things the player character cannot control, such as the train entering a tunnel.

Engine Selection And Commands

DEFAULT_GAME_ENGINE in .env selects the engine used by:

npm run dev
npm run start

Supported values are ink, yaml, and zcode.
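As a minimal sketch of how a launcher could validate this setting, the helper below takes an environment map and returns one of the three supported engine names. The function name resolveEngine, the trimming, and the choice to throw on unsupported values are assumptions for illustration; scripts/run-engine.js may handle this differently.

```typescript
// Engine names listed in this specification.
const SUPPORTED_ENGINES = ['ink', 'yaml', 'zcode'] as const;
type EngineName = (typeof SUPPORTED_ENGINES)[number];

// Hypothetical helper, not the project's actual launcher code.
// Accepts any environment-like map so it can be tested without process.env.
function resolveEngine(env: Record<string, string | undefined>): EngineName {
  const raw = (env['DEFAULT_GAME_ENGINE'] ?? '').trim();
  const match = SUPPORTED_ENGINES.find((engine) => engine === raw);
  if (match === undefined) {
    // Assumption: fail loudly instead of silently picking an engine.
    throw new Error(
      `DEFAULT_GAME_ENGINE must be one of ${SUPPORTED_ENGINES.join(', ')}; got "${raw}"`
    );
  }
  return match;
}
```

Failing on an unknown value mirrors the client-side rule against fallback paths, but that parallel is a design suggestion rather than documented behavior.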

Engine-specific commands bypass the default:

npm run dev:ink
npm run dev:yaml
npm run dev:zcode
npm run start:ink
npm run start:yaml
npm run start:zcode

dev:* runs TypeScript through ts-node and nodemon. start:* runs compiled JavaScript from dist/ and builds first through prestart:*. *:debug enables the engine's debug environment flag. *:inspect starts the Node inspector and currently also enables debug for that engine.

The CLI path is YAML-only and uses src/index.ts --cli. It is useful for testing the YAML GameRunner without the browser UI. The old test-server-yaml.ts is a legacy static/YAML harness and should be removed once no workflow depends on it.

Shared Server Protocol

All engines communicate with the browser through Socket.IO and the same game API:

newGame()
loadGame(slot)
saveGame(slot)
hasSaveGame(slot)
getSaveGames()
isGameRunning()
chooseChoice(index)

The Ink engine additionally supports browser-owned session recovery:

resumeGame(savedInkState)
exportGameState()

exportGameState() returns the current Ink state without creating a server-side save slot. The client stores that state with story history, choices, input mode, and media state in IndexedDB. resumeGame(savedInkState) rehydrates a fresh server-side InkEngine after a socket reconnect or browser reload without emitting duplicate narrative. This keeps durable player-specific state client-side for hosted multi-client Ink deployments.

Line-input engines also use playerCommand for free text.

Every engine emits TurnResult objects:

interface TurnResult {
  turnId: number;
  paragraphs: Array<{ text: string; tags?: StoryTag[] }>;
  choices: ChoiceResult[];
  inputMode: 'text' | 'choice' | 'end' | 'none';
  globalTags?: StoryTag[];
  gameState?: {
    score?: number;
    endState?: { type: 'intended' | 'error'; message?: string };
  };
  suggestions?: string[];
}

The browser consumes structured TurnResult data only. YAML and Z-code servers must parse or synthesize the same tag objects that Ink exposes through native tags.
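To illustrate how a line-input engine might synthesize this structure, here is a small sketch that wraps raw output lines in a TurnResult. The StoryTag and ChoiceResult shapes and the textTurn helper are assumptions made for this example; the shared protocol types in src/ are authoritative.

```typescript
// Assumed shapes: this section names these types but does not define them.
interface StoryTag { key: string; value?: string; options?: string }
interface ChoiceResult { index: number; text: string; tags?: StoryTag[] }

// Copied from the interface above.
interface TurnResult {
  turnId: number;
  paragraphs: Array<{ text: string; tags?: StoryTag[] }>;
  choices: ChoiceResult[];
  inputMode: 'text' | 'choice' | 'end' | 'none';
  globalTags?: StoryTag[];
  gameState?: {
    score?: number;
    endState?: { type: 'intended' | 'error'; message?: string };
  };
  suggestions?: string[];
}

// Hypothetical helper: build a free-text turn from raw output lines.
function textTurn(turnId: number, lines: string[], score?: number): TurnResult {
  return {
    turnId,
    // Drop blank lines so every paragraph carries visible text.
    paragraphs: lines
      .filter((line) => line.trim() !== '')
      .map((line) => ({ text: line.trim() })),
    choices: [], // line-input turns offer no choice list
    inputMode: 'text',
    // Only attach gameState when a score is known.
    ...(score === undefined ? {} : { gameState: { score } }),
  };
}
```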

Game Engines

YAML Engine

  • Config: config/engines/yaml.json
  • Server: src/server-yaml.ts
  • World model: data/worlds/*.yml
  • CLI entry: src/index.ts --cli

The YAML engine is no longer the architectural default; it is one engine beside Ink and Z-code. It uses GameRunner, GameEngine, and YamlWorldParser, emits inputMode: 'text', and remains the best test bed for deterministic world-model plus LLM command interpretation.

Ink Engine

  • Config: config/engines/ink.json
  • Server: src/server-ink.ts
  • Engine: src/engine/ink-engine.ts
  • Source: data/ink-src/eibenreith/main.ink plus included chapter files.
  • Compiled output: data/ink/eibenreith.ink.json

The Ink server compiles source at startup using inkjs/full, then runs the compiled story with inkjs. Ink choices become ChoiceResult objects. Ink tags become shared StoryTag objects. Choice preview tags support #key, #letter, #optional, #action, #gated, #sort, and #auto.

The server keeps only ephemeral per-socket InkEngine instances. Browser IndexedDB owns durable Ink saves and the current autosave. If the socket reconnects or the page reloads, the browser sends the autosaved Ink state to resumeGame() and restores rendered history locally.

Ink does not provide arbitrary string input as a native async primitive comparable to choices. Future text-input turns should be implemented through a tag such as #input[name](prompt): the server returns inputMode: 'text', the UI shows command input for one round, then the server stores the submitted string into an Ink variable and continues.

Z-code Engine

  • Config: config/engines/zcode.json
  • Server: src/server-zcode.ts
  • Engine: src/engine/zcode-llm-engine.ts
  • Story file: data/z-code/zork1.bin by default.
  • Prompt templates: data/zcode-prompts/*.yml

The engine name is Z-code. Zork I is only the current game file and prompt target. The current implementation runs a Z-machine story through ifvms, keeps Z-machine state authoritative, and uses an LLM to translate natural-language input into parser commands and rewrite raw Z-machine output into prose.

Future work should separate Z-code-generic logic from Zork-specific prompt content more clearly.

Client Module System

The browser client uses native ES modules without a bundler. The loader imports modules, analyzes dependency declarations, initializes modules in dependency order, tracks state/progress, and hides the loading overlay only when initialization and progress exit animations are complete.

Rules:

  • Every app module extends BaseModule.
  • Every app module registers with moduleRegistry.
  • Required dependencies must be listed in dependencies.
  • Modules should use authoritative dependencies instead of local fallbacks.
  • Do not add fallback paths to hide bad dependency declarations or ordering bugs.
  • setTimeout must not paper over initialization races. It is acceptable for animation, debounce, throttle, and browser rendering timing when locally justified.
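Dependency-ordered initialization as described above can be sketched as a depth-first topological sort. The ModuleDecl shape and the function name are hypothetical and much simpler than the real BaseModule lifecycle. In line with the rules, the sketch throws on unknown dependencies and cycles instead of falling back.

```typescript
// Hypothetical minimal declaration; real modules extend BaseModule.
interface ModuleDecl {
  name: string;
  dependencies: string[];
}

// Returns module names so that every module appears after its dependencies.
function initializationOrder(modules: ModuleDecl[]): string[] {
  const byName = new Map<string, ModuleDecl>();
  for (const mod of modules) byName.set(mod.name, mod);

  const state = new Map<string, 'visiting' | 'done'>();
  const order: string[] = [];

  const visit = (name: string, path: string[]): void => {
    const mod = byName.get(name);
    if (mod === undefined) {
      const requiredBy = path[path.length - 1] ?? '(root)';
      throw new Error(`Unknown dependency "${name}" required by "${requiredBy}"`);
    }
    const current = state.get(name);
    if (current === 'done') return;
    if (current === 'visiting') {
      // Reaching a module that is still being visited means a cycle.
      throw new Error(`Dependency cycle: ${path.concat(name).join(' -> ')}`);
    }
    state.set(name, 'visiting');
    for (const dep of mod.dependencies) visit(dep, path.concat(name));
    state.set(name, 'done');
    order.push(name);
  };

  for (const mod of modules) visit(mod.name, []);
  return order;
}
```

Surfacing a bad declaration as an error at load time keeps ordering bugs visible, which is the stated reason for avoiding fallback paths.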

Core modules:

  • loader.js: module script loading, progress UI, dependency diagnostics.
  • module-registry.js: registration and readiness promises.
  • base-module.js: lifecycle, progress, state, event cleanup.

Primary client responsibilities:

  • Text and typography: text-processor, paragraph-layout, layout-renderer.
  • Markup: markup-parser.
  • Queue/playback: text-buffer, sentence-queue, playback-coordinator, animation-queue.
  • Audio/TTS: audio-manager, tts-factory, provider modules.
  • UI: ui-controller, ui-display-handler, ui-input-handler, choice-display, options-ui, ui-effects.
  • Persistence/history: persistence-manager, story-history.
  • Networking: socket-client.

Known cleanup candidates: debug-utils-module.js is not loaded; game-loop-module.js still contains high-level glue from an older architecture and should be audited before removal.

Text Pipeline

Processing order:

  1. Receive structured blocks and tags from a game engine.
  2. Parse inline story markup and remove media markers from display/TTS text.
  3. Apply Markdown emphasis.
  4. Apply locale-aware SmartyPants typography.
  5. Apply Hyphenopoly for the game metadata language.
  6. Measure text using the exact page font settings.
  7. Run Knuth-Plass line breaking.
  8. Render absolutely positioned words into the page line-coordinate model.
  9. Animate words in sync with measured TTS duration or estimated duration.

The external Knuth-Plass library should not be locally modified. Adaptation belongs in our modules.

Right Page Layout And History

The right page is a virtual line-addressed content pane:

  • #page_right does not use native scrolling.
  • Page height is divided into PAGE_LINE_COUNT = 25.
  • All block heights, margins, image spacing, and chapter/section spacing are exact line multiples.
  • Stored block positions are line coordinates, not pixels.
  • Window resize recalculates pixels from line coordinates.
  • New content appends at the live bottom.
  • Manual scrolling moves the active line and keeps a window of nearby blocks loaded.
  • The custom scrollbar represents virtual line history, not DOM scroll state.

Portrait images may overlap line ranges with text next to them, but edges must still land on line boundaries.
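The line-coordinate model can be illustrated with a few conversion helpers. This is a sketch under assumptions: the StoredBlock shape, the topLine parameter (the virtual line shown at the top of the pane), and all function names are hypothetical. Only PAGE_LINE_COUNT = 25 comes from this specification.

```typescript
const PAGE_LINE_COUNT = 25; // from the specification

// Hypothetical stored block: position and size in whole lines, never pixels.
interface StoredBlock {
  startLine: number;
  lineCount: number;
}

// Pixel height of one line for the current page height.
function lineHeightPx(pageHeightPx: number): number {
  return pageHeightPx / PAGE_LINE_COUNT;
}

// Round measured content height up to a whole number of lines (at least one).
function linesForHeight(contentHeightPx: number, pageHeightPx: number): number {
  return Math.max(1, Math.ceil(contentHeightPx / lineHeightPx(pageHeightPx)));
}

// Recompute pixel geometry from line coordinates, e.g. after a window resize.
function blockRectPx(
  block: StoredBlock,
  topLine: number,
  pageHeightPx: number
): { top: number; height: number } {
  const lh = lineHeightPx(pageHeightPx);
  return {
    top: (block.startLine - topLine) * lh,
    height: block.lineCount * lh,
  };
}
```

Because blocks store only line coordinates, a resize needs no re-layout of history; calling blockRectPx again with the new page height is enough in this sketch.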

Markup And Tags

Canonical tag syntax:

#key
#key[value]
#key[value](options)
#key:value

Supported story tags include:

  • #chapter[Title]
  • #section / #textblock
  • #image[file](landscape|portrait|square pause=2)
  • #sfx[file](max=8 fade fade-duration=2)
  • #music[file](crossfade loop lead=4)
  • #gloss[term](definition)
  • #tts[instruction]
  • #tts(instruction)
  • #tts[provider](instruction) / #tts-openai[instruction]
  • #score[...]
  • #error[...]
  • #achievement[...]
  • #alert[...]
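The canonical forms listed above can be recognized with one regular expression. The sketch below is illustrative only: ParsedTag, TAG_PATTERN, and parseTag are hypothetical names, values and options are kept as raw strings, and nested brackets or parentheses are not handled. The real markup-parser may differ.

```typescript
// Hypothetical parsed form; value and options stay unparsed strings here.
interface ParsedTag {
  key: string;
  value?: string;
  options?: string;
}

// Matches #key, #key[value], #key(options), #key[value](options), and #key:value.
const TAG_PATTERN =
  /^#([A-Za-z][\w-]*)(?::(.+)|(?:\[([^\]]*)\])?(?:\(([^)]*)\))?)$/;

// Returns null when the input is not a single well-formed tag.
function parseTag(raw: string): ParsedTag | null {
  const m = TAG_PATTERN.exec(raw.trim());
  if (m === null) return null;
  const tag: ParsedTag = { key: m[1] ?? '' };
  const value = m[2] ?? m[3]; // colon form or bracket form
  if (value !== undefined) tag.value = value;
  if (m[4] !== undefined) tag.options = m[4];
  return tag;
}
```

For example, under these assumptions parseTag('#image[cover.jpg](portrait pause=2)') yields the key image, the value cover.jpg, and the options string portrait pause=2. The file name cover.jpg is made up for the example.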

Choice tags:

  • #key:x or #key[x]
  • #letter[x]
  • #optional
  • #action[name]
  • #auto, #auto(2), #auto[keyword], #auto[keyword](2)

The active choice UI is one list. Explicit keys are reserved first, then the remaining choices receive the digits 1 through 9 and then 0, followed by the letters A through Z. Before key assignment, choices are ordered by invisible #action groups. The first appearance of each action group in the authored list determines group order. Choices inside each group are randomized for presentation. Choices without an action group form one final group shown last. Group labels are not displayed.
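The ordering and key rules in this paragraph can be sketched as follows. The Choice shape, the function names, the injectable shuffle, and the case-insensitive treatment of explicit keys are all assumptions for illustration. #letter and #optional are not modeled.

```typescript
// Hypothetical choice shape: action from #action[name], key from #key:x or #key[x].
interface Choice {
  text: string;
  action?: string;
  key?: string;
}
interface KeyedChoice extends Choice {
  key: string;
}
type Shuffle = (items: Choice[]) => Choice[];

// Assignable keys in order: 1 through 9, then 0, then A through Z.
const KEY_POOL = '1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ'.split('');

// Fisher-Yates shuffle on a copy; used when no shuffle is injected.
function fisherYates(items: Choice[]): Choice[] {
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [copy[i], copy[j]] = [copy[j]!, copy[i]!];
  }
  return copy;
}

function orderAndAssignKeys(choices: Choice[], shuffle: Shuffle = fisherYates): KeyedChoice[] {
  // Group by action; the first appearance of an action fixes its group position.
  const groupOrder: string[] = [];
  const groups = new Map<string, Choice[]>();
  const ungrouped: Choice[] = [];
  for (const choice of choices) {
    if (choice.action === undefined) {
      ungrouped.push(choice);
      continue;
    }
    let group = groups.get(choice.action);
    if (group === undefined) {
      group = [];
      groups.set(choice.action, group);
      groupOrder.push(choice.action);
    }
    group.push(choice);
  }

  // Shuffle inside each group; choices without an action form the last group.
  let ordered: Choice[] = [];
  for (const action of groupOrder) {
    ordered = ordered.concat(shuffle(groups.get(action) ?? []));
  }
  ordered = ordered.concat(shuffle(ungrouped));

  // Reserve explicit keys first (assumption: compared case-insensitively).
  const reserved = new Set<string>();
  for (const choice of ordered) {
    if (choice.key !== undefined) reserved.add(choice.key.toUpperCase());
  }
  const free = KEY_POOL.filter((k) => !reserved.has(k));

  return ordered.map((choice) => {
    const key = choice.key !== undefined ? choice.key.toUpperCase() : free.shift();
    if (key === undefined) throw new Error('More choices than assignable keys');
    return { ...choice, key };
  });
}
```

Passing the shuffle in as a parameter is a testing convenience: an identity function makes the group order and key assignment deterministic.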

#auto marks an ordinary Ink choice that should not be rendered as a visible button. The browser selects the first ready auto choice when the choice surface becomes ready. Ink still owns availability and once-only behavior through normal choice syntax and conditions. A numeric parameter delays the trigger by UI choice turns since the last matching auto trigger. Without a keyword the delay is global; with a keyword it applies only to that keyword. Use global #auto(n) when different auto events must not happen back-to-back, and keyworded #auto[name](n) when only repeated events of the same class should be spaced out.
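One possible reading of the delay rule is sketched below. It assumes that an auto choice with delay n becomes ready once at least n UI choice turns have passed since the last trigger in its scope, that a choice with no earlier trigger in its scope is ready immediately, and that every trigger also counts toward the global scope. The AutoChoice shape and AutoTracker class are hypothetical, and the client may count turns differently.

```typescript
// Hypothetical view of a choice carrying #auto, #auto(n), #auto[keyword], or #auto[keyword](n).
interface AutoChoice {
  index: number;
  keyword?: string;
  delay?: number;
}

class AutoTracker {
  private lastGlobal: number | undefined;
  private lastByKeyword = new Map<string, number>();

  // Ready when no earlier trigger exists in scope or the delay has elapsed.
  isReady(choice: AutoChoice, turn: number): boolean {
    const last =
      choice.keyword === undefined
        ? this.lastGlobal
        : this.lastByKeyword.get(choice.keyword);
    return last === undefined || turn - last >= (choice.delay ?? 0);
  }

  // The browser selects the first ready auto choice in list order.
  firstReady(choices: AutoChoice[], turn: number): AutoChoice | undefined {
    return choices.find((choice) => this.isReady(choice, turn));
  }

  // Assumption: any trigger updates the global scope; keyworded ones also update their keyword.
  recordTrigger(choice: AutoChoice, turn: number): void {
    this.lastGlobal = turn;
    if (choice.keyword !== undefined) this.lastByKeyword.set(choice.keyword, turn);
  }
}
```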

TTS instruction tags are paragraph/block metadata. They are ignored by renderers and by providers that do not support per-request reading instructions. Providerless #tts[...] and #tts(...) are the default authoring forms; provider-specific forms are optional filters for provider overrides. OpenAI consumes matching instructions only for gpt-4o-mini-tts, where they are sent as the Speech API instructions field. Instructions should describe delivery, such as tone, emotion, intonation, pace, accent, whispering, humming, or singing style.

Markdown emphasis:

*italic* or _italic_
**bold** or __bold__
***bold italic*** or ___bold italic___
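A minimal sketch of these emphasis forms follows, assuming the output is HTML with em and strong elements; the actual text-processor may emit a different representation. The function name applyEmphasis is hypothetical. The sketch processes the longest markers first and does not guard against underscores inside identifiers such as snake_case names.

```typescript
// Hypothetical helper: convert Markdown emphasis markers to HTML tags.
// The backreference \1 requires the closing marker to match the opening one.
function applyEmphasis(text: string): string {
  return text
    .replace(/(\*\*\*|___)(.+?)\1/g, '<strong><em>$2</em></strong>') // bold italic
    .replace(/(\*\*|__)(.+?)\1/g, '<strong>$2</strong>') // bold
    .replace(/(\*|_)(.+?)\1/g, '<em>$2</em>'); // italic
}
```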

Audio, TTS, And Media

TTS providers currently include none, Browser Speech, Kokoro, ElevenLabs, OpenAI, and local OpenAI-compatible servers. Provider modules exist, but Browser Speech and Kokoro need focused validation before being considered production-ready.

TTS cache keys include provider, voice, provider speed value, language, and exact normalized TTS string. Fast-forward must accelerate visible animation and fade/stop active TTS without cancelling background generations unless the foreground block has been waiting long enough.
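A cache key built from the five listed components could look like the sketch below. The TtsRequest shape, the function names, and the normalization (Unicode NFC plus whitespace collapsing) are assumptions; the specification only says the string is normalized, not how.

```typescript
// Hypothetical request shape covering the five key components.
interface TtsRequest {
  provider: string;
  voice: string;
  speed: number; // provider speed value
  language: string;
  text: string;
}

// Assumed normalization: NFC form, runs of whitespace collapsed, ends trimmed.
function normalizeTtsText(text: string): string {
  return text.normalize('NFC').replace(/\s+/g, ' ').trim();
}

// JSON-encoding the tuple avoids collisions that a plain delimiter could cause.
function ttsCacheKey(req: TtsRequest): string {
  return JSON.stringify([
    req.provider,
    req.voice,
    req.speed,
    req.language,
    normalizeTtsText(req.text),
  ]);
}
```

Two requests that differ only in incidental whitespace share a key in this sketch, while any change of provider, voice, speed, or language produces a new one.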

Music and sound effects are preloaded when requested. Music can queue, crossfade, cut, loop, play once, and lead into following text. Music ducks by a persisted percentage during TTS playback.

Documentation Source Of Truth

  • README.md: usage, commands, changelog, concise feature summary.
  • SPECIFICATION.md: architecture and behavior.
  • TODO.md: active status and backlog.
  • MARKUP_GUIDELINES.md: writing/authoring rules for story files.
  • THIRD_PARTY_NOTICES.md and public/THIRD_PARTY_NOTICES.md: license/credits material.