# Client Specification And Progress Report

This file is the single living technical specification, implementation checklist, and progress report for the web client. Usage instructions and changelog live in `README.md`.

## Product Goal

Build an AI-assisted interactive fiction client that feels like a carefully typeset illustrated novel rather than a chat window.

The game server owns game state and narrative generation. The client renders incoming narrative as synchronized animated prose with optional speech, sound effects, music, image blocks, and persistent player options.

The production client must tolerate TTS being unavailable. The safe default TTS provider is `none`; a game, user preference, or explicit option can select another provider.

## Current Status

- Done: native ES module loader with dependency graph, module states, progress overlay, cache-busted development loading, and ordered async initialization.
- Done: responsive book layout that scales page, font sizes, and word positions relative to page size.
- Done: story parser/protocol bridge for Ink-style `#` tags, chapters, text blocks, Markdown emphasis, image blocks, sound effect cues, and music cues.
- Done: SmartyPants punctuation, language-aware Hyphenopoly integration, and Knuth-Plass paragraph line breaking.
- Done: paragraph rules for normal paragraphs, chapter-first paragraphs, textblock-first paragraphs, drop caps, and first-line indentation.
- Done: sentence queue and playback coordinator for preparing text and TTS before synchronized playback.
- Done: TTS providers for none, browser speech synthesis, Kokoro, ElevenLabs, and OpenAI, with status reporting in options.
- Done: TTS cache keyed by request parameters rather than text alone.
- Done: persisted speech enable state, provider, voice, speed, language, and volume preferences.
- Done: top-bar and options controls for speech and speed synchronization after recent fixes.
- Done: command input focus behavior and global typing redirection into the command input while a game is running.
- Done: fast-forward by page click or space, including animation completion and TTS fade/stop.
- Done: mouse cursor state reporting by process state.
- Done: placeholder game API for new/load/save/running state.
- Done: sound effect and music folders, sound effect playback, music playback, and music ducking during TTS.
- Done: image markup is parsed, persisted in history, restored from save/history, and rendered as line-snapped page blocks.
- Partial: save-game API restores story state and Ink state, but the broader save/storage model still needs hardening for all engines.
- Pending: deeper automated tests for layout, playback timing, TTS provider switching, and media cue timing.

## Module System Specification

The client uses native browser ES modules. No bundler is required for the web client modules in `public/js/`.

Required module rules:

- Every app module extends `BaseModule`.
- Every app module registers with `moduleRegistry`.
- Every app module declares all required dependencies in its `dependencies` list.
- The loader loads module scripts, resolves the dependency graph, initializes modules in dependency order, awaits async initialization, and only then hides the loading overlay.
- Modules must rely on the loader for dependency readiness. Do not add fallback paths for missing dependencies inside modules.
- Do not add fallback code that bypasses an authoritative module, service, parser, state store, or API to hide an architectural problem. If such a fallback already exists or seems tempting, stop and report the architectural mismatch before changing code.
- Module states are `PENDING`, `LOADING`, `WAITING`, `INITIALIZING`, `FINISHED`, and `ERROR`.
- Modules must report real state transitions. A module must not report `FINISHED` until its critical initialization is actually ready.
- `setTimeout` must not be used to paper over dependency or async ordering bugs. It is acceptable inside isolated scheduling systems such as animation timing, debounce, throttle, or browser rendering workarounds when documented by context.

Core loader components:

- `loader.js`: dynamic import orchestration, dependency order, loading UI, cache-busted module URLs in development.
- `module-registry.js`: module registration, dependency metadata, readiness promises.
- `base-module.js`: shared lifecycle, state changes, event listener tracking, progress reporting.

The loader is deliberately the conductor, not the orchestra. Module-specific configuration, resource loading, and progress detail belong inside the module that owns the work.

## Current Module Responsibilities

- `markup-parser-module.js`: converts story text into text blocks, inline styled spans, image blocks, sound cues, and music cues.
- `text-processor-module.js`: applies SmartyPants and Hyphenopoly according to active language.
- `paragraph-layout-module.js`: measures text and computes Knuth-Plass layout.
- `layout-renderer-module.js`: turns line-coordinate layout data into absolutely positioned page DOM with stable word positions and animation metadata.
- `sentence-queue-module.js`: prepares speech/media readiness. It must not own page layout, image wrapping, or history rendering state.
- `playback-coordinator-module.js`: starts synchronized text/audio playback in the right order.
- `animation-queue-module.js`: schedules and fast-forwards visual text animation.
- `audio-manager-module.js`: owns sound effects, music tracks, music ducking, volume application, and speech audio playback helpers.
- `tts-factory-module.js`: selects provider, applies preferences, generates/preloads speech, caches speech data, and exposes unified TTS operations.
- `tts-handler-module.js`: common TTS handler base.
- `browser-tts-module.js`: Web Speech API provider.
- `kokoro-tts-module.js`: Kokoro provider and loading bridge.
- `elevenlabs-tts-module.js`: ElevenLabs provider.
- `openai-tts-module.js`: OpenAI speech provider with fixed supported voices.
- `persistence-manager-module.js`: browser preferences and durable client state.
- `localization-module.js`: language state used by UI, hyphenation, and TTS selection.
- `options-ui-module.js`: options modal, persisted controls, provider status displays.
- `ui-controller-module.js`: top-bar commands, global input behavior, game API control wiring.
- `ui-display-handler-module.js`: book page display, startup prompt, unified live/history rendering, line-coordinate scrolling, image placement, and media block dispatch.
- `ui-input-handler-module.js`: command entry, history, fast-forward key handling.
- `socket-client-module.js`: socket connection and game API request wrapper.
- `game-loop-module.js`: high-level client/game flow.

## Text Layout Specification

The right page must look like typeset book text:

- Paragraphs are laid out by paragraph, not as one continuous text run.
- Normal following paragraphs have a first-line indent.
- There is no blank line between ordinary paragraphs.
- A chapter marker creates a centered italic heading and makes the first following paragraph special.
- A textblock/section marker creates one line of vertical separation and makes the first following paragraph special.
- Special first paragraphs after chapter or textblock markers have no horizontal first-line indent.
- Chapter-first paragraphs use a drop cap aligned to an exact multiple of the body line height. Current target is a two-line drop cap unless visual testing justifies three lines.
- Lines are justified so line starts and line ends touch the intended measure. Final lines are not force-justified.
- Hyphenation must be real language-aware hyphenation from Hyphenopoly, not a fallback-only emergency split.
- Line breaking uses the Knuth-Plass algorithm over the full paragraph.
- Punctuation and short marks should not visually break the measure; optical margin handling is desirable future polish.
- The page must scale as a fixed-aspect book page. Font sizes and word positions scale with page size, preserving the composition when the window is resized.

## Right Page History And Scrolling Specification

The right page uses one virtual, line-addressed content pane. It must not behave like browser pagination and must not rely on native scrolling inside `#page_right`, `#story`, or story blocks.

Line model invariants:

- `#page_right` has a size relative to the browser window.
- There is exactly one story line-height value.
- The page height is divided into a fixed number of lines; currently `PAGE_LINE_COUNT = 25`.
- `lineHeight = pageRightHeight / PAGE_LINE_COUNT`.
- All rendered content has a height that is an exact multiple of line height, including margins, internal spacing, drop cap space, image vertical spacing, and section/chapter spacing.
- All virtual content coordinates and pixel positions are derived mathematically from line coordinates.
- Stored content does not change line numbers after creation.
- Visible content is never inserted between already existing blocks; new live content is appended at the end of the virtual history.
- Therefore cumulative pixel measurements from the browser DOM are not authoritative and cumulative line starts should not need updating after a block has been assigned coordinates.
- In portrait-image cases, text and image blocks may occupy overlapping cumulative line ranges, but every block edge still lands on a line boundary.

Scroll positioning:

- Scrolling means translating the content pane vertically with an ease-in/ease-out animation.
- Every finished scroll position must snap to the nearest position where page edges align with line edges.
- Scrolling to the top means the top edge of the first line of the first block aligns with the top edge of the page.
- Scrolling to the bottom means the bottom edge of the last line of the last rendered block aligns with the bottom edge of the page.
- Scrolling to the bottom to insert new content uses the same bottom rule, but the new block is first added invisibly to block history, advancing the block counter and line history. The page scrolls to the resulting bottom position, then the block reveal animation starts.
- If playback continues while the user is viewing older history, the view must first return to the live bottom insertion position before revealing new content.
- If manual scrolling moves currently animating content out of focus, active text animation and TTS playback must be fast-forwarded through the same path used by page click/space, including TTS fade/stop.

Active line and active block model:

- The 41-block retention target is not pagination.
- There is one active line representing the current view position.
- If enough content exists above it, the active line is considered to be the last visible line of the page, line 25.
- The block containing that active line is the active block.
- The DOM should normally contain 20 blocks before the active block, the active block itself, and 20 blocks after the active block, when those blocks exist.
- When normal scrolling shifts the active line into a different block, load one block in the direction of travel and unload one block from the opposite side.
- This one-block exchange should happen as soon as the active block changes, not after the viewport reaches a DOM edge.
- Mouse wheel, arrow key, and scrollbar interactions must drive this active-line model rather than loading page-sized chunks.

Random-position and scrollbar jumps:

- Scrolling to a random target first identifies the target line and target block.
- If the target is reached by traversal, the one-block exchange model applies.
- If the target is jumped to, the page first loads:
  - 20 blocks before the current/starting active block, as available,
  - the current/starting active block,
  - all blocks between the starting block and the target block,
  - the target block,
  - 20 blocks after the target block, as available.
- The whole loaded range can then be traversed smoothly from the starting position to the target position.
- The final target aligns so the bottom edge of the requested line aligns with the bottom edge of the page when enough content exists above it; otherwise it uses the top rule.
- After the scroll finishes, blocks farther than the retained margin are unloaded.
- If the required loaded range would exceed a sensible DOM budget, currently 150 blocks total, all visible page content fades out, the old DOM content is unloaded, the target block plus 20 blocks before and after it are loaded, and the page fades in at the target position.

Scrollbar behavior:

- The custom scrollbar represents virtual history position and history size in line coordinates, not native DOM scroll state.
- Dragging the scrollbar thumb should move the thumb preview freely without scrolling content or loading history during the drag.
- On pointer release, the target line/block is resolved, the required block range is loaded according to the random-position rules, and then the content scroll animation runs.
- Scrollbar pointer events must not bubble into story fast-forward/continue handlers.

Processing order:

1. Parse block and inline story markup.
2. Remove media markers from display and TTS text while keeping cue positions.
3. Convert Markdown emphasis to inline style spans.
4. Apply SmartyPants typographic punctuation.
5. Apply Hyphenopoly for the active language.
6. Measure words/spans.
7. Run Knuth-Plass line breaking.
8. Render stable positioned spans.
9. Animate spans in sync with audio duration or estimated duration.
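
The line model, snapping, and jump-loading rules in this section reduce to a small amount of arithmetic. The following is a minimal sketch, not the client's actual code: every function name is hypothetical, line and block indices are assumed to be 0-based, and the symmetric treatment of upward jumps (20 blocks below the lower block, 20 above the higher one) is an assumption. Only `PAGE_LINE_COUNT`, the 20-block margin, and the 150-block budget come from the text above.

```javascript
// Values taken from the specification; all helper names are hypothetical.
const PAGE_LINE_COUNT = 25;
const RETAIN_MARGIN = 20;
const DOM_BLOCK_BUDGET = 150;

// lineHeight = pageRightHeight / PAGE_LINE_COUNT
function lineHeightFor(pageRightHeight) {
  return pageRightHeight / PAGE_LINE_COUNT;
}

// Pixel offset of a line's top edge, derived only from line coordinates.
function lineTopPx(lineIndex, pageRightHeight) {
  return lineIndex * lineHeightFor(pageRightHeight);
}

// Snap a requested top-of-page line to the nearest whole line in the history.
function snapTopLine(requestedTopLine, totalLines) {
  const maxTopLine = Math.max(0, totalLines - PAGE_LINE_COUNT);
  return Math.min(maxTopLine, Math.max(0, Math.round(requestedTopLine)));
}

// Top line that places the bottom edge of targetLine on the page's bottom
// edge; falls back to the top rule when too little content exists above it.
function topLineForBottomAligned(targetLine) {
  return Math.max(0, targetLine + 1 - PAGE_LINE_COUNT);
}

// Block range kept in the DOM around the active block (up to 41 blocks).
function retainedRange(activeBlock, blockCount) {
  return {
    first: Math.max(0, activeBlock - RETAIN_MARGIN),
    last: Math.min(blockCount - 1, activeBlock + RETAIN_MARGIN),
  };
}

// Decide between a smooth traversal and a fade-out/fade-in jump.
function planJump(startBlock, targetBlock, blockCount) {
  const low = Math.min(startBlock, targetBlock);
  const high = Math.max(startBlock, targetBlock);
  const first = Math.max(0, low - RETAIN_MARGIN);
  const last = Math.min(blockCount - 1, high + RETAIN_MARGIN);
  if (last - first + 1 > DOM_BLOCK_BUDGET) {
    return { mode: "fade", ...retainedRange(targetBlock, blockCount) };
  }
  return { mode: "traverse", first, last };
}
```

For example, `planJump(50, 100, 500)` needs blocks 30 to 120 (91 blocks) and can traverse smoothly, while `planJump(50, 400, 500)` would need 391 blocks and therefore falls back to the fade path around block 400.
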
## Prototype Lessons To Preserve

The prototype in `prototype/` and the former `PROTOTYPE_ANALYSIS.md` proved the target text pipeline. Any future layout refactor should preserve these details:

- Hyphenopoly must emit pipe markers for hyphenation points through the `.hyphenatePipe` configuration.
- The layout measurement function must treat the pipe marker as zero width.
- `knuth-and-plass.js` must split text on spaces, punctuation separators, HTML tags, and pipe markers.
- Pipe markers become penalty nodes, not visible text.
- A visible hyphen is emitted only when the chosen line break uses a hyphenation penalty.
- Text measurement should use the same real CSS font, size, and style that the book page uses.
- Justification is applied through the Knuth-Plass break ratio by stretching or shrinking glue nodes.
- Hyphenated syllables after a penalty must be rendered as one visual word group, preserving the prototype's no-overlap behavior.
- The prototype used percentage-based positions so rendered text stayed proportional when the page scaled.

Known inherited implementation note: the old `linked-list.js` analysis identified `get last()` returning `this.last` instead of `this.tail`. Because the getter reads itself, it recurses without terminating as soon as `last` is accessed. The current implementation should be checked and corrected before future line-breaking refactors rely on `last`.

## Story Markup Specification

Canonical structural and media tags use Ink-style `#` syntax:

- Ink engines write native Ink tags such as `# chapter[Title]`, `# image[file.png](landscape)`, or `# music[track.ogg](crossfade loop)`.
- inkjs exposes those tags without the leading `#`; the server parses them into `StoryTag` objects.
- YAML and Zork narrative output use the same leading `#...` syntax, parsed by the server into `StoryTag` objects before the client sees them.
- The browser protocol is structured `TurnResult` objects with structured tags and render blocks, not raw story markup.
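
As an illustration of the tag syntax above, the following sketch splits one tag string into a name, a bracketed argument, and a list of options. It is only a sketch: the real `StoryTag` fields are not given in this document, so the `{ name, argument, options }` shape, the `TAG_PATTERN` regular expression, and the `parseStoryTag` function are all hypothetical. It accepts tags with or without the leading `#`, and options separated by spaces or commas.

```javascript
// Hypothetical pattern: optional "#", a tag name, optional [argument],
// optional (options). Not the server's actual parser.
const TAG_PATTERN = /^#?\s*([A-Za-z]+)(?:\[([^\]]*)\])?(?:\(([^)]*)\))?\s*$/;

// Returns a StoryTag-like object (assumed shape) or null if the text does
// not look like a tag. The caller is expected to pass tag text only.
function parseStoryTag(raw) {
  const match = TAG_PATTERN.exec(raw.trim());
  if (!match) return null;
  const [, name, argument, optionText] = match;
  const options = optionText
    ? optionText.split(/[\s,]+/).filter(Boolean)
    : [];
  return { name: name.toLowerCase(), argument: argument ?? null, options };
}

// Usage with the examples from this section:
const chapterTag = parseStoryTag("# chapter[Title]");
// chapterTag.name is "chapter", chapterTag.argument is "Title"
const musicTag = parseStoryTag("music[track.ogg](crossfade loop)");
// musicTag.options is ["crossfade", "loop"]
```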

Markdown emphasis:

```text
*italic* or _italic_
**bold** or __bold__
***bold italic*** or ___bold italic___
```

Chapter:

```text
#chapter[The Mysterious Mansion]
The first paragraph has a drop cap and no first-line indent.
```

The heading is centered, italic, and uses the body font size. Following ordinary paragraphs return to normal first-line indentation.

Section or text block:

```text
#section
The first paragraph is vertically separated from previous content and has no first-line indent.
```

`#textblock` is an alias. Following ordinary paragraphs return to normal indentation.

Images:

```text
#image[file-name.jpg](landscape)
#image[file-name.jpg](portrait pause=2)
#image[file-name.jpg](square delay=1.5)
```

File names resolve relative to `public/images/`. `widescreen` is still accepted as an alias for `landscape`. Landscape and square images are centered and rendered near full page width with heights snapped to whole text lines. Portrait images float at half page width and following prose is narrowed for the number of lines the image covers. Image pauses accept the same timing style as music (`pause=2`, `delay=2`, `lead=2`, or `2s`); pauses are skippable with click/space and do not prevent the next TTS item from being prepared in the background.

Sound effects:

```text
#sfx[squeaky-door.ogg]
The old door opens into the dark.
```

File names resolve relative to `public/sounds/`. The server parses the tag into a `StoryTag`; the tag is not displayed and is not sent to TTS.

Music:

```text
#music[track.ogg](crossfade, loop, lead=4)
```

File names resolve relative to `public/music/`.

Modes:

- `queue`: wait until the current track ends.
- `crossfade`: fade from the current track into the new track.
- `cut`: stop the current track and start the new one immediately.
- `loop`: repeat the track.
- `once`: do not repeat the track.
- `lead=`: for block music, let music play alone before the following text/TTS paragraph starts.
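
The modes above could be folded into one settings object along these lines. This is a hedged sketch only: `parseSeconds`, `parseMusicOptions`, and the `{ transition, loop, leadSeconds }` shape are hypothetical names, and the defaults (`cut`, no loop, no lead) are assumptions, since this document does not say what applies when an option is omitted.

```javascript
// Reads "lead=4", "pause=2", "delay=1.5", or "2s" and returns seconds,
// or null when the token is not a timing value.
function parseSeconds(token) {
  const match =
    /^(?:(?:lead|pause|delay)=(\d+(?:\.\d+)?)|(\d+(?:\.\d+)?)s)$/.exec(token);
  if (!match) return null;
  return Number(match[1] ?? match[2]);
}

// Turns the text inside the parentheses of a #music tag into settings.
// Defaults are assumed, not taken from the specification.
function parseMusicOptions(optionText) {
  const settings = { transition: "cut", loop: false, leadSeconds: 0 };
  for (const token of optionText.split(/[\s,]+/).filter(Boolean)) {
    if (token === "queue" || token === "crossfade" || token === "cut") {
      settings.transition = token;
    } else if (token === "loop") {
      settings.loop = true;
    } else if (token === "once") {
      settings.loop = false;
    } else {
      const seconds = parseSeconds(token);
      if (seconds !== null) settings.leadSeconds = seconds;
    }
  }
  return settings;
}

// Usage with the example tag from this section:
const musicSettings = parseMusicOptions("crossfade, loop, lead=4");
// musicSettings is { transition: "crossfade", loop: true, leadSeconds: 4 }
```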

For chapter openings, authors can place `#music[file](..., lead=N)` after `#chapter[...]` and before the first prose paragraph. The heading is rendered/spoken first, then music starts and plays alone for the lead duration, then the dropcapped paragraph continues.

Music and image pauses are playback gates only; they must not stop the queue from preparing upcoming TTS in the background.

## TTS And Playback Specification

The playback system must keep text animation and audio synchronized.

- Complete sentences enter a preparation queue.
- TTS generation/preload starts as soon as possible, including while previous prepared sentences are playing.
- Text layout and TTS generation can be prepared in parallel.
- Playback of a sentence starts only when the required audio duration is known, or when TTS is disabled/unavailable and an estimated duration has been calculated.
- With measured TTS audio, animation duration follows the measured audio length.
- With TTS `none` or unavailable providers, duration is estimated from text length and speed.
- The speed slider is persisted and has normal speed at the center.
- Provider-specific speed ranges must be converted from the app-level speed value:
  - OpenAI speech speed uses `1.0` as normal and supports a bounded multiplier.
  - ElevenLabs speed must be clamped to its accepted range.
  - Browser speech synthesis uses utterance rate.
  - Kokoro uses its provider-supported speed option.
  - `none` uses the same app-level speed to scale estimated animation duration.
- Switching provider, voice, language, or speed during gameplay should apply to the next not-yet-generated sentence.
- Kokoro is special: loading is expensive, so it should be loaded on startup only when selected and may require reload to switch into it.
- Fast-forward completes visible animation and fades/stops active TTS playback so the next sentence can start earlier.
- Music ducks to 70% of its configured volume while TTS playback is active, then fades back when the TTS playback queue is empty.

TTS cache keys must include:

- provider
- voice
- provider speed value
- language
- exact input string after markup/media removal

When all cache parameters match, cached audio should be played instead of regenerated.

OpenAI voices for the current speech endpoint are:

- `nova`
- `shimmer`
- `echo`
- `onyx`
- `fable`
- `alloy`
- `ash`
- `sage`
- `coral`

## Cursor And Interaction States

The mouse cursor, not the text insertion caret, indicates process state. The command input caret keeps its normal text-editing appearance.

Required states:

- ready for new command
- command sent, waiting for game server answer
- waiting silently for required TTS generation before animation can continue
- audio and animation playing while another sentence is being generated
- audio and animation playing while no further generation is needed

State changes should also be logged to the console during development.

Typing behavior:

- While a game is running, printable keyboard input should go to the command input regardless of the clicked element.
- Enter sends the command when input is non-empty.
- Space still inserts a space in the input, but if playback is active it also fast-forwards current playback.
- Clicking on the book while playback is active fast-forwards current playback.

## Game API Specification

The client/server game API supports:

```text
newGame()
loadGame(slot)
saveGame(slot)
hasSaveGame(slot)
getSaveGames()
isGameRunning()
```

`slot` is a positive integer from `1` to the number of save slots exposed by the UI.

Current placeholder behavior:

- `newGame()` starts the demo game and emits the introduction/current room description.
- `saveGame(slot)` records a placeholder save for the current socket session.
- `hasSaveGame(slot)` checks that session-local placeholder save.
- `getSaveGames()` returns the saved slot numbers for the session.
- `loadGame(slot)` requires a placeholder save and then starts the demo game like `newGame()`.
- Saves do not persist across reloads yet.
- `isGameRunning()` returns true after a game starts and until the session ends.

UI requirements before a game starts:

- Command input is hidden.
- Right page shows the startup prompt.
- `new game` is enabled.
- `load` is enabled only when `hasSaveGame(slot)` is true.
- `save` is disabled.

UI requirements after a game starts:

- Command input is visible and focused.
- `save` is enabled.
- `load` reflects save availability.
- `new game` remains enabled and restarts the game.

## Server And World Model

The TypeScript server serves the web client and owns socket communication. The CLI and web modes use `GameRunner` and a YAML world file.

Current development world:

- Default file: `./data/worlds/example_world.yml`
- Server-side command handling is currently mirrored for UI testing: entered commands are sent to the server and returned as narrative text through the socket response path.

Longer-term goal:

- Keep a deterministic world model for rooms, objects, actions, state changes, and validation.
- Use the LLM to translate natural language player intent into game actions.
- Use the LLM to render state changes as prose without allowing hallucinated state.

## Quality Rules

- Preserve existing module boundaries when changing code.
- Prefer event-driven and Promise-based coordination over timing hacks.
- Do not use loader fallbacks to hide bad dependency declarations.
- Keep the book page visually stable under resizing.
- Validate typography visually after layout changes.
- Validate TTS timing with real provider playback and with TTS `none`.
- Treat the ad-blocker console error `onpage-dialog.preload.js:121 Uncaught ReferenceError: browser is not defined` as unrelated noise.

## Progress Checklist

### Completed

- [x] Split monolithic prototype concepts into focused client modules.
- [x] Added loader, registry, base module, states, dependency declarations, and ordered initialization.
- [x] Added development cache busting and no-cache static file serving.
- [x] Added socket-backed game API wrapper.
- [x] Added manual game start flow for browser audio policies.
- [x] Added SmartyPants support.
- [x] Added Hyphenopoly support.
- [x] Added Knuth-Plass paragraph layout.
- [x] Fixed overfull words, incorrect spacing, and hyphenation integration regressions from the prototype migration.
- [x] Added chapter heading and dropcap markup.
- [x] Added section/textblock markup.
- [x] Added Markdown emphasis parsing.
- [x] Added image markup parsing, line-snapped rendering, and history/save restoration.
- [x] Added sound effect markup and playback.
- [x] Added music markup, playback modes, loop/once, and lead-in.
- [x] Added music ducking during TTS.
- [x] Added TTS `none` mode.
- [x] Added OpenAI TTS support and restricted OpenAI voice list.
- [x] Added ElevenLabs speed clamping.
- [x] Added persisted speed and speech state.
- [x] Added speech toggle synchronization between top bar and options.
- [x] Added volume controls for speech, music, and sound effects.
- [x] Added command input focus and global typing behavior.
- [x] Added command mirroring through the server path for UI testing.
- [x] Added fast-forward for animation and TTS fade/stop.
- [x] Implemented image rendering for `#image[file](landscape)`, `#image[file](portrait)`, and `#image[file](square)`.

### In Progress

- [ ] Keep validating provider-specific speed conversion for all TTS providers against real API behavior.
- [ ] Tighten automated checks around top-bar/options state initialization after reload.
- [ ] Improve automated visual regression coverage for page scaling, dropcap line-height alignment, and paragraph indentation.
- [ ] Improve automated audio tests for music ducking, sound effect timing, and fast-forward fadeout.

### Pending

- [ ] Replace placeholder save implementation with durable save files or server-side save storage.
- [ ] Replace command mirroring with the full LLM/world-model command loop when typography/audio testing no longer needs mirroring.
- [ ] Add optical margin alignment or punctuation protrusion support for line endings.
- [ ] Add more provider readiness tests and richer diagnostics in options.
- [ ] Add unit tests for `SentenceQueue`, markup parsing, TTS cache key generation, and game API methods.
- [ ] Add browser integration tests for module loading, new-game startup, command input behavior, and playback controls.

## Consolidation Notes

The durable content from the former root and `references/` documents has been merged here or into `README.md`. The old reference files should remain untouched until the user approves cleanup, but they are no longer intended as the source of truth.