Files
ai.interactive.fiction/CLIENT_TODO.md
T

368 lines
19 KiB
Markdown

# Client Specification And Progress Report
This file is the single living technical specification, implementation checklist, and progress report for the web client. Usage instructions and changelog live in `README.md`.
## Product Goal
Build an AI-assisted interactive fiction client that feels like a carefully typeset illustrated novel rather than a chat window. The game server owns game state and narrative generation. The client renders incoming narrative as synchronized animated prose with optional speech, sound effects, music, future image blocks, and persistent player options.
The production client must tolerate TTS being unavailable. The safe default TTS provider is `none`; a game, user preference, or explicit option can select another provider.
## Current Status
- Done: native ES module loader with dependency graph, module states, progress overlay, cache-busted development loading, and ordered async initialization.
- Done: responsive book layout that scales page, font sizes, and word positions relative to page size.
- Done: story parser for chapters, text blocks, Markdown emphasis, image blocks, sound effect cues, and music cues.
- Done: SmartyPants punctuation, language-aware Hyphenopoly integration, and Knuth-Plass paragraph line breaking.
- Done: paragraph rules for normal paragraphs, chapter-first paragraphs, textblock-first paragraphs, drop caps, and first-line indentation.
- Done: sentence queue and playback coordinator for preparing text and TTS before synchronized playback.
- Done: TTS providers for none, browser speech synthesis, Kokoro, ElevenLabs, and OpenAI, with status reporting in options.
- Done: TTS cache keyed by request parameters rather than text alone.
- Done: persisted speech enable state, provider, voice, speed, language, and volume preferences.
- Done: top-bar and options controls for speech and speed synchronization after recent fixes.
- Done: command input focus behavior and global typing redirection into the command input while a game is running.
- Done: fast-forward by page click or space, including animation completion and TTS fade/stop.
- Done: mouse cursor state reporting by process state.
- Done: placeholder game API for new/load/save/running state.
- Done: sound effect and music folders, sound effect playback, music playback, and music ducking during TTS.
- Partial: image markup is parsed and queued, but actual image rendering is still future work.
- Partial: save-game API is a placeholder; saves are per socket session and do not survive reload.
- Pending: deeper automated tests for layout, playback timing, TTS provider switching, and media cue timing.
## Module System Specification
The client uses native browser ES modules. No bundler is required for the web client modules in `public/js/`.
Required module rules:
- Every app module extends `BaseModule`.
- Every app module registers with `moduleRegistry`.
- Every app module declares all required dependencies in its `dependencies` list.
- The loader loads module scripts, resolves the dependency graph, initializes modules in dependency order, awaits async initialization, and only then hides the loading overlay.
- Modules must rely on the loader for dependency readiness. Do not add fallback paths for missing dependencies inside modules.
- Module states are `PENDING`, `LOADING`, `WAITING`, `INITIALIZING`, `FINISHED`, and `ERROR`.
- Modules must report real state transitions. A module must not report `FINISHED` until its critical initialization is actually ready.
- `setTimeout` must not be used to paper over dependency or async ordering bugs. It is acceptable inside isolated scheduling systems such as animation timing, debounce, throttle, or browser rendering workarounds when documented by context.
Core loader components:
- `loader.js`: dynamic import orchestration, dependency order, loading UI, cache-busted module URLs in development.
- `module-registry.js`: module registration, dependency metadata, readiness promises.
- `base-module.js`: shared lifecycle, state changes, event listener tracking, progress reporting.
The loader is deliberately the conductor, not the orchestra. Module-specific configuration, resource loading, and progress detail belong inside the module that owns the work.
## Current Module Responsibilities
- `markup-parser-module.js`: converts story text into text blocks, inline styled spans, image blocks, sound cues, and music cues.
- `text-processor-module.js`: applies SmartyPants and Hyphenopoly according to active language.
- `paragraph-layout-module.js`: measures text and computes Knuth-Plass layout.
- `layout-renderer-module.js`: turns layout data into page DOM with stable word positions and animation metadata.
- `sentence-queue-module.js`: prepares sentence objects and coordinates layout/TTS readiness.
- `playback-coordinator-module.js`: starts synchronized text/audio playback in the right order.
- `animation-queue-module.js`: schedules and fast-forwards visual text animation.
- `audio-manager-module.js`: owns sound effects, music tracks, music ducking, volume application, and speech audio playback helpers.
- `tts-factory-module.js`: selects provider, applies preferences, generates/preloads speech, caches speech data, and exposes unified TTS operations.
- `tts-handler-module.js`: common TTS handler base.
- `browser-tts-module.js`: Web Speech API provider.
- `kokoro-tts-module.js`: Kokoro provider and loading bridge.
- `elevenlabs-tts-module.js`: ElevenLabs provider.
- `openai-tts-module.js`: OpenAI speech provider with fixed supported voices.
- `persistence-manager-module.js`: browser preferences and durable client state.
- `localization-module.js`: language state used by UI, hyphenation, and TTS selection.
- `options-ui-module.js`: options modal, persisted controls, provider status displays.
- `ui-controller-module.js`: top-bar commands, global input behavior, game API control wiring.
- `ui-display-handler-module.js`: book page display, startup prompt, text insertion, and media block dispatch.
- `ui-input-handler-module.js`: command entry, history, fast-forward key handling.
- `socket-client-module.js`: socket connection and game API request wrapper.
- `game-loop-module.js`: high-level client/game flow.
## Text Layout Specification
The right page must look like typeset book text:
- Paragraphs are laid out by paragraph, not as one continuous text run.
- Normal following paragraphs have a first-line indent.
- There is no blank line between ordinary paragraphs.
- A chapter marker creates a centered italic heading and makes the first following paragraph special.
- A textblock/section marker creates one line of vertical separation and makes the first following paragraph special.
- Special first paragraphs after chapter or textblock markers have no horizontal first-line indent.
- Chapter-first paragraphs use a drop cap aligned to an exact multiple of the body line height. Current target is a two-line drop cap unless visual testing justifies three lines.
- Lines are justified so line starts and line ends touch the intended measure. Final lines are not force-justified.
- Hyphenation must be real language-aware hyphenation from Hyphenopoly, not a fallback-only emergency split.
- Line breaking uses the Knuth-Plass algorithm over the full paragraph.
- Punctuation and short marks should not visually break the measure; optical margin handling is desirable future polish.
- The page must scale as a fixed-aspect book page. Font sizes and word positions scale with page size, preserving the composition when the window is resized.
Processing order:
1. Parse block and inline story markup.
2. Remove media markers from display and TTS text while keeping cue positions.
3. Convert Markdown emphasis to inline style spans.
4. Apply SmartyPants typographic punctuation.
5. Apply Hyphenopoly for the active language.
6. Measure words/spans.
7. Run Knuth-Plass line breaking.
8. Render stable positioned spans.
9. Animate spans in sync with audio duration or estimated duration.
## Prototype Lessons To Preserve
The prototype in `prototype/` and the former `PROTOTYPE_ANALYSIS.md` proved the target text pipeline. Any future layout refactor should preserve these details:
- Hyphenopoly must emit pipe markers for hyphenation points through the `.hyphenatePipe` configuration.
- The layout measurement function must treat the pipe marker as zero width.
- `knuth-and-plass.js` must split text on spaces, punctuation separators, HTML tags, and pipe markers.
- Pipe markers become penalty nodes, not visible text.
- A visible hyphen is emitted only when the chosen line break uses a hyphenation penalty.
- Text measurement should use the same real CSS font, size, and style that the book page uses.
- Justification is applied through the Knuth-Plass break ratio by stretching or shrinking glue nodes.
- Hyphenated syllables after a penalty must be rendered as one visual word group, preserving the prototype's no-overlap behavior.
- The prototype used percentage-based positions so rendered text stayed proportional when the page scaled.
Known inherited implementation note: the old `linked-list.js` analysis identified `get last()` returning `this.last` instead of `this.tail`, which can recurse if used. The current implementation should be checked and corrected before future line-breaking refactors rely on `last`.
## Story Markup Specification
Markdown emphasis:
```text
*italic* or _italic_
**bold** or __bold__
***bold italic*** or ___bold italic___
```
Chapter:
```text
::chapter[The Mysterious Mansion]
The first paragraph has a drop cap and no first-line indent.
```
The heading is centered, italic, and uses the body font size. Following ordinary paragraphs return to normal first-line indentation.
Section or text block:
```text
::section
The first paragraph is vertically separated from previous content and has no first-line indent.
```
`::textblock` is an alias. Following ordinary paragraphs return to normal indentation.
Images:
```text
::image[widescreen](file-name.jpg)
::image[portrait](file-name.jpg)
```
File names resolve relative to `public/images/`. `widescreen` means full page width and half page height. `portrait` means full page width and full page height. Parsing is implemented; visual rendering is pending.
Sound effects:
```text
The old door opens {{sfx:squeaky-door.ogg}} into the dark.
```
File names resolve relative to `public/sounds/`. The marker is not displayed and is not sent to TTS. Playback starts when animation reaches the marker position.
Music:
```text
::music[crossfade, loop, lead=4](track.ogg)
{{music:cut:track.ogg}}
```
File names resolve relative to `public/music/`. Modes:
- `queue`: wait until the current track ends.
- `crossfade`: fade from the current track into the new track.
- `cut`: stop the current track and start the new one immediately.
- `loop`: repeat the track.
- `once`: do not repeat the track.
- `lead=<seconds>`: for block music, let music play alone before the following text/TTS paragraph starts.
## TTS And Playback Specification
The playback system must keep text animation and audio synchronized.
- Complete sentences enter a preparation queue.
- TTS generation/preload starts as soon as possible, including while previous prepared sentences are playing.
- Text layout and TTS generation can be prepared in parallel.
- Playback of a sentence starts only when the required audio duration is known, or when TTS is disabled/unavailable and an estimated duration has been calculated.
- With measured TTS audio, animation duration follows the measured audio length.
- With TTS `none` or unavailable providers, duration is estimated from text length and speed.
- The speed slider is persisted and has normal speed at the center.
- Provider-specific speed ranges must be converted from the app-level speed value:
- OpenAI speech speed uses `1.0` as normal and supports a bounded multiplier.
- ElevenLabs speed must be clamped to its accepted range.
- Browser speech synthesis uses utterance rate.
- Kokoro uses its provider-supported speed option.
- `none` uses the same app-level speed to scale estimated animation duration.
- Switching provider, voice, language, or speed during gameplay should apply to the next not-yet-generated sentence.
- Kokoro is special: loading is expensive, so it should be loaded on startup only when selected and may require reload to switch into it.
- Fast-forward completes visible animation and fades/stops active TTS playback so the next sentence can start earlier.
- Music ducks to 70% of its configured volume while TTS playback is active, then fades back when the TTS playback queue is empty.
TTS cache keys must include:
- provider
- voice
- provider speed value
- language
- exact input string after markup/media removal
When all cache parameters match, cached audio should be played instead of regenerated.
OpenAI voices for the current speech endpoint are:
- `nova`
- `shimmer`
- `echo`
- `onyx`
- `fable`
- `alloy`
- `ash`
- `sage`
- `coral`
## Cursor And Interaction States
The mouse cursor, not the text insertion caret, indicates process state. The command input caret keeps its normal text-editing appearance.
Required states:
- ready for new command
- command sent, waiting for game server answer
- waiting silently for required TTS generation before animation can continue
- audio and animation playing while another sentence is being generated
- audio and animation playing while no further generation is needed
State changes should also be logged to the console during development.
Typing behavior:
- While a game is running, printable keyboard input should go to the command input regardless of the clicked element.
- Enter sends the command when input is non-empty.
- Space still inserts a space in the input, but if playback is active it also fast-forwards current playback.
- Clicking on the book while playback is active fast-forwards current playback.
## Game API Specification
The client/server game API supports:
```text
newGame()
loadGame(slot)
saveGame(slot)
hasSaveGame(slot)
getSaveGames()
isGameRunning()
```
`slot` is a positive integer from `1` to the number of save slots exposed by the UI.
Current placeholder behavior:
- `newGame()` starts the demo game and emits the introduction/current room description.
- `saveGame(slot)` records a placeholder save for the current socket session.
- `hasSaveGame(slot)` checks that session-local placeholder save.
- `getSaveGames()` returns the saved slot numbers for the session.
- `loadGame(slot)` requires a placeholder save and then starts the demo game like `newGame()`.
- Saves do not persist across reloads yet.
- `isGameRunning()` returns true after a game starts and until the session ends.
UI requirements before a game starts:
- Command input is hidden.
- Right page shows the startup prompt.
- `new game` is enabled.
- `load` is enabled only when `hasSaveGame(slot)` is true.
- `save` is disabled.
UI requirements after a game starts:
- Command input is visible and focused.
- `save` is enabled.
- `load` reflects save availability.
- `new game` remains enabled and restarts the game.
## Server And World Model
The TypeScript server serves the web client and owns socket communication. The CLI and web modes use `GameRunner` and a YAML world file.
Current development world:
- Default file: `./data/worlds/example_world.yml`
- Server-side command handling is currently mirrored for UI testing: entered commands are sent to the server and returned as narrative text through the socket response path.
Longer-term goal:
- Keep a deterministic world model for rooms, objects, actions, state changes, and validation.
- Use the LLM to translate natural language player intent into game actions.
- Use the LLM to render state changes as prose without allowing hallucinated state.
## Quality Rules
- Preserve existing module boundaries when changing code.
- Prefer event-driven and Promise-based coordination over timing hacks.
- Do not use loader fallbacks to hide bad dependency declarations.
- Keep the book page visually stable under resizing.
- Validate typography visually after layout changes.
- Validate TTS timing with real provider playback and with TTS `none`.
- Treat the ad-blocker console error `onpage-dialog.preload.js:121 Uncaught ReferenceError: browser is not defined` as unrelated noise.
## Progress Checklist
### Completed
- [x] Split monolithic prototype concepts into focused client modules.
- [x] Added loader, registry, base module, states, dependency declarations, and ordered initialization.
- [x] Added development cache busting and no-cache static file serving.
- [x] Added socket-backed game API wrapper.
- [x] Added manual game start flow for browser audio policies.
- [x] Added SmartyPants support.
- [x] Added Hyphenopoly support.
- [x] Added Knuth-Plass paragraph layout.
- [x] Fixed overfull words, incorrect spacing, and hyphenation integration regressions from the prototype migration.
- [x] Added chapter heading and dropcap markup.
- [x] Added section/textblock markup.
- [x] Added Markdown emphasis parsing.
- [x] Added image markup parsing for future image rendering.
- [x] Added sound effect markup and playback.
- [x] Added music markup, playback modes, loop/once, and lead-in.
- [x] Added music ducking during TTS.
- [x] Added TTS `none` mode.
- [x] Added OpenAI TTS support and restricted OpenAI voice list.
- [x] Added ElevenLabs speed clamping.
- [x] Added persisted speed and speech state.
- [x] Added speech toggle synchronization between top bar and options.
- [x] Added volume controls for speech, music, and sound effects.
- [x] Added command input focus and global typing behavior.
- [x] Added command mirroring through the server path for UI testing.
- [x] Added fast-forward for animation and TTS fade/stop.
### In Progress
- [ ] Keep validating provider-specific speed conversion for all TTS providers against real API behavior.
- [ ] Tighten automated checks around top-bar/options state initialization after reload.
- [ ] Improve automated visual regression coverage for page scaling, dropcap line-height alignment, and paragraph indentation.
- [ ] Improve automated audio tests for music ducking, sound effect timing, and fast-forward fadeout.
### Pending
- [ ] Implement image rendering for `::image[widescreen]` and `::image[portrait]`.
- [ ] Replace placeholder save implementation with durable save files or server-side save storage.
- [ ] Replace command mirroring with the full LLM/world-model command loop when typography/audio testing no longer needs mirroring.
- [ ] Add optical margin alignment or punctuation protrusion support for line endings.
- [ ] Add more provider readiness tests and richer diagnostics in options.
- [ ] Add unit tests for `SentenceQueue`, markup parsing, TTS cache key generation, and game API methods.
- [ ] Add browser integration tests for module loading, new-game startup, command input behavior, and playback controls.
## Consolidation Notes
The durable content from the former root and `references/` documents has been merged here or into `README.md`. The old reference files should remain untouched until the user approves cleanup, but they are no longer intended as the source of truth.