ai.interactive.fiction/references/SPECIFICATION.md
2025-04-04 00:02:28 +00:00

Code Guidelines

1. Asynchronous Programming Principles:

  • Primary Mechanism: Use async/await and Promises for handling asynchronous operations.
  • Non-Blocking: Ensure the main thread remains responsive. Long-running operations (like Kokoro loading) should be handled in a way that doesn't block UI updates or animations (e.g., using requestIdleCallback if appropriate, or careful yielding).
  • Event-Driven Communication: Use a dedicated event system (such as the ModuleEvent class) for communication between the loader and modules (e.g., progress updates, state changes, messages) instead of injecting callbacks directly from the loader into module methods.
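
The event-driven principle can be illustrated with a minimal sketch. The shape of ModuleEvent here (a moduleId and a data payload), the shared moduleEvents bus, and the 'module:progress' event name are assumptions for illustration; the project's actual class may differ.

```javascript
// Hypothetical shape of ModuleEvent; the real class may carry other fields.
class ModuleEvent extends Event {
  constructor(type, moduleId, data = {}) {
    super(type);
    this.moduleId = moduleId;
    this.data = data;
  }
}

// Shared bus (assumed name). EventTarget exists in browsers and in Node.js.
const moduleEvents = new EventTarget();

// Loader side: subscribe once instead of passing callbacks into modules.
const progressLog = [];
moduleEvents.addEventListener('module:progress', (event) => {
  progressLog.push(`${event.moduleId}: ${event.data.percent}%`);
});

// Module side: announce progress without knowing who is listening.
function reportProgress(moduleId, percent) {
  moduleEvents.dispatchEvent(
    new ModuleEvent('module:progress', moduleId, { percent }),
  );
}

reportProgress('tts', 40);
reportProgress('tts', 100);
```

Because the module only dispatches events, the loader can change how it displays progress without any change to module code.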

2. Module System Standards & Dependency Management:

  • Native ES Modules: Utilize the browser's native ES Module system (import/export, <script type="module">) without relying on build tools.
  • Lean Loader: The loader.js file should be focused only on:
    • Orchestrating the loading of module scripts.
    • Monitoring module initialization progress and state via the event system.
    • Displaying the loading status UI.
    • Hiding the overlay and potentially starting the main application loop after all modules are finished.
  • Module Responsibility: All module-specific logic, configuration, resource loading (like CSS, images, or specific libraries like Kokoro), and detailed progress reporting should reside within the respective module file, not in loader.js.
  • Dependency Declaration: Modules must declare their dependencies (e.g., ui-controller depends on tts and animation-queue).
  • Loader Enforces Order: The loader is responsible for ensuring that a module's init phase only begins after all its declared dependencies have reached the FINISHED state.
  • Rely on Dependency Management: Modules should assume their dependencies will be loaded and ready before their init function is called by the loader. There should be no conditional checks within a module like if (dependencyModule) with fallbacks for when the dependency isn't ready.
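
One way a loader could enforce this ordering is to validate the declared dependencies up front and derive an initialization order from them. The sketch below is an assumption about how that might look, not the project's loader; the dependencies table and resolveInitOrder are hypothetical names, and only the ui-controller example comes from the text.

```javascript
// Hypothetical declarations; 'ui-controller' depending on 'tts' and
// 'animation-queue' follows the example given in the guidelines.
const dependencies = {
  'animation-queue': [],
  'tts': [],
  'ui-controller': ['tts', 'animation-queue'],
};

// Returns module ids so that every module comes after its dependencies.
// Unknown dependencies and cycles throw, so the loader fails loudly
// instead of modules carrying fallback paths.
function resolveInitOrder(deps) {
  const order = [];
  const visiting = new Set();
  const done = new Set();

  function visit(id, path) {
    if (done.has(id)) return;
    if (!Object.hasOwn(deps, id)) {
      throw new Error(`Unknown module "${id}" required by "${path.at(-1)}"`);
    }
    if (visiting.has(id)) {
      throw new Error(`Dependency cycle: ${[...path, id].join(' -> ')}`);
    }
    visiting.add(id);
    for (const dep of deps[id]) visit(dep, [...path, id]);
    visiting.delete(id);
    done.add(id);
    order.push(id);
  }

  for (const id of Object.keys(deps)) visit(id, []);
  return order;
}

const initOrder = resolveInitOrder(dependencies);
```

With a validated order in hand, a module's init code can use its dependencies directly, which is what makes the "no fallback checks" rule practical.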

3. Module Interface & Code Sharing:

  • Base Class: Use a BaseModule class that all modules extend. This enforces a consistent interface (e.g., initializeInterface, getState) and provides shared functionality (e.g., changeState, reportProgress, event dispatching).
  • Module Registry: Use a central moduleRegistry to register modules and facilitate dependency checking and management.
  • Preserve Functionality: When adapting existing modules (like ui-controller) to the new BaseModule interface, all original functionality must be preserved and integrated correctly, not replaced with placeholders.
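
A rough sketch of what such a base class might look like follows. The method names getState, changeState, and reportProgress come from the text; their bodies, the emit helper, the event names, and the ExampleModule subclass are assumptions made for illustration.

```javascript
// Assumed shared event bus; the project routes events through ModuleEvent.
const moduleEvents = new EventTarget();

class BaseModule {
  constructor(id, dependencies = []) {
    if (new.target === BaseModule) {
      throw new TypeError('BaseModule is abstract; extend it instead');
    }
    this.id = id;
    this.dependencies = dependencies;
    this.state = 'PENDING';
    this.progress = 0;
  }

  getState() {
    return this.state;
  }

  changeState(nextState) {
    this.state = nextState;
    this.emit('module:state', { state: nextState });
  }

  reportProgress(percent, message = '') {
    // Clamp so a miscalculated value cannot break the progress bar.
    this.progress = Math.max(0, Math.min(100, percent));
    this.emit('module:progress', { percent: this.progress, message });
  }

  emit(type, detail) {
    const event = new Event(type);
    event.moduleId = this.id;
    event.detail = detail;
    moduleEvents.dispatchEvent(event);
  }

  // Subclasses override this with their real setup work.
  async initialize() {
    throw new Error(`${this.id} must implement initialize()`);
  }
}

// Hypothetical module showing the intended usage.
class ExampleModule extends BaseModule {
  constructor() {
    super('example', []);
  }

  async initialize() {
    this.changeState('INITIALIZING');
    this.reportProgress(50, 'halfway');
    this.reportProgress(100, 'done');
    this.changeState('FINISHED');
  }
}
```

Keeping state changes and progress reporting in the base class means every module emits the same events in the same format.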

4. State Management:

  • Defined States: Modules must adhere to the defined states: PENDING, LOADING (script loading), WAITING (waiting for dependencies), INITIALIZING (running init logic), FINISHED, ERROR.
  • Accurate Reporting: Modules must accurately report their state transitions via the event system. A module (like tts) should not report FINISHED until all its critical internal operations (including background loading like Kokoro) are complete. The loader's UI must display these states correctly.
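
Accurate reporting is easier to guarantee when illegal transitions are rejected outright. The transition table below is an assumption: the text defines the states but not the exact edges between them, and assertTransition is a hypothetical helper.

```javascript
const ModuleState = Object.freeze({
  PENDING: 'PENDING',
  LOADING: 'LOADING',
  WAITING: 'WAITING',
  INITIALIZING: 'INITIALIZING',
  FINISHED: 'FINISHED',
  ERROR: 'ERROR',
});

// Assumed edges: a strictly linear happy path, with ERROR reachable from
// any non-terminal state.
const ALLOWED_TRANSITIONS = Object.freeze({
  PENDING: ['LOADING', 'ERROR'],
  LOADING: ['WAITING', 'ERROR'],
  WAITING: ['INITIALIZING', 'ERROR'],
  INITIALIZING: ['FINISHED', 'ERROR'],
  FINISHED: [],
  ERROR: [],
});

// Returns the new state, or throws if the transition is not allowed.
function assertTransition(from, to) {
  if (!ALLOWED_TRANSITIONS[from]?.includes(to)) {
    throw new Error(`Illegal state transition ${from} -> ${to}`);
  }
  return to;
}
```

Under this table a module cannot jump from LOADING straight to FINISHED, which catches the case of a module reporting completion before its background work is done.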

5. Handling setTimeout and Fallbacks:

  • setTimeout for Flow Control/Synchronization: Strictly prohibited. Using setTimeout to wait for asynchronous operations to complete, fix timing issues, or manage dependencies is considered a hack and indicates a flaw in the asynchronous architecture. Proper use of async/await, Promises, and the loader's dependency management should make this unnecessary.
  • setTimeout for Delays: Acceptable only within well-encapsulated components for specific, justifiable reasons (like debouncing, throttling, or potentially very short delays if absolutely unavoidable after direct DOM manipulation, though this should also be minimized). It must not be used to paper over asynchronous race conditions or timing problems. The AnimationQueue is an acceptable place for internal scheduling timeouts, but application code calling it should rely on its event-driven nature.
  • Fallbacks for Missing Dependencies: Strictly prohibited. Code within a module should not check if a dependency exists and provide a fallback path. The module loader's responsibility is to guarantee dependencies are met before initializing the module. Errors should be handled for actual failures during initialization, not for unmet dependencies (which indicates a loader bug).
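
The difference between the prohibited and the acceptable uses can be sketched as follows. The names createDeferred, ttsReady, startNarration, and debounce are hypothetical helpers for illustration only.

```javascript
// Prohibited pattern (shown only as a comment): guessing a delay.
//   setTimeout(() => { if (window.tts) start(); }, 500);

// Preferred: expose readiness as a Promise and await it.
function createDeferred() {
  let resolve;
  let reject;
  const promise = new Promise((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

const ttsReady = createDeferred();

async function startNarration() {
  // Resumes exactly when the dependency signals readiness, never earlier.
  const tts = await ttsReady.promise;
  return tts.speak('Hello');
}

// Acceptable: setTimeout encapsulated in a debounce helper, where the
// delay is the feature itself rather than a workaround for a race.
function debounce(fn, waitMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}
```

In the first case the timing is driven by an event (the promise resolving); in the second the timeout is the intended behavior.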

Adhering to these principles will lead to a cleaner, more robust, and maintainable asynchronous module loading system.

Module Loader System Architecture

The module loader system is designed to manage the loading and initialization of modular components in a structured, dependency-aware manner with visual progress reporting.

Overall Architecture

  1. Module Registry Pattern: Uses a centralized registry to track and manage all modules and their states.

  2. Event-Driven Communication: Modules communicate with the loader and each other through custom events.

  3. Progress Visualization: Provides a visual loading overlay with per-module progress tracking.

  4. State Management: Tracks each module through defined states (PENDING, LOADING, WAITING, INITIALIZING, FINISHED, ERROR).

  5. Dependency Resolution: Handles module dependencies to ensure proper initialization order.

Core Components

  1. ModuleRegistry: Central repository for all modules

    • Tracks registration and availability of modules
    • Manages promises for module readiness
    • Provides dependency resolution through waitForModule and waitForModules
  2. BaseModule: Abstract base class that all modules extend

    • Implements standard lifecycle methods
    • Handles progress reporting and state changes
    • Provides consistent interface for the loader
  3. ModuleLoader: Main orchestrator of the loading process

    • Dynamically loads module scripts
    • Creates and manages the visual loading interface
    • Initializes modules in the correct order
    • Tracks and displays overall loading progress
  4. ModuleEvent: Custom event system for inter-module communication
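
As an illustration of the registry's role, here is a minimal promise-based sketch. Only register, waitForModule, and waitForModules are named in this document; markReady, the internal maps, and the assumption that each module exposes an id are hypothetical.

```javascript
class ModuleRegistry {
  constructor() {
    this.modules = new Map(); // id -> module instance
    this.ready = new Map();   // id -> { promise, resolve }
  }

  // Lazily creates the readiness promise, so waiting on a module that
  // has not registered yet still works.
  entry(id) {
    if (!this.ready.has(id)) {
      let resolve;
      const promise = new Promise((res) => {
        resolve = res;
      });
      this.ready.set(id, { promise, resolve });
    }
    return this.ready.get(id);
  }

  register(module) {
    this.modules.set(module.id, module);
    this.entry(module.id);
  }

  // Assumed hook: called when a module reaches the FINISHED state.
  markReady(id) {
    this.entry(id).resolve(this.modules.get(id));
  }

  waitForModule(id) {
    return this.entry(id).promise;
  }

  waitForModules(ids) {
    return Promise.all(ids.map((id) => this.waitForModule(id)));
  }
}
```

A loader could then write `await registry.waitForModules(module.dependencies)` before starting a module's initialization.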

Loading Sequence

  1. HTML page loads and includes the loader script as a module
  2. DOMContentLoaded triggers the loader initialization
  3. Loader creates the loading UI and registers event listeners
  4. Module scripts are loaded dynamically in parallel
  5. Each module registers itself with the registry
  6. Modules are initialized with dependency checking
  7. Progress is reported and visualized throughout
  8. When all modules reach FINISHED state, loading overlay is hidden

Module Lifecycle

  1. PENDING: Initial state before loading begins
  2. LOADING: Module script is being loaded
  3. WAITING: Module is waiting for dependencies to be ready
  4. INITIALIZING: Module's initialize() method is executing
  5. FINISHED: Module is fully initialized and ready
  6. ERROR: Module encountered an error during initialization

Integration Pattern

Modules follow a consistent registration pattern:

// Create the singleton instance
const ModuleName = new ModuleNameClass();

// Register with the module registry
moduleRegistry.register(ModuleName);

// Export the module
export { ModuleName };

// Keep a reference in window for loader system
window.ModuleName = ModuleName;

This design creates a flexible, maintainable system for loading complex applications with multiple interdependent components, prioritizing both user experience and performance.

TTS System Structure & Kokoro Loading

This section summarizes the TTS system structure and how the Kokoro TTS engine is loaded:

Overall TTS System Architecture

  1. Modular Design: The TTS system uses a modular architecture with multiple handler classes, each implementing a different TTS approach.

  2. Three TTS Providers:

    • BrowserTTSHandler - Uses the built-in Web Speech API
    • KokoroHandler - Uses Kokoro.js neural TTS for high-quality voices
    • ApiTTSHandler - Uses external TTS services like ElevenLabs
  3. Factory Pattern: TTSFactory manages the handlers, provides a unified interface, and handles provider switching.

  4. Module System: TTSPlayer module is registered with the moduleRegistry as part of the modular loading system.

Loading Sequence

  1. The module loader first loads tts-player.js, which in turn loads tts-factory.js.

  2. The factory initializes providers in order of preference:

    • First loads the BrowserTTSHandler for immediate low-quality TTS
    • Then loads the ApiTTSHandler if configured
    • Finally attempts to load KokoroHandler in the background with low priority
  3. The system uses the best available provider, with a preference for Kokoro when available.
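
The provider selection described above could be sketched like this. The handler names, the registerHandler and selectProvider methods, and the relative order of the API and browser providers are assumptions; the text only states that Kokoro is preferred when available.

```javascript
// Assumed preference order; only "Kokoro first" is stated in the text.
const PROVIDER_PREFERENCE = ['kokoro', 'api', 'browser'];

class TTSFactory {
  constructor() {
    this.handlers = new Map(); // provider name -> handler with speak(text)
    this.manualChoice = null;  // set when the user picks a provider in the UI
  }

  // Called as each handler finishes loading, in any order.
  registerHandler(name, handler) {
    this.handlers.set(name, handler);
  }

  selectProvider(name) {
    if (!this.handlers.has(name)) {
      throw new Error(`TTS provider "${name}" is not loaded`);
    }
    this.manualChoice = name;
  }

  get activeName() {
    if (this.manualChoice) return this.manualChoice;
    return PROVIDER_PREFERENCE.find((name) => this.handlers.has(name)) ?? null;
  }

  speak(text) {
    const name = this.activeName;
    if (name === null) throw new Error('No TTS provider registered');
    return this.handlers.get(name).speak(text);
  }
}
```

Because the active provider is computed on each call, the switch to Kokoro happens automatically as soon as its handler registers, unless the user has chosen a provider manually.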

Kokoro TTS Loading Strategy

After consulting the documentation (https://www.npmjs.com/package/kokoro-js), we made these decisions:

  1. Low-Priority Loading: Kokoro is loaded with requestIdleCallback to avoid impacting page performance.

  2. Kokoro npm package integration: Load Kokoro directly from the local server. '/js/kokoro-js.js' contains the complete minified code of the kokoro-js npm package, copied from the node_modules folder to the public directory. Do not try to read or change it; it is too big.

  3. Pipeline Creation: Per documentation, we use the pipeline pattern:

    this.kokoro = await window.kokoroTTS.pipeline('text-to-speech', {
      quantized: true,
      progress_callback: this.progressCallback
    });
    
  4. Voice List: We hardcoded the available voices rather than querying them dynamically.

  5. Audio Playback: Synthesis returns an audio element which we play:

    const audio = await this.kokoro(processedText, {
      voice: this.voiceOptions.voice,
      speed: this.voiceOptions.speed
    });
    audio.play();
    

User Experience Flow

  1. User sees page immediately with browser TTS enabled (fast startup)
  2. Kokoro loads in background without blocking the interface
  3. Once Kokoro is ready, TTS switches to higher quality neural TTS
  4. User can manually switch between providers via the UI if desired
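
The background-loading step in this flow might be wrapped as shown below. This is a sketch under assumptions: loadWhenIdle is a hypothetical helper, and the microtask path for environments without requestIdleCallback (such as Node.js) is an illustrative choice, not something the project specifies.

```javascript
// Runs loadFn during browser idle time and returns a Promise for its result.
// scheduleIdle is injectable so the helper can be exercised outside a browser.
function loadWhenIdle(loadFn, scheduleIdle = globalThis.requestIdleCallback) {
  return new Promise((resolve, reject) => {
    // Wrapping in a promise chain also captures synchronous throws.
    const run = () => Promise.resolve().then(() => loadFn()).then(resolve, reject);
    if (typeof scheduleIdle === 'function') {
      scheduleIdle(run);
    } else {
      // Assumption: without requestIdleCallback, start on the next microtask.
      queueMicrotask(run);
    }
  });
}
```

A caller can then `await loadWhenIdle(() => kokoroHandler.load())` (with kokoroHandler.load standing in for the real loading routine) and report FINISHED only after that promise resolves, in line with the state-reporting rule.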

This design prioritizes performance and user experience, making the TTS system both flexible and resource-efficient.

Important practices

  • Always ignore the following error when debugging console output: onpage-dialog.preload.js:121 Uncaught ReferenceError: browser is not defined. It is produced by the installed ad blocker and has nothing to do with our project.

Text-to-Speech Synchronization Architecture

The TTS system needs to be synchronized with text animations to create a cohesive user experience. This section outlines the requirements and implementation approach.

Transition to Game

The overlay fades away as the first scheduled animation.

  • This fade animation is handled by the animation scheduler module
  • Only after successful fade-out does the game loop start
  • Socket connection is established and begins receiving text

Text Buffering & Sentence Processing

  1. Text Buffer Collection: Incoming text from sockets is collected in a buffer.

    • System can receive fragments of any size (single letters, words, sentences, paragraphs)
    • All text is accumulated in the buffer regardless of fragment size
    • Buffer handles partial/incomplete text gracefully
  2. Sentence Detection: The buffer identifies complete sentences.

    • When full sentences are detected, they are extracted from the buffer
    • If multiple sentences arrive simultaneously, they are split and processed individually
    • Remaining partial sentences stay in the buffer until completion
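
The buffering behavior above can be sketched with a small class. The sentence rule used here (terminal punctuation followed by whitespace) and the SentenceBuffer name are assumptions; abbreviations such as "Mr. Smith" would be split incorrectly and need extra handling in a real implementation.

```javascript
class SentenceBuffer {
  constructor() {
    this.pending = ''; // text received so far that is not yet a full sentence
  }

  // Appends a fragment of any size and returns the sentences it completed.
  push(fragment) {
    this.pending += fragment;
    const sentences = [];
    // Assumed rule: ., ! or ? (optionally followed by closing quotes or
    // brackets) and then whitespace marks the end of a sentence.
    const boundary = /[.!?]+["')\]]*\s+/g;
    let start = 0;
    let match;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.pending.slice(start, end).trim());
      start = end;
    }
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Call when the stream ends to release any trailing text.
  flush() {
    const rest = this.pending.trim();
    this.pending = '';
    return rest ? [rest] : [];
  }
}

const buffer = new SentenceBuffer();
buffer.push('The door cr');                 // → []
buffer.push('eaks. You step inside! What'); // → ['The door creaks.', 'You step inside!']
buffer.flush();                             // → ['What']
```

Requiring whitespace after the punctuation means a sentence is only released once the following fragment arrives, which avoids cutting "3.14" or an ellipsis that is still being streamed.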

Synchronized Playback

  1. TTS Generation Queue: Complete sentences enter the TTS generation queue.

    • Generation begins immediately if no other sentence is being processed
    • Results are cached for immediate playback when needed
  2. Animation Timing: Animation speed is synchronized with audio duration.

    • The system calculates animation duration to match TTS audio length exactly
    • Both animation and audio start simultaneously
    • Animation completes at the same time as audio playback
  3. Playback Pipeline: Continuous processing of sentences.

    • As soon as one sentence completes playback, the next begins
    • Next sentence generation starts during current sentence playback
    • This creates a seamless reading experience
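
The pipeline above can be sketched as a loop that prefetches one sentence ahead. The function playSentences and the shapes of its collaborators are assumptions: synthesize(sentence) is taken to resolve to a clip with durationMs and play(), and animate(sentence, durationMs) to resolve when the animation ends.

```javascript
async function playSentences(sentences, synthesize, animate) {
  if (sentences.length === 0) return;

  // Generation of the first sentence starts immediately.
  let next = synthesize(sentences[0]);

  for (let i = 0; i < sentences.length; i += 1) {
    // If generation is not finished yet, playback pauses here.
    const clip = await next;

    // Start generating the following sentence while this one plays.
    next = i + 1 < sentences.length ? synthesize(sentences[i + 1]) : null;
    // A failure is surfaced by the await in the next iteration; this no-op
    // handler only prevents an unhandled-rejection warning in the meantime.
    next?.catch(() => {});

    // Audio and animation start together and share a single duration.
    await Promise.all([clip.play(), animate(sentences[i], clip.durationMs)]);
  }
}
```

Passing the clip's duration into the animation is what keeps the two in step; the animation never has to guess how long the audio will last.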

Fast-Forward & Control Flow

  1. Fast-Forward Behavior: User can skip current sentence.

    • Pressing the designated fast-forward key completes current animation immediately
    • TTS audio is faded out and stopped
    • System advances to next sentence
  2. Resource Management: TTS generation is resource-conscious.

    • Uses only CPU/GPU resources not needed for animation
    • Generation process can be cancelled by fast-forward
    • System prioritizes smooth animation over TTS preparation
  3. Loading States: Animation waits for TTS when necessary.

    • If next sentence TTS generation isn't ready when needed, animation pauses
    • Fast-forward key can skip incomplete generation
    • User is never blocked completely by TTS generation
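
Cancellation on fast-forward could be built on AbortController, as in the sketch below. createSkippableTask is a hypothetical helper, and it assumes the generation routine accepts an AbortSignal so it can stop its own work.

```javascript
// Wraps a cancellable operation. `run` receives an AbortSignal and returns
// a Promise; `skip()` is what the fast-forward key handler would call.
function createSkippableTask(run) {
  const controller = new AbortController();
  const { signal } = controller;

  // Settles as soon as the user skips, even if `run` ignores the signal.
  const skipped = new Promise((resolve) => {
    signal.addEventListener(
      'abort',
      () => resolve({ skipped: true, value: null }),
      { once: true },
    );
  });

  const finished = run(signal).then((value) => ({ skipped: false, value }));

  return {
    result: Promise.race([finished, skipped]),
    skip: () => controller.abort(),
  };
}
```

The caller awaits `result` and advances to the next sentence either way, so a slow or stalled generation can never block the user completely.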

Persistent Configuration

  1. Options Storage: The persistence-manager stores TTS settings.

    • Speech on/off state is remembered
    • Speed settings are preserved between sessions
    • Voice preferences are stored
  2. Options UI: Add an options button and modal dialog.

    • Show additional options in a modal window
    • Include volume sliders:
      • Master volume control
      • TTS volume control
      • Music volume control
      • Sound effects volume control
    • Include manual TTS system selection
    • All settings are persisted via the persistence-manager
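
A possible shape for the stored settings is sketched below. The persistence-manager's real API is not described here, so createSettingsStore, the key name, the field names, and the default values are all assumptions; the store accepts any object with getItem and setItem, such as localStorage.

```javascript
// Hypothetical defaults covering the options listed above.
const DEFAULT_TTS_SETTINGS = Object.freeze({
  speechEnabled: true,
  speed: 1.0,
  voice: null,
  provider: 'auto',
  volume: Object.freeze({ master: 1, tts: 1, music: 1, effects: 1 }),
});

function withDefaults(saved = {}) {
  return {
    ...DEFAULT_TTS_SETTINGS,
    ...saved,
    // Merge nested volumes so newly added sliders get a default value.
    volume: { ...DEFAULT_TTS_SETTINGS.volume, ...saved.volume },
  };
}

function createSettingsStore(storage, key = 'tts-settings') {
  return {
    load() {
      const raw = storage.getItem(key);
      if (raw === null) return withDefaults();
      try {
        return withDefaults(JSON.parse(raw));
      } catch {
        // Corrupt or unexpected data: start again from the defaults.
        return withDefaults();
      }
    },
    save(settings) {
      storage.setItem(key, JSON.stringify(settings));
    },
  };
}
```

In the browser this would be used as `createSettingsStore(window.localStorage)`; merging with defaults on load keeps older saved data usable when new options are added.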

This synchronized approach ensures that text animations and speech work together seamlessly, creating a more immersive storytelling experience while maintaining smooth performance.