# Code Guidelines

**1. Asynchronous Programming Principles:**

* **Primary Mechanism:** Use `async`/`await` and Promises for handling asynchronous operations.
* **Non-Blocking:** Ensure the main thread remains responsive. Long-running operations (like Kokoro loading) should be handled in a way that doesn't block UI updates or animations (e.g., using `requestIdleCallback` if appropriate, or careful yielding).
* **Event-Driven Communication:** Use a dedicated event system (like the `ModuleEvent` class) for communication between the loader and modules (e.g., for progress updates, state changes, messages) instead of injecting callbacks directly from the loader into module methods.
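
The event-driven principle can be illustrated with a minimal sketch built on the standard `EventTarget` and `Event` APIs. The event name `module:progress`, the `detail` payload shape, and the shared `moduleEvents` bus are assumptions made for illustration; the project's actual `ModuleEvent` class may look different.

```javascript
// Hypothetical sketch: a ModuleEvent carrying a payload, dispatched on a shared bus.
class ModuleEvent extends Event {
  constructor(type, detail = {}) {
    super(type);
    this.detail = detail; // e.g. { moduleId, progress }
  }
}

// Shared event bus (assumed name); loader and modules only know this object.
const moduleEvents = new EventTarget();

// Loader side: subscribe to progress instead of passing callbacks into modules.
const progressLog = [];
moduleEvents.addEventListener('module:progress', (event) => {
  progressLog.push(`${event.detail.moduleId}: ${event.detail.progress}%`);
});

// Module side: report progress without knowing who is listening.
function reportProgress(moduleId, progress) {
  moduleEvents.dispatchEvent(new ModuleEvent('module:progress', { moduleId, progress }));
}

reportProgress('tts', 40);
reportProgress('tts', 100);
```

The loader never hands a callback to the module; both sides depend only on the event name and payload shape.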

**2. Module System Standards & Dependency Management:**

* **Native ES Modules:** Utilize the browser's native ES Module system (`import`/`export`, `<script type="module">`) without relying on build tools.
* **Lean Loader:** The `loader.js` file should be focused *only* on:
    * Orchestrating the loading of module scripts.
    * Monitoring module initialization progress and state via the event system.
    * Displaying the loading status UI.
    * Hiding the overlay and potentially starting the main application loop *after* all modules are finished.
* **Module Responsibility:** All module-specific logic, configuration, resource loading (like CSS, images, or specific libraries like Kokoro), and detailed progress reporting should reside *within* the respective module file, not in `loader.js`.
* **Dependency Declaration:** Modules must declare their dependencies (e.g., `ui-controller` depends on `tts` and `animation-queue`).
* **Loader Enforces Order:** The loader is responsible for ensuring that a module's `init` phase only begins *after* all its declared dependencies have reached the `FINISHED` state.
* **Rely on Dependency Management:** Modules should *assume* their dependencies will be loaded and ready before their `init` function is called by the loader. There should be **no** conditional checks within a module like `if (dependencyModule)` with fallbacks for when the dependency isn't ready.
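
One way the loader could enforce this ordering is sketched below. The module shape (`{ id, deps, init }`) and the `initAll` function are hypothetical; the point is only that each `init` is chained behind the completion promises of its declared dependencies, so modules never need to check for them.

```javascript
// Hypothetical sketch: start each module's init only after all of its
// declared dependencies have finished initializing.
async function initAll(modules) {
  const byId = new Map(modules.map((m) => [m.id, m]));
  const finished = new Map(); // id -> Promise that resolves when init is done

  function start(id, path = []) {
    if (path.includes(id)) {
      throw new Error(`Circular dependency: ${[...path, id].join(' -> ')}`);
    }
    if (!finished.has(id)) {
      const mod = byId.get(id);
      if (!mod) throw new Error(`Unknown module: ${id}`);
      const deps = mod.deps.map((dep) => start(dep, [...path, id]));
      finished.set(id, Promise.all(deps).then(() => mod.init()));
    }
    return finished.get(id);
  }

  await Promise.all(modules.map((m) => start(m.id)));
}

// Usage: ui-controller declares tts and animation-queue as dependencies.
const order = [];
const done = initAll([
  { id: 'ui-controller', deps: ['tts', 'animation-queue'], init: async () => order.push('ui-controller') },
  { id: 'tts', deps: [], init: async () => order.push('tts') },
  { id: 'animation-queue', deps: [], init: async () => order.push('animation-queue') },
]);
```

Independent modules still initialize concurrently; only dependents wait.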

**3. Module Interface & Code Sharing:**

* **Base Class:** Use a `BaseModule` class that all modules extend. This enforces a consistent interface (e.g., `initializeInterface`, `getState`) and provides shared functionality (e.g., `changeState`, `reportProgress`, event dispatching).
* **Module Registry:** Use a central `moduleRegistry` to register modules and facilitate dependency checking and management.
* **Preserve Functionality:** When adapting existing modules (like `ui-controller`) to the new `BaseModule` interface, all original functionality must be preserved and integrated correctly, not replaced with placeholders.

**4. State Management:**

* **Defined States:** Modules must adhere to the defined states: `PENDING`, `LOADING` (script loading), `WAITING` (waiting for dependencies), `INITIALIZING` (running `init` logic), `FINISHED`, `ERROR`.
* **Accurate Reporting:** Modules must accurately report their state transitions via the event system. A module (like `tts`) should not report `FINISHED` until all its critical internal operations (including background loading like Kokoro) are complete. The loader's UI must display these states correctly.
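
The state list above can be backed by an explicit transition table so that invalid reports fail loudly. The states come from this document; the allowed transitions and the `StateTracker` class are assumptions for illustration, not the actual `BaseModule` implementation.

```javascript
// States as defined above; the transition table is an assumption.
const ModuleState = Object.freeze({
  PENDING: 'PENDING',
  LOADING: 'LOADING',
  WAITING: 'WAITING',
  INITIALIZING: 'INITIALIZING',
  FINISHED: 'FINISHED',
  ERROR: 'ERROR',
});

const ALLOWED = {
  PENDING: ['LOADING', 'ERROR'],
  LOADING: ['WAITING', 'ERROR'],
  WAITING: ['INITIALIZING', 'ERROR'],
  INITIALIZING: ['FINISHED', 'ERROR'],
  FINISHED: [],
  ERROR: [],
};

function canTransition(from, to) {
  return ALLOWED[from]?.includes(to) ?? false;
}

// Hypothetical stand-in for the state handling inside BaseModule.
class StateTracker {
  constructor() {
    this.state = ModuleState.PENDING;
  }

  changeState(next) {
    if (!canTransition(this.state, next)) {
      throw new Error(`Invalid transition ${this.state} -> ${next}`);
    }
    this.state = next;
    return this.state;
  }
}
```

With such a table, a module that tries to jump from `LOADING` straight to `FINISHED` throws instead of silently misreporting.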

**5. Handling `setTimeout` and Fallbacks:**

* **`setTimeout` for Flow Control/Synchronization:** **Strictly prohibited.** Using `setTimeout` to wait for asynchronous operations to complete, fix timing issues, or manage dependencies is considered a hack and indicates a flaw in the asynchronous architecture. Proper use of `async`/`await`, Promises, and the loader's dependency management should make this unnecessary.
* **`setTimeout` for Delays:** Acceptable *only* within well-encapsulated components for specific, justifiable reasons (like debouncing, throttling, or potentially *very* short delays *if absolutely unavoidable* after direct DOM manipulation, though this should also be minimized). It must **not** be used to paper over asynchronous race conditions or timing problems. The `AnimationQueue` is an acceptable place for internal scheduling timeouts, but application code calling it should rely on its event-driven nature.
* **Fallbacks for Missing Dependencies:** **Strictly prohibited.** Code within a module should not check if a dependency exists and provide a fallback path. The module loader's responsibility is to guarantee dependencies are met before initializing the module. Errors should be handled for *actual* failures during initialization, not for unmet dependencies (which indicates a loader bug).
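
As an illustration of the alternative to timing-based waits, the sketch below wraps an event in a Promise so callers can `await` completion instead of polling with `setTimeout`. The helper name `waitForEvent` and the event name `animation:finished` are hypothetical.

```javascript
// Resolve once `target` emits `type`; no polling or timing guesses involved.
function waitForEvent(target, type) {
  return new Promise((resolve) => {
    target.addEventListener(type, resolve, { once: true });
  });
}

// Usage sketch: `queue` stands in for any EventTarget-based component.
const queue = new EventTarget();
const finishedEvent = waitForEvent(queue, 'animation:finished');

// The component announces completion; awaiting code continues exactly then.
queue.dispatchEvent(new Event('animation:finished'));
```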

Adhering to these principles will lead to a cleaner, more robust, and maintainable asynchronous module loading system.

# Module Loader System Architecture

The module loader system is designed to manage the loading and initialization of modular components in a structured, dependency-aware manner with visual progress reporting.

## Overall Architecture

1. **Module Registry Pattern**: Uses a centralized registry to track and manage all modules and their states.

2. **Event-Driven Communication**: Modules communicate with the loader and each other through custom events.

3. **Progress Visualization**: Provides a visual loading overlay with per-module progress tracking.

4. **State Management**: Tracks each module through defined states (PENDING, LOADING, WAITING, INITIALIZING, FINISHED, ERROR).

5. **Dependency Resolution**: Handles module dependencies to ensure proper initialization order.

## Core Components

1. **ModuleRegistry**: Central repository for all modules
   - Tracks registration and availability of modules
   - Manages promises for module readiness
   - Provides dependency resolution through `waitForModule` and `waitForModules`

2. **BaseModule**: Abstract base class that all modules extend
   - Implements standard lifecycle methods
   - Handles progress reporting and state changes
   - Provides consistent interface for the loader

3. **ModuleLoader**: Main orchestrator of the loading process
   - Dynamically loads module scripts
   - Creates and manages the visual loading interface
   - Initializes modules in the correct order
   - Tracks and displays overall loading progress

4. **ModuleEvent**: Custom event system for inter-module communication
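
A rough sketch of how a registry might manage these promises follows. Only the method names `register`, `waitForModule`, and `waitForModules` come from this document; the internals, the `module.id` property, and the choice to resolve on registration (rather than on the `FINISHED` state) are simplifying assumptions.

```javascript
// Hypothetical registry sketch: waitForModule() returns a promise that resolves
// when the module registers, even if the caller asks before registration.
class ModuleRegistry {
  constructor() {
    this.modules = new Map(); // id -> module instance
    this.pending = new Map(); // id -> { promise, resolve }
  }

  // Create (or reuse) the deferred promise for a module id.
  deferred(id) {
    if (!this.pending.has(id)) {
      let resolve;
      const promise = new Promise((res) => { resolve = res; });
      this.pending.set(id, { promise, resolve });
    }
    return this.pending.get(id);
  }

  register(module) {
    this.modules.set(module.id, module);
    this.deferred(module.id).resolve(module);
  }

  waitForModule(id) {
    return this.deferred(id).promise;
  }

  waitForModules(ids) {
    return Promise.all(ids.map((id) => this.waitForModule(id)));
  }
}

// Usage: waiting may begin before the modules have registered.
const registry = new ModuleRegistry();
const both = registry.waitForModules(['tts', 'animation-queue']);
registry.register({ id: 'tts' });
registry.register({ id: 'animation-queue' });
```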

## Loading Sequence

1. HTML page loads and includes the loader script as a module
2. DOMContentLoaded triggers the loader initialization
3. Loader creates the loading UI and registers event listeners
4. Module scripts are loaded dynamically in parallel
5. Each module registers itself with the registry
6. Modules are initialized with dependency checking
7. Progress is reported and visualized throughout
8. When all modules reach FINISHED state, loading overlay is hidden

## Module Lifecycle

1. **PENDING**: Initial state before loading begins
2. **LOADING**: Module script is being loaded
3. **WAITING**: Module is waiting for dependencies to be ready
4. **INITIALIZING**: Module's initialize() method is executing
5. **FINISHED**: Module is fully initialized and ready
6. **ERROR**: Module encountered an error during initialization

## Integration Pattern

Modules follow a consistent registration pattern:
```javascript
// Create the singleton instance
const ModuleName = new ModuleNameClass();

// Register with the module registry
moduleRegistry.register(ModuleName);

// Export the module
export { ModuleName };

// Keep a reference in window for loader system
window.ModuleName = ModuleName;
```

This design creates a flexible, maintainable system for loading complex applications with multiple interdependent components, prioritizing both user experience and performance.

# TTS System Structure & Kokoro Loading

This section summarizes the TTS system structure and how the Kokoro TTS engine is loaded.

## Overall TTS System Architecture

1. **Modular Design**: The TTS system uses a modular architecture with multiple handler classes, each implementing a different TTS approach.

2. **Three TTS Providers**:
   - `BrowserTTSHandler` - Uses the built-in Web Speech API
   - `KokoroHandler` - Uses Kokoro.js neural TTS for high-quality voices
   - `ApiTTSHandler` - Uses external TTS services like ElevenLabs

3. **Factory Pattern**: `TTSFactory` manages the handlers, provides a unified interface, and handles provider switching.

4. **Module System**: `TTSPlayer` module is registered with the `moduleRegistry` as part of the modular loading system.

## Loading Sequence

1. The module loader first loads `tts-player.js`, which in turn loads `tts-factory.js`.

2. The factory initializes providers in order of preference:
   - First loads the `BrowserTTSHandler` for immediate low-quality TTS
   - Then loads the `ApiTTSHandler` if configured
   - Finally attempts to load `KokoroHandler` in the background with low priority

3. The system uses the best available provider, with a preference for Kokoro when available.

## Kokoro TTS Loading Strategy

After consulting the documentation (https://www.npmjs.com/package/kokoro-js), we made these decisions:

1. **Low-Priority Loading**: Kokoro is loaded with `requestIdleCallback` to avoid impacting page performance.

2. **Kokoro npm package integration**: Load Kokoro directly from the local server.
   `/js/kokoro-js.js` contains the complete minified code of the kokoro-js npm package, copied from the `node_modules` folder to the public directory. Do not try to read or change it; it is too big.

3. **Pipeline Creation**: Per documentation, we use the pipeline pattern:
   ```javascript
   this.kokoro = await window.kokoroTTS.pipeline('text-to-speech', {
     quantized: true,
     progress_callback: this.progressCallback
   });
   ```

4. **Voice List**: We hardcoded the available voices rather than querying them dynamically.

5. **Audio Playback**: Synthesis returns an audio element which we play:
   ```javascript
   const audio = await this.kokoro(processedText, {
     voice: this.voiceOptions.voice,
     speed: this.voiceOptions.speed
   });
   audio.play();
   ```
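
The low-priority loading decision could be wrapped in a small helper like the sketch below. `whenIdle`, `loadInBackground`, and the `loadEngine` parameter are hypothetical names; the fallback to `setTimeout(…, 0)` is an assumption for environments without `requestIdleCallback`, and is pure scheduling inside one helper rather than a way to wait for another operation.

```javascript
// Resolve when the browser reports idle time; fall back to a macrotask
// where requestIdleCallback is not available.
function whenIdle(timeoutMs = 2000) {
  return new Promise((resolve) => {
    if (typeof requestIdleCallback === 'function') {
      // `timeout` guarantees the callback eventually runs on a busy page.
      requestIdleCallback(() => resolve(), { timeout: timeoutMs });
    } else {
      setTimeout(resolve, 0);
    }
  });
}

// `loadEngine` is a placeholder for whatever actually loads Kokoro.
async function loadInBackground(loadEngine) {
  await whenIdle();
  return loadEngine();
}

// Usage with a stand-in loader.
const engineReady = loadInBackground(async () => 'kokoro-ready');
```

Callers await `engineReady` (or listen for the module's state events) instead of guessing when loading has finished.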

## User Experience Flow

1. User sees page immediately with browser TTS enabled (fast startup)
2. Kokoro loads in background without blocking the interface
3. Once Kokoro is ready, TTS switches to higher quality neural TTS
4. User can manually switch between providers via the UI if desired

This design prioritizes performance and user experience, making the TTS system both flexible and resource-efficient.

# Important practices

- Always ignore the following error when debugging console output: `onpage-dialog.preload.js:121 Uncaught ReferenceError: browser is not defined`. It is produced by the installed adblocker and has nothing to do with our project.

# Text-to-Speech Synchronization Architecture

The TTS system needs to be synchronized with text animations to create a cohesive user experience. This section outlines the requirements and implementation approach.

## Transition to Game

The overlay fades away as the first scheduled animation.
- This fade animation is handled by the animation scheduler module
- Only after successful fade-out does the game loop start
- Socket connection is established and begins receiving text

## Text Buffering & Sentence Processing

1. **Text Buffer Collection**: Incoming text from sockets is collected in a buffer.
   - System can receive fragments of any size (single letters, words, sentences, paragraphs)
   - All text is accumulated in the buffer regardless of fragment size
   - Buffer handles partial/incomplete text gracefully

2. **Sentence Detection**: The buffer identifies complete sentences.
   - When full sentences are detected, they are extracted from the buffer
   - If multiple sentences arrive simultaneously, they are split and processed individually
   - Remaining partial sentences stay in the buffer until completion
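
The buffering behaviour described above can be sketched as follows. This `TextBuffer` is a hypothetical minimal version: it treats `.`, `!`, or `?` followed by whitespace as a sentence boundary and does not handle abbreviations such as "Dr."; the real sentence detection may use different rules.

```javascript
// Hypothetical TextBuffer: accumulates fragments and returns complete sentences.
class TextBuffer {
  constructor() {
    this.buffer = '';
  }

  // Append a fragment of any size and return the sentences completed so far.
  push(fragment) {
    this.buffer += fragment;
    const sentences = [];
    // Terminal punctuation, optional closing quotes/brackets, then whitespace.
    const boundary = /[.!?]+["')\]]*\s+/g;
    let start = 0;
    let match;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.buffer.slice(start, end).trim());
      start = end;
    }
    this.buffer = this.buffer.slice(start); // keep the partial sentence
    return sentences;
  }

  // Return whatever is left, e.g. when the stream ends.
  flush() {
    const rest = this.buffer.trim();
    this.buffer = '';
    return rest;
  }
}

const buf = new TextBuffer();
const first = buf.push('Hel');                // nothing complete yet
const second = buf.push('lo there. How are'); // one sentence completed
const third = buf.push(' you? Fine.');        // another one; "Fine." stays buffered
const rest = buf.flush();
```

Requiring whitespace after the punctuation means a trailing "Fine." is held until more text arrives or `flush()` is called, which avoids splitting a number like "3.14" that arrives as "3." and "14".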

## Synchronized Playback

1. **TTS Generation Queue**: Complete sentences enter the TTS generation queue.
   - Generation begins immediately if no other sentence is being processed
   - Results are cached for immediate playback when needed

2. **Animation Timing**: Animation speed is synchronized with audio duration.
   - The system calculates animation duration to match TTS audio length exactly
   - Both animation and audio start simultaneously
   - Animation completes at the same time as audio playback

3. **Playback Pipeline**: Continuous processing of sentences.
   - As soon as one sentence completes playback, the next begins
   - Next sentence generation starts during current sentence playback
   - This creates a seamless reading experience

## Fast-Forward & Control Flow

1. **Fast-Forward Behavior**: User can skip current sentence.
   - Pressing the designated fast-forward key completes current animation immediately
   - TTS audio is faded out and stopped
   - System advances to next sentence

2. **Resource Management**: TTS generation is resource-conscious.
   - Uses only CPU/GPU resources not needed for animation
   - Generation process can be cancelled by fast-forward
   - System prioritizes smooth animation over TTS preparation

3. **Loading States**: Animation waits for TTS when necessary.
   - If next sentence TTS generation isn't ready when needed, animation pauses
   - Fast-forward key can skip incomplete generation
   - User is never blocked completely by TTS generation
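
Cancellable generation can be sketched with the standard `AbortController`. Everything here is illustrative: `prepareSentence` and `fakeGenerate` are hypothetical, and the `setTimeout` inside `fakeGenerate` only simulates generation latency in this stand-in; whether the real TTS engine can honour an abort signal is an open assumption.

```javascript
// Hypothetical sketch: one AbortController per sentence; fast-forward aborts it.
function prepareSentence(text, generateAudio) {
  const controller = new AbortController();
  const ready = generateAudio(text, controller.signal);
  return { text, ready, cancel: () => controller.abort() };
}

// Stand-in generator that honours the abort signal.
function fakeGenerate(text, signal) {
  return new Promise((resolve, reject) => {
    // Simulated generation latency only; not a synchronization mechanism.
    const timer = setTimeout(() => resolve({ text, durationMs: text.length * 50 }), 20);
    signal.addEventListener('abort', () => {
      clearTimeout(timer);
      reject(new Error('cancelled'));
    }, { once: true });
  });
}

const job = prepareSentence('A long sentence.', fakeGenerate);
const outcome = job.ready.then(() => 'played', () => 'skipped');
job.cancel(); // user pressed the fast-forward key
```

Because the rejection is handled where the job is consumed, a skipped sentence simply advances the queue instead of surfacing as an error.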

## Persistent Configuration

1. **Options Storage**: The persistence-manager stores TTS settings.
   - Speech on/off state is remembered
   - Speed settings are preserved between sessions
   - Voice preferences are stored

2. **Options UI**: Add an options button and modal dialog.
   - Show additional options in a modal window
   - Include volume sliders:
     - Master volume control
     - TTS volume control
     - Music volume control
     - Sound effects volume control
   - Include manual TTS system selection
   - All settings are persisted via the persistence-manager

This synchronized approach ensures that text animations and speech work together seamlessly, creating a more immersive storytelling experience while maintaining smooth performance.

# Text Output Pipeline Architecture

The text output pipeline manages the flow of text from server reception to visual display and audio playback, with a focus on performance and synchronization.

## Core Components

1. **Socket Client**: Receives raw text fragments from the server.

2. **TextBuffer**: Accumulates fragments and identifies complete sentences.
   - Collects all incoming text regardless of fragment size
   - Identifies and extracts complete sentences
   - Maintains partial sentences until completion

3. **SentenceQueue**: Manages the preparation pipeline for sentences.
   - Receives complete sentences from TextBuffer
   - Orchestrates parallel processing of TTS generation and text layout
   - Ensures sentences are fully prepared before playback
   - Maintains a queue of sentences ready for playback

4. **TTS Generation System**: Prepares audio for sentences.
   - Generates audio in the background without blocking UI
   - Provides audio duration information for synchronization
   - Can be cancelled for fast-forward operations
   - Falls back to character count duration calculation when disabled

5. **Typography Processor**: Enhances text presentation quality.
   - Applies smart typography (quotes, em-dashes, etc.)
   - Handles hyphenation for line breaks
   - Preserves special formatting

6. **ParagraphLayout**: Calculates optimal text presentation.
   - Computes line breaks using Knuth-Plass algorithm
   - Determines word positioning and timing
   - Adjusts animation duration to match audio length

7. **AnimationPlayerQueue**: Manages the playback pipeline.
   - Maintains a playlist of ready-to-play sentences
   - Inserts DOM elements for prepared sentences
   - Coordinates CSS-based animations
   - Monitors animation completion
   - Automatically advances to next sentence

## Process Flow

1. **Preparation Pipeline**:
   - Socket client receives text and feeds it to TextBuffer
   - TextBuffer identifies complete sentences
   - SentenceQueue receives complete sentences
   - TTS generation and layout processing happen in parallel
   - When both TTS and layout are complete, sentence is marked "ready"
   - Ready sentences are added to AnimationPlayerQueue

2. **Playback Pipeline**:
   - AnimationPlayerQueue plays the first ready sentence
   - DOM elements are inserted and CSS animations begin
   - TTS audio plays simultaneously with animations
   - AnimationPlayerQueue monitors complete animation duration
   - When playback completes, the next ready sentence immediately begins

3. **Fast-Forward Handling**:
   - Can interrupt at any stage of the pipeline
   - Currently playing animations are immediately completed
   - Currently playing audio is faded out and stopped
   - Any in-progress sentence preparation is cancelled
   - System advances to the next sentence in queue
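
The preparation step, where TTS generation and layout run in parallel and a sentence becomes "ready" only when both finish, maps naturally onto `Promise.all`. In this sketch the function names, the `steps` object, and the use of a plain array as the player queue are all assumptions for illustration.

```javascript
// Hypothetical sketch: a sentence is "ready" only when both audio and layout exist.
async function prepare(sentence, { generateAudio, computeLayout }) {
  const [audio, layout] = await Promise.all([
    generateAudio(sentence), // runs concurrently with layout
    computeLayout(sentence),
  ]);
  return { sentence, audio, layout, status: 'ready' };
}

// Sentences are prepared one at a time and handed to the player queue in order.
async function runPreparation(sentences, steps, playerQueue) {
  for (const sentence of sentences) {
    playerQueue.push(await prepare(sentence, steps));
  }
}

// Usage with stand-in steps.
const playerQueue = [];
const steps = {
  generateAudio: async (s) => ({ durationMs: s.length * 60 }),
  computeLayout: async (s) => ({ words: s.split(' ') }),
};
const prepared = runPreparation(['Hello there.', 'Bye.'], steps, playerQueue);
```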

## Speed Synchronization

1. **Audio-Driven Timing**: Animation speed is determined by audio duration
   - TTS audio length dictates animation duration
   - Without TTS, duration is calculated from character count and speed setting

2. **Seamless Transitions**: Next sentence begins immediately after current completes
   - No gap between sentence playbacks
   - Preparation happens during playback of previous sentence

3. **Feedback Loop**: Animation system provides timing data back to preparation pipeline
   - Helps optimize future sentence preparation
   - Allows runtime adjustment of timing parameters
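
The audio-driven timing rule, with its character-count fallback, can be expressed as a small pure function. The helper name and the `charsPerSecond` and `speed` parameters (including their default values) are assumptions; the document only states that the fallback depends on character count and a speed setting.

```javascript
// Hypothetical helper: animation duration in milliseconds.
function animationDurationMs(text, audioDurationMs, { charsPerSecond = 15, speed = 1 } = {}) {
  // With TTS: the audio length dictates the animation duration.
  if (Number.isFinite(audioDurationMs) && audioDurationMs > 0) {
    return audioDurationMs;
  }
  // Without TTS: derive the duration from character count and speed setting.
  return Math.round((text.length / (charsPerSecond * speed)) * 1000);
}

const withAudio = animationDurationMs('Hello', 1200);
const withoutAudio = animationDurationMs('x'.repeat(30), null);
const doubleSpeed = animationDurationMs('x'.repeat(30), null, { speed: 2 });
```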

This architecture separates preparation from playback, creating a buffer of ready content that enables smooth presentation while handling the computational overhead of text processing and TTS generation in the background.

# Text Processing & Layout Architecture

The text processing and layout system transforms raw text input into visually appealing, typographically correct, and elegantly animated content through several specialized components.

## Component Interactions

### Core Components

1. **text-processor.js**: Enhances typography and applies hyphenation
   - Entry point for text processing pipeline
   - Manages SmartyPants for typographic enhancements
   - Controls Hyphenopoly for language-aware hyphenation
   - Serves as the central coordinator for text transformation

2. **smartypants.js**: Provides typographic punctuation conversion
   - Transforms straight quotes to curly quotes
   - Converts hyphens to em-dashes and en-dashes
   - Handles ellipses and other typographic niceties
   - Operates as a pure function with no dependencies

3. **paragraph-layout.js**: Manages paragraph structure and word metrics
   - Breaks text into words and calculates their dimensions
   - Manages paragraph-level styling and layout properties
   - Prepares text for the line-breaking algorithm
   - Connects text-processor output to the layout engine

4. **knuth-plass.js**: Implementation of the optimal line-breaking algorithm
   - Calculates aesthetically pleasing line breaks
   - Minimizes "raggedness" across paragraph lines
   - Implements the core Knuth-Plass algorithm
   - Uses linked-list.js for internal data structures

5. **linked-list.js**: Provides data structures for the line-breaking algorithm
   - Implements doubly-linked list for efficient node insertion/removal
   - Supports the complex data relationships in the Knuth-Plass algorithm
   - Pure utility with no direct interaction with other components

6. **hyphenopoly.module.js**: Performs language-aware hyphenation
   - Contains language-specific hyphenation patterns
   - Provides functions to insert soft hyphens at valid breaking points
   - Loaded dynamically when needed by text-processor.js

7. **layout-renderer.js**: Translates calculated layout into DOM elements
   - Takes the output from paragraph-layout.js
   - Generates DOM structure for the text display
   - Creates CSS classes and styles for animations
   - Prepares text for display and animation

## Process Flow

1. **Text Input → Typography Enhancement**
   ```
   Raw Text → text-processor.js → smartypants.js → Enhanced Text
   ```
   - Raw text enters the text-processor
   - SmartyPants functions transform quotation marks, dashes, etc.
   - Typography-enhanced text is produced

2. **Typography-Enhanced Text → Hyphenation**
   ```
   Enhanced Text → text-processor.js → hyphenopoly.module.js → Hyphenated Text
   ```
   - Enhanced text is passed to the hyphenation system
   - Language-specific rules determine valid hyphenation points
   - Soft hyphens are inserted at appropriate positions

3. **Hyphenated Text → Layout Calculation**
   ```
   Hyphenated Text → paragraph-layout.js → knuth-plass.js → Optimized Layout
   ```
   - Paragraph layout breaks text into words and calculates metrics
   - Knuth-Plass algorithm calculates optimal line breaks
   - linked-list.js provides the data structures for this process
   - An optimized layout structure is produced

4. **Layout → Rendering**
   ```
   Optimized Layout → layout-renderer.js → DOM Elements
   ```
   - Layout renderer converts the abstract layout to concrete DOM
   - CSS classes and styles are applied for animation
   - Words are positioned according to the calculated layout

5. **Rendering → Animation**
   ```
   DOM Elements → AnimationQueue → Visual Display
   ```
   - The rendered DOM elements are passed to the animation system
   - Words are animated according to timing and styling parameters
   - Visual presentation occurs synchronized with audio if applicable
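
The first two stages of this flow can be pictured as pure functions composed into one text transformation. The sketch below is a deliberately simplified stand-in: `smartPunctuation` covers only a handful of rules and is not the real `smartypants.js`, `hyphenate` is an identity placeholder where Hyphenopoly would insert soft hyphens, and `compose` and `processText` are hypothetical names.

```javascript
// Simplified stand-in for SmartyPants-style rules (the real library handles
// many more cases, such as nested quotes and markup).
function smartPunctuation(text) {
  return text
    .replace(/---/g, '\u2014')               // three hyphens to an em dash
    .replace(/--/g, '\u2013')                // two hyphens to an en dash
    .replace(/\.\.\./g, '\u2026')            // three dots to an ellipsis
    .replace(/(^|[\s(\[{])"/g, '$1\u201C')   // opening double quote
    .replace(/"/g, '\u201D')                 // remaining double quotes close
    .replace(/(^|[\s(\[{])'/g, '$1\u2018')   // opening single quote
    .replace(/'/g, '\u2019');                // apostrophes and closing single quotes
}

// Placeholder: language-aware hyphenation would insert soft hyphens here.
const hyphenate = (text) => text;

// Run the stages left to right, feeding each result into the next stage.
const compose = (...stages) => (input) => stages.reduce((value, stage) => stage(value), input);
const processText = compose(smartPunctuation, hyphenate);

const enhanced = processText('"Wait--it\'s here..."');
```

Keeping each stage a pure string-to-string function makes the order of the flow explicit and each stage testable in isolation.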

## Implementation Dependencies

```
text-processor.js
├── smartypants.js
└── hyphenopoly.module.js
    └── [language pattern files]

paragraph-layout.js
├── knuth-plass.js
│   └── linked-list.js
└── [font metrics]

layout-renderer.js
└── [CSS styling]
```

## Integration Points

1. **Text Buffer → Text Processor**
   - Text buffer passes complete sentences to the processing pipeline
   - Text processor enhances typography and applies hyphenation

2. **Text Processor → Paragraph Layout**
   - Enhanced text flows to paragraph layout for structure analysis
   - Word metrics and paragraph properties are calculated

3. **Paragraph Layout → Layout Renderer**
   - Optimized layout information is passed to the renderer
   - Renderer creates DOM elements with appropriate styling

4. **Layout Renderer → Animation Queue**
   - Rendered elements are scheduled for animation
   - Animation timing is synchronized with TTS if enabled

This architecture ensures typographically beautiful text with optimal line breaks, proper hyphenation, and smooth animation, creating a professional reading experience.

# TTS Integration with Localization

## Architecture Overview

The Text-to-Speech (TTS) system has been refactored to seamlessly integrate with the localization module, ensuring a cohesive user experience across different languages. This integration follows these key architectural principles:

1. **Base TTS Handler Pattern**: All TTS handlers extend a common `TTSHandler` class that inherits from `BaseModule`, ensuring consistent interface and behavior.

2. **Dependency Injection**: TTS handlers access the localization and persistence modules through the dependency system rather than direct global references.

3. **Locale-Aware Voice Selection**: TTS handlers automatically select appropriate voices based on the current locale.

4. **Preference Persistence**: User preferences for TTS settings are stored and retrieved through the persistence manager.

5. **Optional Functionality**: TTS is treated as an optional feature that can be unavailable without breaking the application.

## Core Components

1. **TTSFactory**: Central coordinator for TTS functionality
   - Manages initialization of all TTS handlers
   - Implements fallback mechanisms when preferred TTS systems are unavailable
   - Provides access to the active TTS handler
   - Integrates with localization module for language-aware voice selection
   - Reports TTS availability to the UI

2. **TTSHandler**: Abstract base class for all TTS handlers
   - Defines common interface methods (speak, stop, getVoices, etc.)
   - Provides shared utility functions for voice selection and preference handling
   - Extends BaseModule for dependency management and event handling

3. **TTS Handlers**: Concrete implementations for different TTS approaches
   - **BrowserTTSHandler**: Uses the Web Speech API
   - **ApiTTSHandler**: Communicates with a remote TTS API
   - **KokoroHandler**: Provides neural TTS via Kokoro.js

4. **OptionsUI**: User interface for TTS configuration
   - Allows selection of TTS system (Browser, API, Kokoro)
   - Provides voice selection based on available voices for current locale
   - Includes controls for volume, rate, and pitch
   - Persists user preferences via PersistenceManager

## Localization Integration

1. **Locale-Based Voice Selection**:
   - Each TTS handler implements `setupVoiceFromPreferences()` to select voices based on:
     - User's explicitly saved voice preference
     - Current locale from the localization module
     - Fallback to language-matching voice if exact locale match not found
     - Default voice (typically English) as final fallback

2. **Voice Filtering**:
   - TTS handlers filter available voices to prioritize those matching the current locale
   - Voice lists in the UI are sorted to show locale-matching voices first

3. **Preference Persistence**:
   - TTS settings (system, voice, volume, rate) are saved per-user
   - Settings are automatically applied when the application loads
   - Changes in the localization settings trigger voice re-selection

## Initialization Flow

1. **TTSFactory Initialization**:
   ```
   TTSFactory.initialize()
   ├── Loads user preferences via PersistenceManager
   ├── Initializes all available TTS handlers
   │   ├── KokoroHandler.initialize()
   │   └── BrowserTTSHandler.initialize()
   ├── Selects active TTS handler based on preferences and availability
   ├── Sets up event listeners for locale changes
   └── Dispatches TTS availability event
   ```

2. **TTS Handler Initialization**:
   ```
   TTSHandler.initialize()
   ├── Loads system-specific resources
   ├── Retrieves available voices
   ├── Gets dependencies (localization, persistenceManager)
   └── Sets up voice based on preferences and locale
   ```

3. **Voice Setup Process**:
   ```
   setupVoiceFromPreferences()
   ├── Gets user's preferred voice from persistenceManager
   ├── If preferred voice exists and is available:
   │   └── Use preferred voice
   ├── Otherwise:
   │   ├── Get current locale from localization module
   │   ├── Find voice matching current locale
   │   ├── If no match, find voice matching language part
   │   └── If still no match, use default voice
   └── Update preference with selected voice
   ```
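
The voice setup process above amounts to a four-step fallback chain, sketched here as a pure function. The voice object shape (`{ id, lang }` with BCP 47 language tags), the function name `selectVoice`, and the `defaultLang` parameter are assumptions; the real `setupVoiceFromPreferences()` also reads from and writes to the persistence manager, which is omitted.

```javascript
// Hypothetical helper mirroring the fallback chain:
// saved preference -> exact locale -> same language -> default voice.
function selectVoice(voices, { preferredId, locale, defaultLang = 'en' }) {
  const byId = voices.find((v) => v.id === preferredId);
  if (byId) return byId;

  const wanted = locale.toLowerCase();
  const exact = voices.find((v) => v.lang.toLowerCase() === wanted);
  if (exact) return exact;

  const language = wanted.split('-')[0];
  const sameLanguage = voices.find((v) => v.lang.toLowerCase().split('-')[0] === language);
  if (sameLanguage) return sameLanguage;

  // Final fallback: a voice in the default language, else any voice, else none.
  return voices.find((v) => v.lang.toLowerCase().startsWith(defaultLang)) ?? voices[0] ?? null;
}

const voices = [
  { id: 'a', lang: 'en-US' },
  { id: 'b', lang: 'de-DE' },
  { id: 'c', lang: 'fr-CA' },
];
const saved = selectVoice(voices, { preferredId: 'c', locale: 'de-DE' });     // saved preference wins
const byLocale = selectVoice(voices, { preferredId: 'x', locale: 'de-DE' });  // exact locale match
const byLanguage = selectVoice(voices, { locale: 'fr-FR' });                  // language-only match
const fallback = selectVoice(voices, { locale: 'ja-JP' });                    // default language
```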

## Event Handling

1. **Locale Change Events**:
   - When user changes locale in the UI, the localization module emits a 'locale-changed' event
   - TTSFactory listens for this event and triggers voice re-selection in the active TTS handler

2. **TTS Preference Events**:
   - Changes to TTS settings in the options UI trigger preference updates
   - These updates are persisted and immediately applied to the active TTS handler

3. **TTS Availability Events**:
   - TTSFactory dispatches 'tts:availability' events to notify the UI about TTS availability
   - UI Controller listens for these events and updates the speech toggle button accordingly

## Error Handling and Fallbacks

1. **TTS System Fallbacks**:
   - If the preferred TTS system fails to initialize, TTSFactory falls back to the next available system
   - Priority order: Kokoro > Browser > None (with None being acceptable)
   - API TTS is not used as a fallback as it requires manual configuration

2. **Voice Selection Fallbacks**:
   - If preferred voice is unavailable, fall back to locale-matching voice
   - If no locale match, fall back to language match
   - If no language match, fall back to default (typically English)

3. **TTS Unavailability Handling**:
   - If no TTS handlers are available, the system continues to function without TTS
   - The speech toggle button is disabled in the UI
   - The application remains fully functional for text-only interaction
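
The system-level fallback rule can be sketched as a small selection function. The handler map shape (`{ name: { available } }`), the lowercase handler keys, and the function name `pickHandler` are assumptions for illustration; only the priority order and the exclusion of the API handler from automatic fallback come from the text above.

```javascript
// Automatic fallback order: Kokoro, then Browser. The API handler is only
// used when explicitly preferred, since it needs manual configuration.
const FALLBACK_ORDER = ['kokoro', 'browser'];

// Returns the name of the handler to activate, or null when TTS is unavailable.
function pickHandler(handlers, preferred) {
  if (preferred && handlers[preferred]?.available) return preferred;
  return FALLBACK_ORDER.find((name) => handlers[name]?.available) ?? null;
}

const kokoroDown = pickHandler(
  { kokoro: { available: false }, browser: { available: true }, api: { available: true } },
  'kokoro',
);
const apiChosen = pickHandler({ api: { available: true }, browser: { available: true } }, 'api');
const nothing = pickHandler({ api: { available: true } }, undefined);
```

A `null` result corresponds to the "None" case: the application keeps running and the UI disables the speech toggle.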

This architecture ensures that the TTS system seamlessly adapts to the user's language preferences while maintaining a consistent and intuitive user experience across different locales, even when TTS is unavailable.