Fix TTS module initialization and dependency issues. Update module IDs for consistency, improve circular dependency detection, and fix UI Controller event handling.

2025-04-04 19:15:28 +00:00
parent 02c7b9ef28
commit 49a5af252c
33 changed files with 7227 additions and 4060 deletions
@@ -256,4 +256,343 @@ The overlay fades away as the first scheduled animation.
   - Include manual TTS system selection
   - All settings are persisted via the persistence-manager

-This synchronized approach ensures that text animations and speech work together seamlessly, creating a more immersive storytelling experience while maintaining smooth performance.
+This synchronized approach ensures that text animations and speech work together seamlessly, creating a more immersive storytelling experience while maintaining smooth performance.
+
+# Text Output Pipeline Architecture
+
+The text output pipeline manages the flow of text from server reception to visual display and audio playback, with a focus on performance and synchronization.
+
+## Core Components
+
+1. **Socket Client**: Receives raw text fragments from the server.
+
+2. **TextBuffer**: Accumulates fragments and identifies complete sentences.
+   - Collects all incoming text regardless of fragment size
+   - Identifies and extracts complete sentences
+   - Maintains partial sentences until completion
+
+3. **SentenceQueue**: Manages the preparation pipeline for sentences.
+   - Receives complete sentences from TextBuffer
+   - Orchestrates parallel processing of TTS generation and text layout
+   - Ensures sentences are fully prepared before playback
+   - Maintains a queue of sentences ready for playback
+
+4. **TTS Generation System**: Prepares audio for sentences.
+   - Generates audio in the background without blocking UI
+   - Provides audio duration information for synchronization
+   - Can be cancelled for fast-forward operations
+   - When TTS is disabled, falls back to a duration estimated from character count
+
+5. **Typography Processor**: Enhances text presentation quality.
+   - Applies smart typography (quotes, em-dashes, etc.)
+   - Handles hyphenation for line breaks
+   - Preserves special formatting
+
+6. **ParagraphLayout**: Calculates optimal text presentation.
+   - Computes line breaks using the Knuth-Plass algorithm
+   - Determines word positioning and timing
+   - Adjusts animation duration to match audio length
+
+7. **AnimationPlayerQueue**: Manages the playback pipeline.
+   - Maintains a playlist of ready-to-play sentences
+   - Inserts DOM elements for prepared sentences
+   - Coordinates CSS-based animations
+   - Monitors animation completion
+   - Automatically advances to next sentence
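
The TextBuffer behaviour described in the component list above (accumulate fragments, emit complete sentences, hold partial ones) can be sketched as follows. This is a minimal illustration, not the project's implementation: the class shape, the `push`/`flush` method names, and the sentence-boundary rule are all assumptions.

```javascript
// Hypothetical sketch of a sentence-accumulating buffer; names are illustrative.
class TextBuffer {
  constructor() {
    this.pending = ""; // partial sentence carried over between fragments
  }

  // Append a fragment and return any sentences it completes.
  push(fragment) {
    this.pending += fragment;
    const sentences = [];
    // Assumed rule: a sentence ends at . ! or ? (plus optional closing
    // quotes or brackets) followed by whitespace.
    const boundary = /[.!?]+["')\]]*\s+/g;
    let start = 0;
    let match;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.pending.slice(start, end).trim());
      start = end;
    }
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Return whatever remains, e.g. when the server signals end of output.
  flush() {
    const rest = this.pending.trim();
    this.pending = "";
    return rest ? [rest] : [];
  }
}
```

Because the rule waits for whitespace after the terminator, a sentence that ends exactly at a fragment boundary stays pending until the next fragment or a `flush()`. Abbreviations such as "Dr." would also split early under this rule, so a real implementation likely needs something more careful.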
+
+## Process Flow
+
+1. **Preparation Pipeline**:
+   - Socket client receives text and feeds it to TextBuffer
+   - TextBuffer identifies complete sentences
+   - SentenceQueue receives complete sentences
+   - TTS generation and layout processing happen in parallel
+   - When both TTS and layout are complete, sentence is marked "ready"
+   - Ready sentences are added to AnimationPlayerQueue
+
+2. **Playback Pipeline**:
+   - AnimationPlayerQueue plays the first ready sentence
+   - DOM elements are inserted and CSS animations begin
+   - TTS audio plays simultaneously with animations
+   - AnimationPlayerQueue monitors the animation until its full duration has elapsed
+   - When playback completes, the next ready sentence immediately begins
+
+3. **Fast-Forward Handling**:
+   - Can interrupt at any stage of the pipeline
+   - Currently playing animations are immediately completed
+   - Currently playing audio is faded out and stopped
+   - Any in-progress sentence preparation is cancelled
+   - System advances to the next sentence in queue
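
The parallel preparation step and its cancellation on fast-forward could look roughly like the sketch below. `prepareSentence`, `generateAudio`, and `computeLayout` are hypothetical names standing in for the real TTS and layout calls, and using an `AbortSignal` for cancellation is an assumption rather than something the source specifies.

```javascript
// Hypothetical sketch: prepare one sentence by running TTS generation and
// layout concurrently. generateAudio and computeLayout are injected stand-ins.
async function prepareSentence(text, { generateAudio, computeLayout, signal }) {
  // Both calls are started before either is awaited, so they overlap.
  const [audio, layout] = await Promise.all([
    generateAudio(text, signal),
    computeLayout(text, signal),
  ]);
  // If fast-forward was requested while preparing, discard the result.
  if (signal.aborted) return null;
  return { text, audio, layout, status: "ready" };
}
```

A caller would create one `AbortController` per sentence, pass its `signal` in, and call `abort()` when the user fast-forwards. Only non-null results would be handed to the AnimationPlayerQueue.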
+
+## Speed Synchronization
+
+1. **Audio-Driven Timing**: Animation speed is determined by audio duration
+   - TTS audio length dictates animation duration
+   - Without TTS, duration is calculated from character count and speed setting
+
+2. **Seamless Transitions**: Next sentence begins immediately after current completes
+   - No gap between sentence playbacks
+   - Preparation happens during playback of previous sentence
+
+3. **Feedback Loop**: Animation system provides timing data back to preparation pipeline
+   - Helps optimize future sentence preparation
+   - Allows runtime adjustment of timing parameters
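
As a rough illustration of audio-driven timing, the sketch below picks the animation duration from the audio length when one is available and otherwise estimates it from character count. The function names, the `charsPerSecond` speed setting, and the proportional per-word split are assumptions made for the example.

```javascript
// Returns the animation duration in milliseconds for one sentence.
// charsPerSecond is a hypothetical stand-in for the user's speed setting.
function animationDurationMs(text, audioDurationMs, charsPerSecond = 15) {
  if (Number.isFinite(audioDurationMs) && audioDurationMs > 0) {
    return audioDurationMs; // audio-driven timing
  }
  // No usable audio: estimate from character count and speed.
  return Math.round((text.length / charsPerSecond) * 1000);
}

// Spread the sentence duration across words in proportion to their length,
// returning the start delay (ms) for each word.
function wordDelaysMs(words, totalMs) {
  const totalChars = words.reduce((n, w) => n + w.length, 0);
  let elapsed = 0;
  return words.map((w) => {
    const delay = elapsed;
    elapsed += totalChars === 0 ? 0 : (w.length / totalChars) * totalMs;
    return Math.round(delay);
  });
}
```

The resulting delays could be applied as CSS `animation-delay` values on the word elements, so that the last word finishes appearing about when the audio ends.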
+
+This architecture separates preparation from playback, creating a buffer of ready content that enables smooth presentation while handling the computational overhead of text processing and TTS generation in the background.
+
+# Text Processing & Layout Architecture
+
+The text processing and layout system transforms raw text input into visually appealing, typographically correct, and elegantly animated content through several specialized components.
+
+## Component Interactions
+
+### Core Components
+
+1. **text-processor.js**: Enhances typography and applies hyphenation
+   - Entry point for text processing pipeline
+   - Manages SmartyPants for typographic enhancements
+   - Controls Hyphenopoly for language-aware hyphenation
+   - Serves as the central coordinator for text transformation
+
+2. **smartypants.js**: Provides typographic punctuation conversion
+   - Transforms straight quotes to curly quotes
+   - Converts runs of hyphens (`--`, `---`) to en-dashes and em-dashes
+   - Handles ellipses and other typographic niceties
+   - Operates as a pure function with no dependencies
+
+3. **paragraph-layout.js**: Manages paragraph structure and word metrics
+   - Breaks text into words and calculates their dimensions
+   - Manages paragraph-level styling and layout properties
+   - Prepares text for the line-breaking algorithm
+   - Connects text-processor output to the layout engine
+
+4. **knuth-plass.js**: Implementation of the optimal line-breaking algorithm
+   - Calculates aesthetically pleasing line breaks
+   - Minimizes "raggedness" across paragraph lines
+   - Implements the core Knuth-Plass algorithm
+   - Uses linked-list.js for internal data structures
+
+5. **linked-list.js**: Provides data structures for the line-breaking algorithm
+   - Implements a doubly linked list for efficient node insertion and removal
+   - Supports the complex data relationships in the Knuth-Plass algorithm
+   - Pure utility with no direct interaction with other components
+
+6. **hyphenopoly.module.js**: Performs language-aware hyphenation
+   - Contains language-specific hyphenation patterns
+   - Provides functions to insert soft hyphens at valid breaking points
+   - Loaded dynamically when needed by text-processor.js
+
+7. **layout-renderer.js**: Translates calculated layout into DOM elements
+   - Takes the output from paragraph-layout.js
+   - Generates DOM structure for the text display
+   - Creates CSS classes and styles for animations
+   - Prepares text for display and animation
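
To make the typographic conversion attributed to smartypants.js above more concrete, here is a deliberately simplified pure function in the same spirit. It is not the module's actual code: the name `smarten` is hypothetical, and the quote rules here are far cruder than a real SmartyPants port.

```javascript
// Simplified illustration only; real SmartyPants rules cover many more cases.
function smarten(text) {
  return text
    .replace(/---/g, "\u2014")               // three hyphens: em dash
    .replace(/--/g, "\u2013")                // two hyphens: en dash
    .replace(/\.\.\./g, "\u2026")            // three periods: ellipsis
    .replace(/(^|[\s(\[{])"/g, "$1\u201C")   // double quote at a word start: opening
    .replace(/"/g, "\u201D")                 // remaining double quotes: closing
    .replace(/(^|[\s(\[{])'/g, "$1\u2018")   // single quote at a word start: opening
    .replace(/'/g, "\u2019");                // remaining: closing quote or apostrophe
}
```

The order of the replacements matters: triple hyphens must be handled before double hyphens, and opening quotes before the catch-all closing rule.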
+
+## Process Flow
+
+1. **Text Input → Typography Enhancement**
+   ```
+   Raw Text → text-processor.js → smartypants.js → Enhanced Text
+   ```
+   - Raw text enters the text-processor
+   - SmartyPants functions transform quotation marks, dashes, etc.
+   - Typography-enhanced text is produced
+
+2. **Typography-Enhanced Text → Hyphenation**
+   ```
+   Enhanced Text → text-processor.js → hyphenopoly.module.js → Hyphenated Text
+   ```
+   - Enhanced text is passed to the hyphenation system
+   - Language-specific rules determine valid hyphenation points
+   - Soft hyphens are inserted at appropriate positions
+
+3. **Hyphenated Text → Layout Calculation**
+   ```
+   Hyphenated Text → paragraph-layout.js → knuth-plass.js → Optimized Layout
+   ```
+   - Paragraph layout breaks text into words and calculates metrics
+   - Knuth-Plass algorithm calculates optimal line breaks
+   - linked-list.js provides the data structures for this process
+   - An optimized layout structure is produced
+
+4. **Layout → Rendering**
+   ```
+   Optimized Layout → layout-renderer.js → DOM Elements
+   ```
+   - Layout renderer converts the abstract layout to concrete DOM
+   - CSS classes and styles are applied for animation
+   - Words are positioned according to the calculated layout
+
+5. **Rendering → Animation**
+   ```
+   DOM Elements → AnimationPlayerQueue → Visual Display
+   ```
+   - The rendered DOM elements are passed to the animation system
+   - Words are animated according to timing and styling parameters
+   - Visual presentation occurs synchronized with audio if applicable
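
The idea behind the layout calculation in step 3 is to choose all line breaks together rather than greedily. The sketch below is a simplified stand-in, not the Knuth-Plass algorithm itself: it has no stretchable glue, penalties, or hyphenation points, and it measures width in characters instead of font metrics. It only shows the dynamic-programming "total fit" idea of minimizing raggedness over the whole paragraph.

```javascript
// Simplified total-fit line breaker: minimizes the sum of squared trailing
// space over all lines except the last. Widths are character counts.
function breakLines(words, maxWidth) {
  const n = words.length;
  const cost = new Array(n + 1).fill(Infinity); // cost[i]: best cost for words[i..]
  const next = new Array(n + 1).fill(n);        // next[i]: index where the following line starts
  cost[n] = 0;
  for (let i = n - 1; i >= 0; i--) {
    let width = -1; // cancels the space counted before the first word
    for (let j = i; j < n; j++) {
      width += words[j].length + 1;
      if (width > maxWidth && j > i) break; // an overlong single word still gets its own line
      const slack = Math.max(0, maxWidth - width);
      const lineCost = j === n - 1 ? 0 : slack * slack; // the last line is free
      if (lineCost + cost[j + 1] < cost[i]) {
        cost[i] = lineCost + cost[j + 1];
        next[i] = j + 1;
      }
    }
  }
  const lines = [];
  for (let i = 0; i < n; i = next[i]) {
    lines.push(words.slice(i, next[i]).join(" "));
  }
  return lines;
}
```

For the words `aaa bb cc ddddd` at width 6, a greedy breaker would produce `aaa bb` / `cc` / `ddddd`, leaving a very short middle line. This version returns `aaa` / `bb cc` / `ddddd`, which has a lower total squared slack.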
+
+## Implementation Dependencies
+
+```
+text-processor.js
+├── smartypants.js
+└── hyphenopoly.module.js
+    └── [language pattern files]
+
+paragraph-layout.js
+├── knuth-plass.js
+│   └── linked-list.js
+└── [font metrics]
+
+layout-renderer.js
+└── [CSS styling]
+```
+
+## Integration Points
+
+1. **Text Buffer → Text Processor**
+   - Text buffer passes complete sentences to the processing pipeline
+   - Text processor enhances typography and applies hyphenation
+
+2. **Text Processor → Paragraph Layout**
+   - Enhanced text flows to paragraph layout for structure analysis
+   - Word metrics and paragraph properties are calculated
+
+3. **Paragraph Layout → Layout Renderer**
+   - Optimized layout information is passed to the renderer
+   - Renderer creates DOM elements with appropriate styling
+
+4. **Layout Renderer → Animation Queue**
+   - Rendered elements are scheduled for animation
+   - Animation timing is synchronized with TTS if enabled
+
+This architecture combines smart typography, language-aware hyphenation, optimal line breaks, and smooth animation, with each concern handled by its own module.
+
+# TTS Integration with Localization
+
+## Architecture Overview
+
+The Text-to-Speech (TTS) system has been refactored to seamlessly integrate with the localization module, ensuring a cohesive user experience across different languages. This integration follows these key architectural principles:
+
+1. **Base TTS Handler Pattern**: All TTS handlers extend a common `TTSHandler` class that inherits from `BaseModule`, ensuring consistent interface and behavior.
+
+2. **Dependency Injection**: TTS handlers access the localization and persistence modules through the dependency system rather than direct global references.
+
+3. **Locale-Aware Voice Selection**: TTS handlers automatically select appropriate voices based on the current locale.
+
+4. **Preference Persistence**: User preferences for TTS settings are stored and retrieved through the persistence manager.
+
+5. **Optional Functionality**: TTS is treated as an optional feature that can be unavailable without breaking the application.
+
+## Core Components
+
+1. **TTSFactory**: Central coordinator for TTS functionality
+   - Manages initialization of all TTS handlers
+   - Implements fallback mechanisms when preferred TTS systems are unavailable
+   - Provides access to the active TTS handler
+   - Integrates with localization module for language-aware voice selection
+   - Reports TTS availability to the UI
+
+2. **TTSHandler**: Abstract base class for all TTS handlers
+   - Defines common interface methods (speak, stop, getVoices, etc.)
+   - Provides shared utility functions for voice selection and preference handling
+   - Extends BaseModule for dependency management and event handling
+
+3. **TTS Handlers**: Concrete implementations for different TTS approaches
+   - **BrowserTTSHandler**: Uses the Web Speech API
+   - **ApiTTSHandler**: Communicates with a remote TTS API
+   - **KokoroHandler**: Provides neural TTS via Kokoro.js
+
+4. **OptionsUI**: User interface for TTS configuration
+   - Allows selection of TTS system (Browser, API, Kokoro)
+   - Provides voice selection based on available voices for current locale
+   - Includes controls for volume, rate, and pitch
+   - Persists user preferences via PersistenceManager
+
+## Localization Integration
+
+1. **Locale-Based Voice Selection**:
+   - Each TTS handler implements `setupVoiceFromPreferences()` to select voices based on:
+     - User's explicitly saved voice preference
+     - Current locale from the localization module
+     - Fallback to language-matching voice if exact locale match not found
+     - Default voice (typically English) as final fallback
+
+2. **Voice Filtering**:
+   - TTS handlers filter available voices to prioritize those matching the current locale
+   - Voice lists in the UI are sorted to show locale-matching voices first
+
+3. **Preference Persistence**:
+   - TTS settings (system, voice, volume, rate) are saved per-user
+   - Settings are automatically applied when the application loads
+   - Changes in the localization settings trigger voice re-selection
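
The locale-first ordering of voice lists described above could be implemented along these lines. The sketch assumes voices are objects with a `lang` property holding a language tag such as `de-AT`, as the Web Speech API's `SpeechSynthesisVoice` does. The function names are hypothetical.

```javascript
// Rank: 0 = exact locale match, 1 = same language, 2 = everything else.
function localeRank(voiceLang, locale) {
  const v = voiceLang.toLowerCase().replace("_", "-");
  const l = locale.toLowerCase().replace("_", "-");
  if (v === l) return 0;
  if (v.split("-")[0] === l.split("-")[0]) return 1;
  return 2;
}

// Returns a new array with locale-matching voices first.
function sortVoicesForLocale(voices, locale) {
  // Array.prototype.sort is stable, so ties keep their original order.
  return [...voices].sort(
    (a, b) => localeRank(a.lang, locale) - localeRank(b.lang, locale)
  );
}
```

Normalizing case and underscores before comparing is a precaution, since some platforms report tags like `en_US` rather than `en-US`.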
+
+## Initialization Flow
+
+1. **TTSFactory Initialization**:
+   ```
+   TTSFactory.initialize()
+   ├── Loads user preferences via PersistenceManager
+   ├── Initializes all available TTS handlers
+   │   ├── KokoroHandler.initialize()
+   │   └── BrowserTTSHandler.initialize()
+   ├── Selects active TTS handler based on preferences and availability
+   ├── Sets up event listeners for locale changes
+   └── Dispatches TTS availability event
+   ```
+
+2. **TTS Handler Initialization**:
+   ```
+   TTSHandler.initialize()
+   ├── Loads system-specific resources
+   ├── Retrieves available voices
+   ├── Gets dependencies (localization, persistenceManager)
+   └── Sets up voice based on preferences and locale
+   ```
+
+3. **Voice Setup Process**:
+   ```
+   setupVoiceFromPreferences()
+   ├── Gets user's preferred voice from persistenceManager
+   ├── If preferred voice exists and is available:
+   │   └── Use preferred voice
+   ├── Otherwise:
+   │   ├── Get current locale from localization module
+   │   ├── Find voice matching current locale
+   │   ├── If no match, find voice matching language part
+   │   └── If still no match, use default voice
+   └── Update preference with selected voice
+   ```
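
The fallback chain in the voice setup process can be written as a short chain of lookups. The sketch below is illustrative only: `selectVoice` and its option names are hypothetical, voices are assumed to be `{ name, lang }` objects, and the persistence and localization modules are replaced by plain arguments.

```javascript
// Hypothetical sketch of the voice fallback chain:
// saved preference, then exact locale, then language, then default language.
function selectVoice(voices, { preferredName, locale, defaultLang = "en" }) {
  const norm = (tag) => tag.toLowerCase().replace("_", "-");
  const lang = (tag) => norm(tag).split("-")[0];
  return (
    voices.find((v) => v.name === preferredName) ||
    voices.find((v) => norm(v.lang) === norm(locale)) ||
    voices.find((v) => lang(v.lang) === lang(locale)) ||
    voices.find((v) => lang(v.lang) === defaultLang) ||
    voices[0] || // assumption: any voice is better than none
    null
  );
}
```

A handler's `setupVoiceFromPreferences()` could call a helper like this and then write the chosen voice back to the persistence manager, matching the final step in the tree above.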
+
+## Event Handling
+
+1. **Locale Change Events**:
+   - When user changes locale in the UI, the localization module emits a 'locale-changed' event
+   - TTSFactory listens for this event and triggers voice re-selection in the active TTS handler
+
+2. **TTS Preference Events**:
+   - Changes to TTS settings in the options UI trigger preference updates
+   - These updates are persisted and immediately applied to the active TTS handler
+
+3. **TTS Availability Events**:
+   - TTSFactory dispatches 'tts:availability' events to notify the UI about TTS availability
+   - UI Controller listens for these events and updates the speech toggle button accordingly
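
The event wiring described above might be sketched as follows. The event names `locale-changed` and `tts:availability` come from the text; everything else is an assumption for illustration, including the tiny `EventBus`, the payload shapes (`{ locale }`, `{ available }`), and passing the locale as an argument to `setupVoiceFromPreferences`.

```javascript
// Minimal event bus standing in for the application's real event system.
class EventBus {
  constructor() {
    this.listeners = new Map();
  }
  on(type, handler) {
    if (!this.listeners.has(type)) this.listeners.set(type, []);
    this.listeners.get(type).push(handler);
  }
  emit(type, detail) {
    for (const handler of this.listeners.get(type) || []) handler(detail);
  }
}

// Hypothetical wiring between the localization module, TTS, and the UI.
function wireTtsEvents(bus, activeHandler, speechToggle) {
  // Locale changes trigger voice re-selection in the active handler.
  bus.on("locale-changed", ({ locale }) => {
    if (activeHandler) activeHandler.setupVoiceFromPreferences(locale);
  });
  // The UI enables or disables the speech toggle when availability changes.
  bus.on("tts:availability", ({ available }) => {
    speechToggle.disabled = !available;
  });
}
```

Guarding on `activeHandler` keeps the locale listener harmless when no TTS system is available, in line with treating TTS as optional.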
+
+## Error Handling and Fallbacks
+
+1. **TTS System Fallbacks**:
+   - If the preferred TTS system fails to initialize, TTSFactory falls back to the next available system
+   - Priority order: Kokoro > Browser > None (with None being acceptable)
+   - API TTS is not used as a fallback as it requires manual configuration
+
+2. **Voice Selection Fallbacks**:
+   - If preferred voice is unavailable, fall back to locale-matching voice
+   - If no locale match, fall back to language match
+   - If no language match, fall back to default (typically English)
+
+3. **TTS Unavailability Handling**:
+   - If no TTS handlers are available, the system continues to function without TTS
+   - The speech toggle button is disabled in the UI
+   - The application remains fully functional for text-only interaction
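
The system-level fallback order can be expressed as a small selection routine. This is a sketch under stated assumptions: the handler ids (`kokoro`, `browser`, `api`), the `selectActiveHandler` name, and an `initialize()` that resolves to `true` on success are all hypothetical. Only the priority order and the exclusion of API TTS from fallback come from the text.

```javascript
// Try the preferred system first, then the documented priority order.
// Handlers are assumed to expose an async initialize() resolving to true on success.
async function selectActiveHandler(handlers, preferred) {
  const priority = ["kokoro", "browser"]; // API TTS is never used as a fallback
  const order = [preferred, ...priority.filter((id) => id !== preferred)];
  for (const id of order) {
    const handler = handlers[id];
    if (!handler) continue; // unknown or unregistered system
    try {
      if (await handler.initialize()) return { id, handler };
    } catch (err) {
      console.warn(`TTS system "${id}" failed to initialize:`, err.message);
    }
  }
  return { id: "none", handler: null }; // acceptable outcome: text-only mode
}
```

When the result is `{ id: "none" }`, the caller would dispatch a `tts:availability` event reporting TTS as unavailable so the speech toggle can be disabled.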
+
+This architecture ensures that the TTS system seamlessly adapts to the user's language preferences while maintaining a consistent and intuitive user experience across different locales, even when TTS is unavailable.