Fix TTS module initialization and dependency issues. Update module IDs for consistency, improve circular dependency detection, and fix UI Controller event handling.

2025-04-04 19:15:28 +00:00
parent 02c7b9ef28
commit 49a5af252c
33 changed files with 7227 additions and 4060 deletions
+340 -1
@@ -256,4 +256,343 @@ The overlay fades away as the first scheduled animation.
- Include manual TTS system selection
- All settings are persisted via the persistence-manager
This synchronized approach ensures that text animations and speech work together seamlessly, creating a more immersive storytelling experience while maintaining smooth performance.
# Text Output Pipeline Architecture
The text output pipeline manages the flow of text from server reception to visual display and audio playback, with a focus on performance and synchronization.
## Core Components
1. **Socket Client**: Receives raw text fragments from the server.
2. **TextBuffer**: Accumulates fragments and identifies complete sentences.
- Collects all incoming text regardless of fragment size
- Identifies and extracts complete sentences
- Maintains partial sentences until completion
3. **SentenceQueue**: Manages the preparation pipeline for sentences.
- Receives complete sentences from TextBuffer
- Orchestrates parallel processing of TTS generation and text layout
- Ensures sentences are fully prepared before playback
- Maintains a queue of sentences ready for playback
4. **TTS Generation System**: Prepares audio for sentences.
- Generates audio in the background without blocking UI
- Provides audio duration information for synchronization
- Can be cancelled for fast-forward operations
- When TTS is disabled, falls back to estimating duration from the character count
5. **Typography Processor**: Enhances text presentation quality.
- Applies smart typography (quotes, em-dashes, etc.)
- Handles hyphenation for line breaks
- Preserves special formatting
6. **ParagraphLayout**: Calculates optimal text presentation.
- Computes line breaks using Knuth-Plass algorithm
- Determines word positioning and timing
- Adjusts animation duration to match audio length
7. **AnimationPlayerQueue**: Manages the playback pipeline.
- Maintains a playlist of ready-to-play sentences
- Inserts DOM elements for prepared sentences
- Coordinates CSS-based animations
- Monitors animation completion
- Automatically advances to next sentence
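As an illustration of the TextBuffer role described above, here is a minimal sketch of fragment accumulation and sentence extraction. The class shape, the `push`/`flush` method names, and the sentence-boundary rule (terminal punctuation followed by whitespace) are assumptions made for this example, not the project's actual implementation; a real splitter would also need to handle abbreviations such as "Dr.".

```javascript
// Hypothetical TextBuffer sketch: method names and the boundary rule are
// assumptions for illustration only.
class TextBuffer {
  constructor() {
    this.pending = ''; // partial sentence carried over between fragments
  }

  // Append a fragment and return every sentence that is now complete.
  push(fragment) {
    this.pending += fragment;
    const sentences = [];
    // A sentence ends at ., ! or ? (optionally followed by a closing quote
    // or bracket) and must be followed by whitespace, so a trailing "."
    // stays buffered until more text, or flush(), confirms the boundary.
    const boundary = /[.!?]+["'\u201D\u2019)]*\s+/g;
    let start = 0;
    let match;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.pending.slice(start, end).trim());
      start = end;
    }
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Return whatever is left, e.g. when the server signals end of message.
  flush() {
    const rest = this.pending.trim();
    this.pending = '';
    return rest ? [rest] : [];
  }
}
```

Fragments of any size can be pushed; only complete sentences are handed on, and the remainder waits in `pending` until later fragments complete it.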
## Process Flow
1. **Preparation Pipeline**:
- Socket client receives text and feeds it to TextBuffer
- TextBuffer identifies complete sentences
- SentenceQueue receives complete sentences
- TTS generation and layout processing happen in parallel
- When both TTS and layout are complete, sentence is marked "ready"
- Ready sentences are added to AnimationPlayerQueue
2. **Playback Pipeline**:
- AnimationPlayerQueue plays the first ready sentence
- DOM elements are inserted and CSS animations begin
- TTS audio plays simultaneously with animations
- AnimationPlayerQueue tracks the animation until its full duration has elapsed
- When playback completes, the next ready sentence immediately begins
3. **Fast-Forward Handling**:
- Can interrupt at any stage of the pipeline
- Currently playing animations are immediately completed
- Currently playing audio is faded out and stopped
- Any in-progress sentence preparation is cancelled
- System advances to the next sentence in queue
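The preparation step and its cancellation on fast-forward could be sketched as follows. `prepareSentence`, `generateTts`, and `computeLayout` are hypothetical names introduced for this example; the only platform API used is the standard `AbortController`/`AbortSignal`.

```javascript
// Hypothetical sketch: prepareSentence, generateTts and computeLayout are
// illustrative names, not the project's API.
async function prepareSentence(text, { generateTts, computeLayout, signal }) {
  // Both jobs are started before either is awaited, so they run concurrently.
  const [audio, layout] = await Promise.all([
    generateTts(text, signal),
    computeLayout(text, signal),
  ]);
  // Fast-forward may have fired while the jobs were running.
  if (signal.aborted) {
    throw new Error('preparation cancelled');
  }
  // Only a sentence with both results is marked "ready" for playback.
  return { text, audio, layout, status: 'ready' };
}
```

A caller would keep one `AbortController` per in-progress sentence and call `controller.abort()` when the user fast-forwards, so the rejected preparation never reaches the playback queue.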
## Speed Synchronization
1. **Audio-Driven Timing**: Animation speed is determined by audio duration
- TTS audio length dictates animation duration
- Without TTS, duration is calculated from character count and speed setting
2. **Seamless Transitions**: Next sentence begins immediately after current completes
- No gap between sentence playbacks
- Preparation happens during playback of previous sentence
3. **Feedback Loop**: Animation system provides timing data back to preparation pipeline
- Helps optimize future sentence preparation
- Allows runtime adjustment of timing parameters
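A minimal sketch of the timing rule above: audio length wins when available, otherwise the duration is estimated from the character count. The function names, the default of 15 characters per second, and the proportional per-word split are assumptions for illustration; the document does not specify the actual speed setting or its units.

```javascript
// Assumed default; the real speed setting and its units are not documented here.
const DEFAULT_CHARS_PER_SECOND = 15;

function animationDurationMs(text, audioDurationMs, charsPerSecond = DEFAULT_CHARS_PER_SECOND) {
  // Audio-driven timing: when TTS produced audio, its length dictates duration.
  if (typeof audioDurationMs === 'number' && audioDurationMs > 0) {
    return audioDurationMs;
  }
  // Fallback: estimate from character count and the speed setting.
  return Math.round((text.length / charsPerSecond) * 1000);
}

// Spread a sentence's duration over its words in proportion to word length,
// returning the start delay of each word in milliseconds.
function wordDelaysMs(words, totalMs) {
  const totalChars = words.reduce((sum, w) => sum + w.length, 0) || 1;
  let elapsed = 0;
  return words.map((w) => {
    const delay = elapsed;
    elapsed += (w.length / totalChars) * totalMs;
    return Math.round(delay);
  });
}
```

The per-word delays could then be applied as CSS `animation-delay` values so that the last word finishes as the audio ends.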
This architecture separates preparation from playback, creating a buffer of ready content that enables smooth presentation while handling the computational overhead of text processing and TTS generation in the background.
# Text Processing & Layout Architecture
The text processing and layout system transforms raw text input into visually appealing, typographically correct, and elegantly animated content through several specialized components.
## Component Interactions
### Core Components
1. **text-processor.js**: Enhances typography and applies hyphenation
- Entry point for text processing pipeline
- Manages SmartyPants for typographic enhancements
- Controls Hyphenopoly for language-aware hyphenation
- Serves as the central coordinator for text transformation
2. **smartypants.js**: Provides typographic punctuation conversion
- Transforms straight quotes to curly quotes
- Converts runs of hyphens (`--`, `---`) to en-dashes and em-dashes
- Handles ellipses and other typographic niceties
- Operates as a pure function with no dependencies
3. **paragraph-layout.js**: Manages paragraph structure and word metrics
- Breaks text into words and calculates their dimensions
- Manages paragraph-level styling and layout properties
- Prepares text for the line-breaking algorithm
- Connects text-processor output to the layout engine
4. **knuth-plass.js**: Implementation of the optimal line-breaking algorithm
- Calculates aesthetically pleasing line breaks
- Minimizes "raggedness" across paragraph lines
- Implements the core Knuth-Plass algorithm
- Uses linked-list.js for internal data structures
5. **linked-list.js**: Provides data structures for the line-breaking algorithm
- Implements doubly-linked list for efficient node insertion/removal
- Supports the complex data relationships in the Knuth-Plass algorithm
- Pure utility with no dependencies of its own; used only by knuth-plass.js
6. **hyphenopoly.module.js**: Performs language-aware hyphenation
- Contains language-specific hyphenation patterns
- Provides functions to insert soft hyphens at valid breaking points
- Loaded dynamically when needed by text-processor.js
7. **layout-renderer.js**: Translates calculated layout into DOM elements
- Takes the output from paragraph-layout.js
- Generates DOM structure for the text display
- Creates CSS classes and styles for animations
- Prepares text for display and animation
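To make the "pure function" character of the typography step concrete, here is a simplified stand-in for the kind of conversion smartypants.js performs. The function name and the exact rules (three hyphens to an em dash, two to an en dash, quotes opened after whitespace or an opening bracket) are assumptions covering only common cases, not the module's actual rule set.

```javascript
// Simplified stand-in for smart typography; rules are assumptions and
// cover only the common cases.
function smartTypography(text) {
  return text
    .replace(/---/g, '\u2014')                          // three hyphens -> em dash
    .replace(/--/g, '\u2013')                           // two hyphens -> en dash
    .replace(/\.\.\./g, '\u2026')                       // three dots -> ellipsis
    .replace(/(^|[\s(\[{\u2014\u2013])"/g, '$1\u201C')  // opening double quote
    .replace(/"/g, '\u201D')                            // remaining double quotes close
    .replace(/(^|[\s(\[{\u2014\u2013])'/g, '$1\u2018')  // opening single quote
    .replace(/'/g, '\u2019');                           // remaining: closing quote or apostrophe
}
```

Because the function has no state and no dependencies, it can run on each sentence independently and is easy to unit-test.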
## Process Flow
1. **Text Input → Typography Enhancement**
```
Raw Text → text-processor.js → smartypants.js → Enhanced Text
```
- Raw text enters the text-processor
- SmartyPants functions transform quotation marks, dashes, etc.
- Typography-enhanced text is produced
2. **Typography-Enhanced Text → Hyphenation**
```
Enhanced Text → text-processor.js → hyphenopoly.module.js → Hyphenated Text
```
- Enhanced text is passed to the hyphenation system
- Language-specific rules determine valid hyphenation points
- Soft hyphens are inserted at appropriate positions
3. **Hyphenated Text → Layout Calculation**
```
Hyphenated Text → paragraph-layout.js → knuth-plass.js → Optimized Layout
```
- Paragraph layout breaks text into words and calculates metrics
- Knuth-Plass algorithm calculates optimal line breaks
- linked-list.js provides the data structures for this process
- An optimized layout structure is produced
4. **Layout → Rendering**
```
Optimized Layout → layout-renderer.js → DOM Elements
```
- Layout renderer converts the abstract layout to concrete DOM
- CSS classes and styles are applied for animation
- Words are positioned according to the calculated layout
5. **Rendering → Animation**
```
DOM Elements → AnimationPlayerQueue → Visual Display
```
- The rendered DOM elements are passed to the animation system
- Words are animated according to timing and styling parameters
- Visual presentation occurs synchronized with audio if applicable
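The layout-calculation step (step 3 above) can be illustrated with a much simpler relative of Knuth-Plass: a dynamic program that minimizes the sum of squared leftover space per line. This is explicitly not the full box/glue/penalty model, it ignores hyphenation points, and it approximates word widths by character counts where the real system would use measured font metrics. `breakLines` is a hypothetical name.

```javascript
// Simplified line breaking: minimizes the sum of squared leftover space on
// every line except the last. NOT the full Knuth-Plass model; word widths
// are approximated by character counts.
function breakLines(words, maxWidth) {
  const n = words.length;
  const best = new Array(n + 1).fill(Infinity); // best[i]: min cost to set words[i..]
  const next = new Array(n + 1).fill(n);        // next[i]: index where the following line starts
  best[n] = 0;
  for (let i = n - 1; i >= 0; i--) {
    let width = -1; // cancels the space counted before the first word
    for (let j = i; j < n; j++) {
      width += words[j].length + 1;
      // Stop once the line overflows; an overlong single word still gets a line.
      if (width > maxWidth && j > i) break;
      const slack = Math.max(0, maxWidth - width);
      const lineCost = j === n - 1 ? 0 : slack * slack; // the last line is free
      if (lineCost + best[j + 1] < best[i]) {
        best[i] = lineCost + best[j + 1];
        next[i] = j + 1;
      }
    }
  }
  // Follow the recorded break points to build the lines.
  const lines = [];
  for (let i = 0; i < n; i = next[i]) {
    lines.push(words.slice(i, next[i]).join(' '));
  }
  return lines;
}
```

For the words `aaa bb cc ddddd` at width 6, a greedy breaker would produce `aaa bb` / `cc` / `ddddd`, leaving one very short line, whereas this optimizer chooses `aaa` / `bb cc` / `ddddd`, which is the kind of evenness the optimal approach is meant to deliver.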
## Implementation Dependencies
```
text-processor.js
├── smartypants.js
└── hyphenopoly.module.js
└── [language pattern files]
paragraph-layout.js
├── knuth-plass.js
│ └── linked-list.js
└── [font metrics]
layout-renderer.js
└── [CSS styling]
```
## Integration Points
1. **Text Buffer → Text Processor**
- Text buffer passes complete sentences to the processing pipeline
- Text processor enhances typography and applies hyphenation
2. **Text Processor → Paragraph Layout**
- Enhanced text flows to paragraph layout for structure analysis
- Word metrics and paragraph properties are calculated
3. **Paragraph Layout → Layout Renderer**
- Optimized layout information is passed to the renderer
- Renderer creates DOM elements with appropriate styling
4. **Layout Renderer → Animation Queue**
- Rendered elements are scheduled for animation
- Animation timing is synchronized with TTS if enabled
This architecture ensures typographically beautiful text with optimal line breaks, proper hyphenation, and smooth animation, creating a professional reading experience.
# TTS Integration with Localization
## Architecture Overview
The Text-to-Speech (TTS) system has been refactored to seamlessly integrate with the localization module, ensuring a cohesive user experience across different languages. This integration follows these key architectural principles:
1. **Base TTS Handler Pattern**: All TTS handlers extend a common `TTSHandler` class that inherits from `BaseModule`, ensuring consistent interface and behavior.
2. **Dependency Injection**: TTS handlers access the localization and persistence modules through the dependency system rather than direct global references.
3. **Locale-Aware Voice Selection**: TTS handlers automatically select appropriate voices based on the current locale.
4. **Preference Persistence**: User preferences for TTS settings are stored and retrieved through the persistence manager.
5. **Optional Functionality**: TTS is treated as an optional feature that can be unavailable without breaking the application.
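The base-handler and dependency-injection principles might look roughly like the sketch below. `BaseModule`'s real interface is not shown in this document, so the stand-in class, its constructor signature, and `getDependency` are all assumptions; `FakeHandler` exists only to show how a concrete handler would plug in.

```javascript
// Minimal stand-in for the project's BaseModule; its real interface is not
// shown here, so everything in this class is an assumption.
class BaseModule {
  constructor(id, dependencies = {}) {
    this.id = id;
    this.dependencies = dependencies;
  }
  getDependency(name) {
    const dep = this.dependencies[name];
    if (!dep) throw new Error(`${this.id}: missing dependency "${name}"`);
    return dep;
  }
}

class TTSHandler extends BaseModule {
  async initialize() {
    // Dependencies come from the module system, not from globals.
    this.localization = this.getDependency('localization');
    this.persistenceManager = this.getDependency('persistenceManager');
    this.voices = await this.getVoices();
    return this.voices.length > 0; // false means "unavailable", not an error
  }
  // Common interface; concrete handlers override these.
  async getVoices() { return []; }
  async speak(_text) { throw new Error('speak() not implemented'); }
  stop() {}
}

// Illustrative concrete handler used only for this example.
class FakeHandler extends TTSHandler {
  async getVoices() { return [{ name: 'Test', lang: 'en-US' }]; }
  async speak(text) { return `spoke: ${text}`; }
}
```

Returning `false` from `initialize()` instead of throwing is one way to keep TTS optional: the caller can treat an unavailable handler as a normal outcome.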
## Core Components
1. **TTSFactory**: Central coordinator for TTS functionality
- Manages initialization of all TTS handlers
- Implements fallback mechanisms when preferred TTS systems are unavailable
- Provides access to the active TTS handler
- Integrates with localization module for language-aware voice selection
- Reports TTS availability to the UI
2. **TTSHandler**: Abstract base class for all TTS handlers
- Defines common interface methods (speak, stop, getVoices, etc.)
- Provides shared utility functions for voice selection and preference handling
- Extends BaseModule for dependency management and event handling
3. **TTS Handlers**: Concrete implementations for different TTS approaches
- **BrowserTTSHandler**: Uses the Web Speech API
- **ApiTTSHandler**: Communicates with a remote TTS API
- **KokoroHandler**: Provides neural TTS via Kokoro.js
4. **OptionsUI**: User interface for TTS configuration
- Allows selection of TTS system (Browser, API, Kokoro)
- Provides voice selection based on available voices for current locale
- Includes controls for volume, rate, and pitch
- Persists user preferences via PersistenceManager
## Localization Integration
1. **Locale-Based Voice Selection**:
- Each TTS handler implements `setupVoiceFromPreferences()` to select voices based on:
- User's explicitly saved voice preference
- Current locale from the localization module
- Fallback to language-matching voice if exact locale match not found
- Default voice (typically English) as final fallback
2. **Voice Filtering**:
- TTS handlers filter available voices to prioritize those matching the current locale
- Voice lists in the UI are sorted to show locale-matching voices first
3. **Preference Persistence**:
- TTS settings (system, voice, volume, rate, pitch) are saved per-user
- Settings are automatically applied when the application loads
- Changes in the localization settings trigger voice re-selection
## Initialization Flow
1. **TTSFactory Initialization**:
```
TTSFactory.initialize()
├── Loads user preferences via PersistenceManager
├── Initializes all available TTS handlers
│ ├── KokoroHandler.initialize()
│ └── BrowserTTSHandler.initialize()
├── Selects active TTS handler based on preferences and availability
├── Sets up event listeners for locale changes
└── Dispatches TTS availability event
```
2. **TTS Handler Initialization**:
```
TTSHandler.initialize()
├── Loads system-specific resources
├── Retrieves available voices
├── Gets dependencies (localization, persistenceManager)
└── Sets up voice based on preferences and locale
```
3. **Voice Setup Process**:
```
setupVoiceFromPreferences()
├── Gets user's preferred voice from persistenceManager
├── If preferred voice exists and is available:
│ └── Use preferred voice
├── Otherwise:
│ ├── Get current locale from localization module
│ ├── Find voice matching current locale
│ ├── If no match, find voice matching language part
│ └── If still no match, use default voice
└── Update preference with selected voice
```
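The fallback order in the voice setup process can be expressed as a small pure function. `selectVoice` is a hypothetical helper, and the voice objects are assumed to resemble Web Speech API voices (`name` plus a BCP 47 `lang` tag). The final "any voice at all" step is an extra assumption not stated in the flow above.

```javascript
// Hypothetical helper mirroring the documented fallback order. Voices are
// assumed to look like { name, lang } with BCP 47 language tags.
function selectVoice(voices, preferredName, locale, defaultLang = 'en') {
  const norm = (tag) => tag.toLowerCase().replace('_', '-');
  const language = (tag) => norm(tag).split('-')[0];

  return (
    voices.find((v) => v.name === preferredName) ||               // 1. saved preference
    voices.find((v) => norm(v.lang) === norm(locale)) ||          // 2. exact locale match
    voices.find((v) => language(v.lang) === language(locale)) ||  // 3. same language part
    voices.find((v) => language(v.lang) === defaultLang) ||       // 4. default language
    voices[0] ||                                                  // assumption: any voice beats none
    null
  );
}
```

A handler's `setupVoiceFromPreferences()` could call such a helper with the saved preference and the current locale, then write the chosen voice back to the persistence manager.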
## Event Handling
1. **Locale Change Events**:
- When user changes locale in the UI, the localization module emits a 'locale-changed' event
- TTSFactory listens for this event and triggers voice re-selection in the active TTS handler
2. **TTS Preference Events**:
- Changes to TTS settings in the options UI trigger preference updates
- These updates are persisted and immediately applied to the active TTS handler
3. **TTS Availability Events**:
- TTSFactory dispatches 'tts:availability' events to notify the UI about TTS availability
- UI Controller listens for these events and updates the speech toggle button accordingly
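The event wiring described above could be sketched like this. The tiny `EventBus`, the `wireTtsEvents` function, and the method names `getActiveHandler` and `setSpeechToggleEnabled` are assumptions for illustration; only the event names `locale-changed` and `tts:availability` and the `setupVoiceFromPreferences()` call come from this document.

```javascript
// Tiny event bus used only for this sketch; the project's real event
// mechanism is not shown in this document.
class EventBus {
  constructor() { this.listeners = new Map(); }
  on(type, fn) {
    if (!this.listeners.has(type)) this.listeners.set(type, new Set());
    this.listeners.get(type).add(fn);
  }
  emit(type, detail) {
    for (const fn of this.listeners.get(type) || []) fn(detail);
  }
}

// Hypothetical wiring between localization, TTSFactory and the UI controller.
function wireTtsEvents(bus, factory, ui) {
  // Locale changes trigger voice re-selection in the active handler, if any.
  bus.on('locale-changed', () => {
    const handler = factory.getActiveHandler();
    if (handler) handler.setupVoiceFromPreferences();
  });
  // Availability events enable or disable the speech toggle button.
  bus.on('tts:availability', ({ available }) => {
    ui.setSpeechToggleEnabled(available);
  });
}
```

Guarding on a missing active handler keeps the locale listener harmless when TTS is unavailable.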
## Error Handling and Fallbacks
1. **TTS System Fallbacks**:
- If the preferred TTS system fails to initialize, TTSFactory falls back to the next available system
- Priority order: Kokoro > Browser > None (with None being acceptable)
- API TTS is not used as a fallback as it requires manual configuration
2. **Voice Selection Fallbacks**:
- If preferred voice is unavailable, fall back to locale-matching voice
- If no locale match, fall back to language match
- If no language match, fall back to default (typically English)
3. **TTS Unavailability Handling**:
- If no TTS handlers are available, the system continues to function without TTS
- The speech toggle button is disabled in the UI
- The application remains fully functional for text-only interaction
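A minimal sketch of the system fallback described above, assuming each handler exposes an async `initialize()` that resolves to `true` when usable. `selectActiveHandler`, the handler ids (`kokoro`, `browser`, `api`), and the returned object shape are hypothetical.

```javascript
// Documented fallback order; the API handler is deliberately absent because
// it needs manual configuration and is only tried when explicitly preferred.
const FALLBACK_ORDER = ['kokoro', 'browser'];

// Hypothetical sketch: handlers is a map of id -> handler object.
async function selectActiveHandler(handlers, preferred) {
  // Try the preferred system first, then the fallback order.
  const order = [preferred, ...FALLBACK_ORDER.filter((id) => id !== preferred)]
    .filter((id) => id && handlers[id]);
  for (const id of order) {
    try {
      if (await handlers[id].initialize()) return { id, handler: handlers[id] };
    } catch (err) {
      // A failing handler is logged and skipped, never fatal.
      console.warn(`TTS handler "${id}" failed to initialize:`, err.message);
    }
  }
  return { id: null, handler: null }; // no TTS: the app stays text-only
}
```

The `{ id: null, handler: null }` result corresponds to the "None" case: the caller would dispatch a `tts:availability` event reporting TTS as unavailable and carry on.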
This architecture ensures that the TTS system seamlessly adapts to the user's language preferences while maintaining a consistent and intuitive user experience across different locales, even when TTS is unavailable.