Fix TTS module initialization and dependency issues. Update module IDs for consistency, improve circular dependency detection, and fix UI Controller event handling.
+340
-1
@@ -256,4 +256,343 @@ The overlay fades away as the first scheduled animation.
- Include manual TTS system selection
- All settings are persisted via the persistence-manager

This synchronized approach ensures that text animations and speech work together seamlessly, creating a more immersive storytelling experience while maintaining smooth performance.

# Text Output Pipeline Architecture

The text output pipeline manages the flow of text from server reception to visual display and audio playback, with a focus on performance and synchronization.

## Core Components

1. **Socket Client**: Receives raw text fragments from the server.

2. **TextBuffer**: Accumulates fragments and identifies complete sentences.
   - Collects all incoming text regardless of fragment size
   - Identifies and extracts complete sentences
   - Maintains partial sentences until completion

3. **SentenceQueue**: Manages the preparation pipeline for sentences.
   - Receives complete sentences from TextBuffer
   - Orchestrates parallel processing of TTS generation and text layout
   - Ensures sentences are fully prepared before playback
   - Maintains a queue of sentences ready for playback

4. **TTS Generation System**: Prepares audio for sentences.
   - Generates audio in the background without blocking UI
   - Provides audio duration information for synchronization
   - Can be cancelled for fast-forward operations
   - Falls back to a duration estimate based on character count when TTS is disabled

5. **Typography Processor**: Enhances text presentation quality.
   - Applies smart typography (quotes, em-dashes, etc.)
   - Handles hyphenation for line breaks
   - Preserves special formatting

6. **ParagraphLayout**: Calculates optimal text presentation.
   - Computes line breaks using Knuth-Plass algorithm
   - Determines word positioning and timing
   - Adjusts animation duration to match audio length

7. **AnimationPlayerQueue**: Manages the playback pipeline.
   - Maintains a playlist of ready-to-play sentences
   - Inserts DOM elements for prepared sentences
   - Coordinates CSS-based animations
   - Monitors animation completion
   - Automatically advances to next sentence
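
As a rough illustration of the TextBuffer role described above, the following sketch accumulates fragments and releases only complete sentences. The class shape, the `push`/`flush` method names, and the sentence-boundary rule (terminal punctuation followed by whitespace) are assumptions made for this example, not the project's actual implementation.

```javascript
// Hypothetical TextBuffer sketch: method names and the boundary rule are assumptions.
class TextBuffer {
  constructor() {
    this.pending = ""; // partial sentence carried over between fragments
  }

  // Append a fragment of any size and return the sentences it completed.
  push(fragment) {
    this.pending += fragment;
    const sentences = [];
    // A sentence ends at . ! or ? (plus optional closing quotes or brackets)
    // followed by whitespace, so "3.14" is not split.
    const boundary = /[.!?]+["')\]]*\s+/g;
    let start = 0;
    let match;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.pending.slice(start, end).trim());
      start = end;
    }
    this.pending = this.pending.slice(start); // keep the incomplete tail
    return sentences;
  }

  // Release whatever is left, e.g. when the server signals the end of a message.
  flush() {
    const rest = this.pending.trim();
    this.pending = "";
    return rest ? [rest] : [];
  }
}
```

A simple rule like this will split after abbreviations such as "Dr. Smith"; a production buffer would likely need an exception list or a smarter segmenter.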

## Process Flow

1. **Preparation Pipeline**:
   - Socket client receives text and feeds it to TextBuffer
   - TextBuffer identifies complete sentences
   - SentenceQueue receives complete sentences
   - TTS generation and layout processing happen in parallel
   - When both TTS and layout are complete, sentence is marked "ready"
   - Ready sentences are added to AnimationPlayerQueue

2. **Playback Pipeline**:
   - AnimationPlayerQueue plays the first ready sentence
   - DOM elements are inserted and CSS animations begin
   - TTS audio plays simultaneously with animations
   - AnimationPlayerQueue monitors the animation until its full duration has elapsed
   - When playback completes, the next ready sentence immediately begins

3. **Fast-Forward Handling**:
   - Can interrupt at any stage of the pipeline
   - Currently playing animations are immediately completed
   - Currently playing audio is faded out and stopped
   - Any in-progress sentence preparation is cancelled
   - System advances to the next sentence in queue
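
The parallel preparation step and its cancellation on fast-forward can be sketched with `Promise.all` and the standard `AbortController`. The functions `generateSpeech`, `computeLayout`, and `prepareSentence` are hypothetical stand-ins (the real TTS and layout steps are simulated with timers), so treat this as one possible shape rather than the actual SentenceQueue code.

```javascript
// Hypothetical sketch: generateSpeech and computeLayout simulate the real steps.
function delay(ms, signal) {
  return new Promise((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("cancelled"));
      return;
    }
    const timer = setTimeout(resolve, ms);
    signal.addEventListener(
      "abort",
      () => {
        clearTimeout(timer);
        reject(new Error("cancelled"));
      },
      { once: true }
    );
  });
}

async function generateSpeech(text, signal) {
  await delay(20, signal); // pretend synthesis takes a while
  return { durationMs: text.length * 60 };
}

async function computeLayout(text, signal) {
  await delay(10, signal); // pretend layout takes a while
  return { words: text.split(/\s+/) };
}

// Both steps run in parallel; the sentence is "ready" only when both finish.
async function prepareSentence(text, signal) {
  const [audio, layout] = await Promise.all([
    generateSpeech(text, signal),
    computeLayout(text, signal),
  ]);
  return { text, audio, layout, status: "ready" };
}
```

A fast-forward handler would keep the `AbortController` for the sentence being prepared and call `controller.abort()`, which rejects the pending `prepareSentence` promise with a "cancelled" error.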

## Speed Synchronization

1. **Audio-Driven Timing**: Animation speed is determined by audio duration
   - TTS audio length dictates animation duration
   - Without TTS, duration is calculated from character count and speed setting

2. **Seamless Transitions**: Next sentence begins immediately after current completes
   - No gap between sentence playbacks
   - Preparation happens during playback of previous sentence

3. **Feedback Loop**: Animation system provides timing data back to preparation pipeline
   - Helps optimize future sentence preparation
   - Allows runtime adjustment of timing parameters
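
The audio-driven timing rule and its character-count fallback might look like the helper below. The function name, the option names, and the base rate of 15 characters per second are all assumptions chosen for illustration; the document does not state the actual formula.

```javascript
// Hypothetical helper: the base rate and parameter names are assumptions.
const BASE_CHARS_PER_SECOND = 15;

function animationDurationMs(text, { audioDurationMs = null, speed = 1 } = {}) {
  if (audioDurationMs !== null) {
    return audioDurationMs; // audio-driven timing: the TTS clip length wins
  }
  // No TTS: estimate from character count, scaled by the user's speed setting.
  const seconds = text.length / (BASE_CHARS_PER_SECOND * speed);
  return Math.round(seconds * 1000);
}
```

With these assumed values, a 30-character sentence animates over 2000 ms at speed 1 and 1000 ms at speed 2, while any sentence with generated audio simply adopts the audio's duration.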

This architecture separates preparation from playback, creating a buffer of ready content that enables smooth presentation while handling the computational overhead of text processing and TTS generation in the background.

# Text Processing & Layout Architecture

The text processing and layout system transforms raw text input into visually appealing, typographically correct, and elegantly animated content through several specialized components.

## Component Interactions

### Core Components

1. **text-processor.js**: Enhances typography and applies hyphenation
   - Entry point for text processing pipeline
   - Manages SmartyPants for typographic enhancements
   - Controls Hyphenopoly for language-aware hyphenation
   - Serves as the central coordinator for text transformation

2. **smartypants.js**: Provides typographic punctuation conversion
   - Transforms straight quotes to curly quotes
   - Converts hyphens to em-dashes and en-dashes
   - Handles ellipses and other typographic niceties
   - Operates as a pure function with no dependencies

3. **paragraph-layout.js**: Manages paragraph structure and word metrics
   - Breaks text into words and calculates their dimensions
   - Manages paragraph-level styling and layout properties
   - Prepares text for the line-breaking algorithm
   - Connects text-processor output to the layout engine

4. **knuth-plass.js**: Implementation of the optimal line-breaking algorithm
   - Calculates aesthetically pleasing line breaks
   - Minimizes "raggedness" across paragraph lines
   - Implements the core Knuth-Plass algorithm
   - Uses linked-list.js for internal data structures

5. **linked-list.js**: Provides data structures for the line-breaking algorithm
   - Implements doubly-linked list for efficient node insertion/removal
   - Supports the complex data relationships in the Knuth-Plass algorithm
   - Pure utility with no direct interaction with other components

6. **hyphenopoly.module.js**: Performs language-aware hyphenation
   - Contains language-specific hyphenation patterns
   - Provides functions to insert soft hyphens at valid breaking points
   - Loaded dynamically when needed by text-processor.js

7. **layout-renderer.js**: Translates calculated layout into DOM elements
   - Takes the output from paragraph-layout.js
   - Generates DOM structure for the text display
   - Creates CSS classes and styles for animations
   - Prepares text for display and animation
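
To make the idea of "minimizing raggedness" concrete, here is a deliberately simplified line breaker. It is not the Knuth-Plass algorithm that knuth-plass.js implements: there is no stretchable glue, no hyphenation penalty, and widths are measured in characters rather than font metrics. It only shows the underlying dynamic-programming idea of choosing breaks that minimize the total squared slack over the whole paragraph. The function name `breakLines` is hypothetical.

```javascript
// Simplified stand-in for optimal line breaking: minimizes squared slack per
// line via dynamic programming. No glue, penalties, or real font metrics.
function breakLines(words, maxWidth) {
  const n = words.length;
  const best = new Array(n + 1).fill(Infinity); // best[i]: min cost to set words[i..]
  const next = new Array(n + 1).fill(n);        // next[i]: where the following line starts
  best[n] = 0;

  for (let i = n - 1; i >= 0; i--) {
    let width = -1; // cancels the space counted before the first word
    for (let j = i; j < n; j++) {
      width += words[j].length + 1;
      if (width > maxWidth && j > i) break; // line is full (one overlong word is allowed)
      const slack = Math.max(0, maxWidth - width);
      // The last line is not penalized for being short.
      const cost = (j === n - 1 ? 0 : slack * slack) + best[j + 1];
      if (cost < best[i]) {
        best[i] = cost;
        next[i] = j + 1;
      }
    }
  }

  const lines = [];
  for (let i = 0; i < n; i = next[i]) {
    lines.push(words.slice(i, next[i]).join(" "));
  }
  return lines;
}
```

For the words `aaa bb cc ddddd` at width 6, a greedy breaker would produce `aaa bb / cc / ddddd`, leaving one very short line; this sketch instead returns `aaa / bb cc / ddddd`, which spreads the slack more evenly.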

## Process Flow

1. **Text Input → Typography Enhancement**
   ```
   Raw Text → text-processor.js → smartypants.js → Enhanced Text
   ```
   - Raw text enters the text-processor
   - SmartyPants functions transform quotation marks, dashes, etc.
   - Typography-enhanced text is produced

2. **Typography-Enhanced Text → Hyphenation**
   ```
   Enhanced Text → text-processor.js → hyphenopoly.module.js → Hyphenated Text
   ```
   - Enhanced text is passed to the hyphenation system
   - Language-specific rules determine valid hyphenation points
   - Soft hyphens are inserted at appropriate positions

3. **Hyphenated Text → Layout Calculation**
   ```
   Hyphenated Text → paragraph-layout.js → knuth-plass.js → Optimized Layout
   ```
   - Paragraph layout breaks text into words and calculates metrics
   - Knuth-Plass algorithm calculates optimal line breaks
   - linked-list.js provides the data structures for this process
   - An optimized layout structure is produced

4. **Layout → Rendering**
   ```
   Optimized Layout → layout-renderer.js → DOM Elements
   ```
   - Layout renderer converts the abstract layout to concrete DOM
   - CSS classes and styles are applied for animation
   - Words are positioned according to the calculated layout

5. **Rendering → Animation**
   ```
   DOM Elements → AnimationQueue → Visual Display
   ```
   - The rendered DOM elements are passed to the animation system
   - Words are animated according to timing and styling parameters
   - Visual presentation is synchronized with audio when TTS is enabled
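
The first stage of this flow, typography enhancement, can be illustrated with a handful of SmartyPants-style substitutions. This is an illustrative sketch only: `smartenPunctuation` is a hypothetical name, it does not reproduce the smartypants.js API, and its quote handling is far cruder than the real library's.

```javascript
// Illustrative only: a few SmartyPants-style substitutions, not the smartypants.js API.
function smartenPunctuation(text) {
  return text
    .replace(/---/g, "\u2014")              // three hyphens become an em dash
    .replace(/--/g, "\u2013")               // two hyphens become an en dash
    .replace(/\.\.\./g, "\u2026")           // three periods become an ellipsis
    .replace(/(^|[\s(\[{])"/g, "$1\u201C")  // double quote at a word start opens
    .replace(/"/g, "\u201D")                // any remaining double quote closes
    .replace(/(^|[\s(\[{])'/g, "$1\u2018")  // single quote at a word start opens
    .replace(/'/g, "\u2019");               // the rest close or act as apostrophes
}
```

Because it is a pure string-to-string function, a stage like this composes cleanly with the later hyphenation and layout stages: each one takes the previous stage's output and returns a new value.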

## Implementation Dependencies

```
text-processor.js
├── smartypants.js
└── hyphenopoly.module.js
    └── [language pattern files]

paragraph-layout.js
├── knuth-plass.js
│   └── linked-list.js
└── [font metrics]

layout-renderer.js
└── [CSS styling]
```

## Integration Points

1. **Text Buffer → Text Processor**
   - Text buffer passes complete sentences to the processing pipeline
   - Text processor enhances typography and applies hyphenation

2. **Text Processor → Paragraph Layout**
   - Enhanced text flows to paragraph layout for structure analysis
   - Word metrics and paragraph properties are calculated

3. **Paragraph Layout → Layout Renderer**
   - Optimized layout information is passed to the renderer
   - Renderer creates DOM elements with appropriate styling

4. **Layout Renderer → Animation Queue**
   - Rendered elements are scheduled for animation
   - Animation timing is synchronized with TTS if enabled

This architecture ensures typographically beautiful text with optimal line breaks, proper hyphenation, and smooth animation, creating a professional reading experience.

# TTS Integration with Localization

## Architecture Overview

The Text-to-Speech (TTS) system has been refactored to seamlessly integrate with the localization module, ensuring a cohesive user experience across different languages. This integration follows these key architectural principles:

1. **Base TTS Handler Pattern**: All TTS handlers extend a common `TTSHandler` class that inherits from `BaseModule`, ensuring consistent interface and behavior.

2. **Dependency Injection**: TTS handlers access the localization and persistence modules through the dependency system rather than direct global references.

3. **Locale-Aware Voice Selection**: TTS handlers automatically select appropriate voices based on the current locale.

4. **Preference Persistence**: User preferences for TTS settings are stored and retrieved through the persistence manager.

5. **Optional Functionality**: TTS is treated as an optional feature that can be unavailable without breaking the application.
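
The base-handler and dependency-injection principles could be sketched as follows. The real `BaseModule` interface is not shown in this document, so the constructor signature, `getDependency`, the `getLocale()` call on the localization module, and the `EchoTTSHandler` class are all hypothetical names used only to show the pattern.

```javascript
// Hypothetical sketch: BaseModule's real interface is not shown in this document.
class BaseModule {
  constructor(id, dependencies = {}) {
    this.id = id;
    this.dependencies = dependencies; // injected rather than read from globals
  }

  getDependency(name) {
    const dep = this.dependencies[name];
    if (!dep) throw new Error(`${this.id}: missing dependency "${name}"`);
    return dep;
  }
}

// Common interface every TTS handler shares.
class TTSHandler extends BaseModule {
  async initialize() { return false; } // subclasses report whether they are usable
  speak(_text) { throw new Error("speak() not implemented"); }
  stop() {}
  getVoices() { return []; }
}

// Toy concrete handler that records what it would have spoken.
class EchoTTSHandler extends TTSHandler {
  constructor(dependencies) {
    super("echo-tts", dependencies);
    this.spoken = [];
  }

  async initialize() {
    this.locale = this.getDependency("localization").getLocale();
    return true;
  }

  speak(text) {
    this.spoken.push(`[${this.locale}] ${text}`);
  }
}
```

Because the handler receives its localization module through the constructor, a test can pass in a stub such as `{ localization: { getLocale: () => "de-DE" } }` without touching any global state.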

## Core Components

1. **TTSFactory**: Central coordinator for TTS functionality
   - Manages initialization of all TTS handlers
   - Implements fallback mechanisms when preferred TTS systems are unavailable
   - Provides access to the active TTS handler
   - Integrates with localization module for language-aware voice selection
   - Reports TTS availability to the UI

2. **TTSHandler**: Abstract base class for all TTS handlers
   - Defines common interface methods (speak, stop, getVoices, etc.)
   - Provides shared utility functions for voice selection and preference handling
   - Extends BaseModule for dependency management and event handling

3. **TTS Handlers**: Concrete implementations for different TTS approaches
   - **BrowserTTSHandler**: Uses the Web Speech API
   - **ApiTTSHandler**: Communicates with a remote TTS API
   - **KokoroHandler**: Provides neural TTS via Kokoro.js

4. **OptionsUI**: User interface for TTS configuration
   - Allows selection of TTS system (Browser, API, Kokoro)
   - Provides voice selection based on available voices for current locale
   - Includes controls for volume, rate, and pitch
   - Persists user preferences via PersistenceManager

## Localization Integration

1. **Locale-Based Voice Selection**:
   - Each TTS handler implements `setupVoiceFromPreferences()` to select voices based on:
     - User's explicitly saved voice preference
     - Current locale from the localization module
     - Fallback to language-matching voice if exact locale match not found
     - Default voice (typically English) as final fallback

2. **Voice Filtering**:
   - TTS handlers filter available voices to prioritize those matching the current locale
   - Voice lists in the UI are sorted to show locale-matching voices first

3. **Preference Persistence**:
   - TTS settings (system, voice, volume, rate) are saved per-user
   - Settings are automatically applied when the application loads
   - Changes in the localization settings trigger voice re-selection
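
The voice selection order and the locale-first sorting described above can be expressed as two small pure functions. Voice objects are assumed to carry `name` and `lang` fields, as `SpeechSynthesisVoice` does in the Web Speech API; the function names and option names are hypothetical, and the real `setupVoiceFromPreferences()` may differ.

```javascript
// Voices are assumed to look like { name, lang }. Function and option names
// are hypothetical.
const languageOf = (tag) => tag.toLowerCase().split(/[-_]/)[0];
const normalizeTag = (tag) => tag.toLowerCase().replace(/_/g, "-");

function selectVoice(voices, { preferredName = null, locale = "en-US", defaultLanguage = "en" } = {}) {
  return (
    voices.find((v) => v.name === preferredName) ||                        // 1. saved preference
    voices.find((v) => normalizeTag(v.lang) === normalizeTag(locale)) ||   // 2. exact locale
    voices.find((v) => languageOf(v.lang) === languageOf(locale)) ||       // 3. same language
    voices.find((v) => languageOf(v.lang) === defaultLanguage) ||          // 4. default language
    voices[0] ||                                                           // 5. anything at all
    null
  );
}

// Voices matching the locale's language first; relative order is otherwise kept.
function sortVoicesForLocale(voices, locale) {
  const rank = (v) => (languageOf(v.lang) === languageOf(locale) ? 0 : 1);
  return [...voices].sort((a, b) => rank(a) - rank(b));
}
```

Returning `null` when no voices exist keeps the caller in charge of the "TTS unavailable" case rather than throwing from inside the selection logic.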

## Initialization Flow

1. **TTSFactory Initialization**:
   ```
   TTSFactory.initialize()
   ├── Loads user preferences via PersistenceManager
   ├── Initializes all available TTS handlers
   │   ├── KokoroHandler.initialize()
   │   └── BrowserTTSHandler.initialize()
   ├── Selects active TTS handler based on preferences and availability
   ├── Sets up event listeners for locale changes
   └── Dispatches TTS availability event
   ```

2. **TTS Handler Initialization**:
   ```
   TTSHandler.initialize()
   ├── Loads system-specific resources
   ├── Retrieves available voices
   ├── Gets dependencies (localization, persistenceManager)
   └── Sets up voice based on preferences and locale
   ```

3. **Voice Setup Process**:
   ```
   setupVoiceFromPreferences()
   ├── Gets user's preferred voice from persistenceManager
   ├── If preferred voice exists and is available:
   │   └── Use preferred voice
   ├── Otherwise:
   │   ├── Get current locale from localization module
   │   ├── Find voice matching current locale
   │   ├── If no match, find voice matching language part
   │   └── If still no match, use default voice
   └── Update preference with selected voice
   ```

## Event Handling

1. **Locale Change Events**:
   - When the user changes the locale in the UI, the localization module emits a 'locale-changed' event
   - TTSFactory listens for this event and triggers voice re-selection in the active TTS handler

2. **TTS Preference Events**:
   - Changes to TTS settings in the options UI trigger preference updates
   - These updates are persisted and immediately applied to the active TTS handler

3. **TTS Availability Events**:
   - TTSFactory dispatches 'tts:availability' events to notify the UI about TTS availability
   - UI Controller listens for these events and updates the speech toggle button accordingly
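
The two named events can be wired together with the standard `EventTarget`. Only the event names `'locale-changed'` and `'tts:availability'` come from the description above; the shared `bus`, the `emit` helper, the payload shapes, and the stand-in objects are assumptions for this sketch.

```javascript
// Sketch using the standard EventTarget; the bus and payload shapes are assumptions.
const bus = new EventTarget();

function emit(type, detail) {
  const event = new Event(type);
  event.detail = detail; // plain property; CustomEvent would also work where available
  bus.dispatchEvent(event);
}

const activeHandler = { voiceLocale: "en-US" }; // stand-in for the active TTS handler
const speechToggle = { disabled: true };        // stand-in for the UI toggle button

// TTSFactory side: re-select the voice when the locale changes.
bus.addEventListener("locale-changed", (e) => {
  activeHandler.voiceLocale = e.detail.locale;
});

// UI Controller side: enable or disable the speech toggle.
bus.addEventListener("tts:availability", (e) => {
  speechToggle.disabled = !e.detail.available;
});

emit("locale-changed", { locale: "fr-FR" });
emit("tts:availability", { available: true });
```

After the two `emit` calls, the stand-in handler has switched to `fr-FR` and the toggle is enabled, without either listener knowing who raised the event.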

## Error Handling and Fallbacks

1. **TTS System Fallbacks**:
   - If the preferred TTS system fails to initialize, TTSFactory falls back to the next available system
   - Priority order: Kokoro > Browser > None (with None being acceptable)
   - API TTS is not used as a fallback as it requires manual configuration

2. **Voice Selection Fallbacks**:
   - If preferred voice is unavailable, fall back to locale-matching voice
   - If no locale match, fall back to language match
   - If no language match, fall back to default (typically English)

3. **TTS Unavailability Handling**:
   - If no TTS handlers are available, the system continues to function without TTS
   - The speech toggle button is disabled in the UI
   - The application remains fully functional for text-only interaction
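
The system-level fallback order (Kokoro, then Browser, then none) could be implemented along these lines. The handler map keyed by `"kokoro"`, `"browser"`, and `"api"`, the `selectTTSHandler` name, and the convention that `initialize()` resolves to `true` when usable are assumptions. As a simplification, this sketch initializes handlers lazily in priority order rather than initializing all of them up front.

```javascript
// Hypothetical sketch: each handler exposes an async initialize() that
// resolves to true when the handler is usable.
async function selectTTSHandler(handlers, preferredId = null) {
  const fallbackOrder = ["kokoro", "browser"]; // API TTS is deliberately not a fallback
  const order = preferredId
    ? [preferredId, ...fallbackOrder.filter((id) => id !== preferredId)]
    : fallbackOrder;

  for (const id of order) {
    const handler = handlers[id];
    if (!handler) continue;
    try {
      if (await handler.initialize()) return { id, handler };
    } catch (error) {
      console.warn(`TTS handler "${id}" failed to initialize:`, error.message);
    }
  }
  return { id: null, handler: null }; // no TTS: the app continues text-only
}
```

A caller would treat an `id` of `null` as the signal to dispatch a "not available" event and disable the speech toggle, rather than as an error.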

This architecture ensures that the TTS system seamlessly adapts to the user's language preferences while maintaining a consistent and intuitive user experience across different locales, even when TTS is unavailable.