📺 ChatterBox SRT Voice TTS:
ChatterBoxSRTVoiceTTS is a node that converts text into high-quality speech. It is aimed at AI artists and developers who need dynamic, expressive voice synthesis, and it can produce audio in different character voices and languages, which makes it well suited to narration, dialogue, and other immersive audio. Pause tags let you insert pauses where needed for more natural pacing, and the exaggeration and temperature settings control how expressive and how variable the generated speech is.
📺 ChatterBox SRT Voice TTS Input Parameters:
text
The text parameter is the node's primary input: the content you want to convert into speech, given as a string in any supported language. Longer texts take longer to process, particularly when chunking is enabled to split them into segments. No explicit minimum or maximum length is documented, but keeping the text to a reasonable length helps performance.
audio_prompt
The audio_prompt parameter allows you to provide a reference audio file that the TTS system can use to match the voice characteristics. This can be particularly useful if you want the generated speech to mimic a specific voice or style. The parameter accepts an audio file path or a reference to an audio tensor.
exaggeration
The exaggeration parameter controls the expressiveness of the generated speech. It is a float that sets how far the delivery departs from a neutral tone: higher values produce more exaggerated speech, which suits dramatic or emotional lines. The default is a moderate value that balances naturalness and expressiveness.
temperature
The temperature parameter influences the variability of the speech synthesis. It is a float: lower values give more deterministic, stable output, while higher values introduce more randomness and variation. Adjust it to get the level of spontaneity you want in the generated speech.
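The node's internals are not described here, but token-based TTS models commonly apply temperature by dividing the model's raw scores (logits) before a softmax and then sampling from the result. The minimal sketch below assumes that mechanism; `softmax_with_temperature` and the example logits are illustrative, not part of the node.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]
for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring choice (more stable output); higher temperatures flatten the distribution (more variation).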
cfg_weight
The cfg_weight parameter is a float value that adjusts the influence of the conditioning factors on the TTS model. It helps in fine-tuning the balance between the input text and the reference audio characteristics. This parameter is essential for achieving a coherent and contextually appropriate audio output.
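The name cfg_weight suggests classifier-free guidance (CFG); that is an assumption, since the node's internals are not documented here. In CFG the model makes one prediction with its conditioning and one without, and the weight scales how far the result moves toward the conditioned prediction. The sketch below uses one common formulation on plain lists; `apply_cfg` is hypothetical.

```python
def apply_cfg(cond_logits, uncond_logits, cfg_weight):
    """Classifier-free guidance: uncond + w * (cond - uncond), element-wise.

    w = 0 returns the unconditioned scores, w = 1 returns the conditioned
    scores, and w > 1 pushes further in the conditioned direction.
    """
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("logit vectors must have the same length")
    return [u + cfg_weight * (c - u) for c, u in zip(cond_logits, uncond_logits)]


cond = [1.0, 2.0]    # hypothetical scores with conditioning
uncond = [0.0, 0.0]  # hypothetical scores without conditioning
print(apply_cfg(cond, uncond, 2.0))
```

Some implementations write the same idea as `cond + w * (cond - uncond)`, which shifts the meaning of a given weight by one, so the numeric scale of cfg_weight depends on the model.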
language
The language parameter specifies the language in which the text should be synthesized. It accepts a string identifying the language, such as "English" or "Spanish", so that the TTS system applies the correct phonetic and linguistic rules.
enable_pause_tags
The enable_pause_tags parameter is a boolean that determines whether pause tags are honored during speech synthesis. When enabled, pause tags in the input text become pauses in the generated speech, producing more natural, human-like pacing. This is particularly useful for longer texts or when simulating conversation.
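The exact pause tag syntax is not given in this section. The sketch below assumes a hypothetical `[pause:1.5s]` / `[pause:500ms]` form (a bare number is read as seconds) and shows how such tags could be separated from the spoken text so each pause can later be rendered as silence. Both the tag format and `split_pause_tags` are assumptions.

```python
import re

# Hypothetical tag syntax: [pause:1.5s], [pause:500ms], or [pause:2] (seconds).
PAUSE_TAG = re.compile(r"\[pause:(\d+(?:\.\d+)?)(ms|s)?\]", re.IGNORECASE)


def split_pause_tags(text):
    """Split text into ("text", str) and ("pause", seconds) segments, in order."""
    segments = []
    pos = 0
    for match in PAUSE_TAG.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append(("text", chunk))
        value = float(match.group(1))
        unit = (match.group(2) or "s").lower()
        segments.append(("pause", value / 1000 if unit == "ms" else value))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


print(split_pause_tags("Hello there. [pause:1.5s] How are you? [pause:500ms] Good."))
```

A renderer would then synthesize each text segment and insert `round(seconds * sample_rate)` samples of silence for each pause.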
character
The character parameter allows you to specify the character or voice persona that should be used for the speech synthesis. It accepts a string value representing the character's name or role, such as "narrator" or "villain". This parameter is crucial for projects that require distinct voice identities.
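This section does not say how character names map to voices. One plausible arrangement, assumed here purely for illustration, is a registry from names to reference audio files that feeds the audio_prompt input. `VOICES`, its file paths, and `resolve_voice` are hypothetical.

```python
# Hypothetical registry: character name -> reference audio file.
VOICES = {
    "narrator": "voices/narrator.wav",
    "villain": "voices/villain.wav",
}


def resolve_voice(character, voices=VOICES):
    """Return the reference audio for a character, matching names case-insensitively."""
    key = character.strip().lower()
    if key not in voices:
        available = ", ".join(sorted(voices))
        raise ValueError(
            f"Character voice not available: {character!r} (choose from: {available})"
        )
    return voices[key]


print(resolve_voice("Narrator"))
```

Failing early with the list of valid names mirrors the "Character voice not available" error described later on this page.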
seed
The seed parameter is an integer that sets the random seed for the TTS generation process. By providing a specific seed value, you can ensure that the generated speech is reproducible and consistent across different runs. This is useful for debugging or when you need to maintain consistency in audio outputs.
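Reproducibility comes from seeding every random number generator the synthesis uses. The pure-Python sketch below shows the principle with an isolated `random.Random` instance; a PyTorch-based node would typically also call `torch.manual_seed(seed)`, though how this node seeds its generators is an assumption. `sample_sequence` is a hypothetical stand-in for a sampling step.

```python
import random


def sample_sequence(seed, length=5):
    """Draw a short 'random' sequence from a generator seeded with `seed`."""
    rng = random.Random(seed)  # isolated generator; leaves global random state alone
    return [rng.randint(0, 999) for _ in range(length)]


first = sample_sequence(42)
second = sample_sequence(42)
print(first == second)
```

Using a dedicated generator rather than the global one keeps other code from silently consuming random numbers and breaking reproducibility.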
enable_cache
The enable_cache parameter is a boolean that determines whether caching should be used to store intermediate audio results. Enabling caching can significantly speed up the processing time for repeated or similar text inputs, as it avoids redundant computations.
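A cache only helps if its key captures everything that changes the audio. The sketch below assumes an in-memory dictionary keyed on a SHA-256 hash of the text plus the generation settings; `AudioCache` and `fake_generate` are hypothetical, and the node's actual cache layout is not documented here.

```python
import hashlib
import json


class AudioCache:
    """In-memory cache keyed on the text plus every setting that affects output."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def make_key(text, **settings):
        # sort_keys makes the key independent of keyword-argument order.
        payload = json.dumps({"text": text, **settings}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_generate(self, text, generate, **settings):
        key = self.make_key(text, **settings)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        audio = generate(text, **settings)
        self._store[key] = audio
        return audio


calls = []


def fake_generate(text, **settings):
    """Stand-in for the expensive TTS call; records each invocation."""
    calls.append(text)
    return f"<audio for {text!r}>"


cache = AudioCache()
cache.get_or_generate("Hello", fake_generate, seed=1, temperature=0.8)
cache.get_or_generate("Hello", fake_generate, seed=1, temperature=0.8)  # cache hit
cache.get_or_generate("Hello", fake_generate, seed=2, temperature=0.8)  # new seed, new entry
```

Leaving a setting such as seed out of the key would return stale audio after that setting changes, so the key should include every input that influences generation.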
crash_protection_template
The crash_protection_template parameter is a string that provides a template for handling potential crashes during the TTS process. It typically includes placeholder text that can be used to fill in segments of the text that might cause issues, ensuring that the system can recover gracefully from errors.
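The section does not say when the template is applied or what its placeholder looks like. Assuming the template contains a `{seg}` placeholder that is filled with a short, problem-prone segment, the sketch below pads segments shorter than a hypothetical `min_chars` threshold and leaves longer ones untouched. The default template, the threshold, and `protect_segment` are all assumptions.

```python
DEFAULT_TEMPLATE = "... {seg} ..."  # hypothetical padding template


def protect_segment(seg, template=DEFAULT_TEMPLATE, min_chars=15):
    """Wrap very short segments in a padding template; return others unchanged."""
    stripped = seg.strip()
    if len(stripped) >= min_chars or "{seg}" not in template:
        return seg
    # str.replace avoids str.format errors if the template contains other braces.
    return template.replace("{seg}", stripped)


print(protect_segment("Hi!"))
print(protect_segment("This sentence is long enough to leave alone."))
```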
stable_audio_component
The stable_audio_component parameter allows you to specify a stable audio component that can be used to enhance the consistency of the generated speech. This parameter is optional and can be used to maintain a uniform audio quality across different segments.
📺 ChatterBox SRT Voice TTS Output Parameters:
audio_output
The audio_output parameter is the primary output of the node, representing the synthesized speech in the form of an audio tensor. This output is crucial for applications that require high-quality audio, as it provides the final speech product that can be used in various multimedia projects. The audio tensor can be further processed or directly integrated into your applications.
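In ComfyUI, AUDIO values are commonly passed as a dict holding a `waveform` tensor and a `sample_rate`; whether this node follows exactly that layout is an assumption. The stdlib-only sketch below shows one downstream processing step, encoding mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV, assuming the tensor has already been converted to a plain list of floats. `floats_to_wav_bytes` and the 24000 Hz rate are illustrative.

```python
import io
import math
import struct
import wave


def floats_to_wav_bytes(samples, sample_rate):
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        clipped = (max(-1.0, min(1.0, s)) for s in samples)  # guard against overshoot
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)
        wav.writeframes(frames)
    return buffer.getvalue()


# Usage: 0.1 s of a 440 Hz tone at an assumed 24000 Hz sample rate.
rate = 24000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 10)]
wav_bytes = floats_to_wav_bytes(tone, rate)
```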
📺 ChatterBox SRT Voice TTS Usage Tips:
- To achieve the most natural-sounding speech, experiment with the exaggeration and temperature parameters to find the right balance for your specific use case.
- Use the enable_pause_tags feature to insert natural pauses in the speech, especially for longer texts or when simulating dialogue.
- If you need consistent audio output across different runs, set a specific seed value.
- Consider turning on enable_cache to improve processing speed for repeated text inputs, especially where performance is critical.
📺 ChatterBox SRT Voice TTS Common Errors and Solutions:
"Text input too long"
- Explanation: This error occurs when the input text exceeds the maximum character limit for a single chunk.
- Solution: Enable chunking by setting enable_chunking to true and adjust max_chars_per_chunk to a suitable value so the text is split into manageable segments.
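The node's exact splitting rules are not documented here. The sketch below assumes a common strategy: greedily pack whole sentences into chunks of at most `max_chars_per_chunk` characters, hard-splitting on whitespace only when a single sentence is too long. `chunk_text` is hypothetical; only the parameter name comes from this page.

```python
import re


def chunk_text(text, max_chars_per_chunk=400):
    """Greedily pack whole sentences into chunks of at most max_chars_per_chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A sentence longer than the limit is cut at the last space that fits
        # (or at the limit itself if there is no space).
        while len(sentence) > max_chars_per_chunk:
            cut = sentence.rfind(" ", 0, max_chars_per_chunk + 1)
            if cut <= 0:
                cut = max_chars_per_chunk
            piece, sentence = sentence[:cut].strip(), sentence[cut:].strip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece)
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars_per_chunk:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One two three. Four five six. Seven.", max_chars_per_chunk=20))
```

Splitting at sentence boundaries keeps each chunk prosodically complete, which usually sounds more natural than cutting mid-sentence.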
"Unsupported language code"
- Explanation: The specified language code is not supported by the TTS system.
- Solution: Verify that the language parameter is set to a valid and supported language code, such as "English" or "Spanish".
"Audio prompt not found"
- Explanation: The specified audio prompt file could not be located or accessed.
- Solution: Ensure that the audio_prompt parameter points to a valid file path or audio tensor and that the file is accessible.
"Character voice not available"
- Explanation: The specified character voice is not available in the current TTS model.
- Solution: Check the available character voices and ensure that the character parameter is set to a valid option.
