Chatterbox TTS 📢:
ChatterboxTTS is a sophisticated text-to-speech (TTS) node designed to convert written text into natural-sounding speech. It leverages advanced machine learning models to generate high-quality audio outputs that can mimic human speech with remarkable accuracy. The node is particularly beneficial for applications requiring dynamic voice synthesis, such as virtual assistants, audiobooks, and interactive media. By utilizing a combination of voice encoding, text tokenization, and speech generation techniques, ChatterboxTTS can produce speech that reflects various emotional tones and speaker characteristics. This flexibility allows users to create personalized and contextually appropriate audio content. The node also includes features like conditional generation and watermarking to ensure the integrity and authenticity of the generated audio.
Chatterbox TTS 📢 Input Parameters:
text
The text parameter is the primary input for the ChatterboxTTS node, representing the written content you wish to convert into speech. This parameter accepts a string of text, which the node processes to generate corresponding audio. The quality and clarity of the output speech are directly influenced by the input text, so it's important to ensure that the text is well-structured and free of errors. There are no strict minimum or maximum length constraints, but excessively long texts may require more processing time.
repetition_penalty
The repetition_penalty parameter helps control the tendency of the model to repeat phrases or words in the generated speech. A value greater than 1.0 discourages repetition, while a value less than 1.0 encourages it. The default value is 1.2, which generally provides a good balance for most applications.
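The node's internal implementation is not shown here, but repetition penalties in autoregressive models are commonly applied to the logits as in this illustrative, pure-Python sketch (the function name and the divide/multiply convention follow the widely used formulation, not necessarily Chatterbox's exact code):

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Penalize the logits of tokens that already appear in the output.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so any penalty > 1.0 makes repeats less likely.
    """
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

# Token 1 was already generated, so its logit is pushed down.
logits = [2.0, 3.0, -1.0]
result = apply_repetition_penalty(logits, [1])  # [2.0, 2.5, -1.0]
```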
min_p
The min_p parameter sets a threshold for the probability of selecting tokens during speech generation. It helps filter out less likely token sequences, ensuring more coherent and natural-sounding speech. The default value is 0.05, with a range typically between 0.0 and 1.0.
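The standard min_p technique keeps only tokens whose probability is at least min_p times the probability of the most likely token. This small sketch shows that filtering in isolation (illustrative only; the real node applies it inside its sampling loop):

```python
def min_p_filter(probs, min_p=0.05):
    """Zero out tokens whose probability falls below min_p times the
    top token's probability, then renormalize the survivors."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

probs = [0.70, 0.25, 0.04, 0.01]
filtered = min_p_filter(probs, min_p=0.1)
# threshold is 0.1 * 0.70 = 0.07, so the last two tokens are dropped
```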
top_p
The top_p parameter controls nucleus sampling, which sets a cumulative probability threshold for token selection. The model is restricted to the smallest set of most probable tokens whose cumulative probability reaches top_p, which helps keep the generated speech coherent. The default value is 1.0, which means all tokens are considered.
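Nucleus sampling can be sketched as follows (a minimal, pure-Python illustration of the general technique, not the node's actual sampler):

```python
def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of tokens, in descending probability order,
    whose cumulative probability reaches top_p; renormalize the rest."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0
            for i in range(len(probs))]

probs = [0.5, 0.3, 0.15, 0.05]
filtered = top_p_filter(probs, top_p=0.8)
# The first two tokens (0.5 + 0.3) already cover 0.8, so only they survive.
```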
audio_prompt_path
The audio_prompt_path parameter allows you to specify a path to an audio file that serves as a reference for the desired voice characteristics in the generated speech. This can be particularly useful for creating speech that matches a specific speaker's voice or style. If not provided, the node uses default voice settings.
exaggeration
The exaggeration parameter adjusts the emotional intensity of the generated speech. A higher value results in more pronounced emotional expression, while a lower value produces more neutral speech. The default value is 0.5, providing a balanced emotional tone.
cfg_weight
The cfg_weight parameter influences the strength of the conditional generation features, allowing you to control how closely the generated speech adheres to the specified conditions. The default value is 0.5, offering a moderate level of adherence.
temperature
The temperature parameter controls the randomness of the speech generation process. A higher temperature value results in more varied and creative outputs, while a lower value produces more deterministic and consistent speech. The default value is 0.8.
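Temperature works by dividing the logits before the softmax, as in this generic sketch (illustrative of the standard technique, not Chatterbox-specific code):

```python
import math

def softmax_with_temperature(logits, temperature=0.8):
    """Lower temperature sharpens the distribution (more deterministic
    output); higher temperature flattens it (more varied output)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
# The top token gets more probability mass at low temperature than at high.
```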
pbar
The pbar parameter is used to display a progress bar during the speech generation process. This can be helpful for monitoring the progress of longer text inputs. It is optional and typically used in interactive environments.
max_new_tokens
The max_new_tokens parameter sets the maximum number of tokens that can be generated for the speech output. This limits the length of the generated audio, ensuring it remains within a manageable duration. The default value is 1000 tokens.
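Conceptually, max_new_tokens caps an autoregressive generation loop like the one below. The `generate_step` callable and the `stop_token` value are hypothetical stand-ins for the model's real sampler and end-of-speech token:

```python
def generate(generate_step, max_new_tokens=1000, stop_token=0):
    """Generate tokens one at a time, stopping at the stop token or
    after max_new_tokens steps, whichever comes first."""
    tokens = []
    for _ in range(max_new_tokens):
        tok = generate_step(tokens)
        if tok == stop_token:
            break
        tokens.append(tok)
    return tokens

# A dummy sampler that never emits the stop token: the cap kicks in.
tokens = generate(lambda ts: 7, max_new_tokens=5)
print(len(tokens))  # 5
```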
flow_cfg_scale
The flow_cfg_scale parameter adjusts the scale of the flow configuration used in the speech generation process. It influences the smoothness and coherence of the generated audio. The default value is 0.7, providing a good balance for most use cases.
Chatterbox TTS 📢 Output Parameters:
waveform
The waveform output parameter contains the generated audio data in the form of a waveform tensor. This tensor represents the synthesized speech corresponding to the input text, ready for playback or further processing. The waveform is typically accompanied by a sample rate, ensuring compatibility with standard audio playback systems.
sample_rate
The sample_rate output parameter specifies the sample rate of the generated audio waveform. This value is crucial for ensuring that the audio is played back at the correct speed and quality. The sample rate is typically set to match the capabilities of the audio playback system or the requirements of the application.
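If you need to persist the output outside of the node graph, the waveform and sample rate pair can be written to disk with Python's standard `wave` module. This is a generic sketch: the `save_waveform` helper is hypothetical, the 24 kHz rate is only an example, and a synthetic test tone stands in for real node output:

```python
import math
import struct
import wave

def save_waveform(path, samples, sample_rate=24000):
    """Write a list of float samples in [-1.0, 1.0] as a 16-bit mono WAV.

    The sample_rate must match the one returned by the node, or the
    audio will play back at the wrong speed and pitch.
    """
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit PCM
        wf.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wf.writeframes(frames)

# One second of a 440 Hz test tone in place of real node output.
sr = 24000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
save_waveform("out.wav", tone, sample_rate=sr)
```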
Chatterbox TTS 📢 Usage Tips:
- To achieve the best results, ensure that your input text is clear and well-structured, as this directly impacts the quality of the generated speech.
- Experiment with the temperature and top_p parameters to find the right balance between creativity and coherence in the generated speech.
- Use the audio_prompt_path parameter to match the voice characteristics of a specific speaker, enhancing the personalization of the generated audio.
Chatterbox TTS 📢 Common Errors and Solutions:
Error during TTS generation
- Explanation: This error occurs when there is an issue during the text-to-speech generation process, possibly due to incorrect input parameters or model configuration.
- Solution: Check the input parameters for any inconsistencies or errors. Ensure that the audio prompt path, if used, is valid and accessible. Review the model configuration and ensure that all necessary files and dependencies are correctly loaded.
Please prepare_conditionals first or specify audio_prompt_path
- Explanation: This error indicates that the node requires conditional preparation or an audio prompt path to proceed with the speech generation.
- Solution: Either prepare the necessary conditionals using the appropriate method or provide a valid audio prompt path to guide the speech generation process.
