Spark TTS Run:
SparkTTSRun is a node designed to facilitate the generation of audio waveforms from text inputs using text-to-speech (TTS) technology. This node leverages advanced machine learning models to convert written text into spoken words, providing a seamless and efficient way to produce high-quality audio outputs. The primary goal of SparkTTSRun is to enable users to create natural-sounding speech from text, with customizable parameters that allow for fine-tuning of the speech characteristics such as pitch, speed, and sampling methods. This node is particularly beneficial for AI artists and developers who wish to integrate TTS capabilities into their projects, offering a robust solution for generating audio content dynamically.
Spark TTS Run Input Parameters:
text
The text parameter is the core input for the SparkTTSRun node, representing the written content that you wish to convert into speech. This parameter directly influences the audio output, as it determines the words and phrases that will be spoken. There are no specific constraints on the length or content of the text, but longer texts may result in longer processing times.
gender
The gender parameter allows you to specify the gender of the voice that will be used for the TTS output. This can be set to either "male" or "female," depending on the desired voice characteristics. The choice of gender can impact the tone and style of the generated speech, providing flexibility in tailoring the audio to suit different contexts or preferences.
top_k
The top_k parameter controls the number of highest-probability vocabulary tokens to keep for top-k filtering during the sampling process. It helps manage the diversity of the generated speech by limiting selection to the k most likely candidates. No default value is specified, but it is effectively an integer count: higher values allow more variation in the output, while very low values make the speech more predictable.
top_p
The top_p parameter, also known as nucleus sampling, is a float value that sets a cumulative probability threshold for token selection. During sampling, only the smallest set of most probable tokens whose cumulative probability reaches the threshold is considered. The default value is 0.95, with a range from 0 to 1; lower values result in more deterministic outputs, while higher values increase variability.
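As an illustrative sketch only (not the node's actual implementation), top-k and top-p filtering can be combined over a toy token distribution like this; the `top_k_top_p_filter` helper and the example vocabulary are assumptions for demonstration:

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Keep only the top-k most likely tokens, then apply nucleus (top-p)
    filtering, and renormalize. `probs` maps token -> probability."""
    # Sort tokens by descending probability.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # Top-k: keep at most k candidates.
    ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability
    # reaches the threshold.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Toy vocabulary distribution: "d" is dropped by top_k=3,
# and the remaining mass is renormalized.
probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
filtered = top_k_top_p_filter(probs, top_k=3, top_p=0.9)
```

With a smaller top_p, the candidate pool shrinks further, which is why lower top_p values produce more deterministic speech.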
temperature
The temperature parameter is a float value that influences the randomness of the sampling process. A lower temperature results in more deterministic outputs, while a higher temperature introduces more randomness and creativity in the generated speech. The default value is not specified; values commonly fall between 0 and 1, though values above 1 are possible and flatten the distribution further.
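The effect of temperature can be seen in a small softmax sketch (illustrative only; the `softmax_with_temperature` helper and the toy logits are not part of the node):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the
    distribution (more deterministic), higher flattens it (more random)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cool = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)
# The top token's share grows as temperature drops, so sampling
# at low temperature almost always picks the most likely token.
```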
max_new_tokens
The max_new_tokens parameter is an integer that sets the maximum number of tokens to generate in the output. This parameter helps control the length of the generated speech, with a default value of 3000 and a minimum of 500. Adjusting this parameter allows you to manage the verbosity of the TTS output.
do_sample
The do_sample parameter is a boolean that determines whether sampling should be used during the generation process. When set to True, the node will use sampling to generate more varied and creative outputs. The default value is True, which encourages diversity in the generated speech.
unload_model
The unload_model parameter is a boolean that specifies whether the TTS model should be unloaded from memory after processing. This is useful for managing system resources, especially when working with large models. The default value is True, which helps free up memory after the node has completed its task.
seed
The seed parameter is an integer used to initialize the random number generator for reproducibility. By setting a specific seed value, you can ensure that the same input will produce the same output across different runs. The default value is 0, with a range from 0 to a large positive integer.
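A minimal sketch of why seeding matters (illustrative; `seeded_sample` is a hypothetical helper, not part of the node):

```python
import random

def seeded_sample(candidates, weights, seed=0):
    """Sampling with a fixed seed yields the same choice on every run."""
    rng = random.Random(seed)  # local generator, independent of global state
    return rng.choices(candidates, weights=weights, k=1)[0]

first = seeded_sample(["a", "b", "c"], [0.5, 0.3, 0.2], seed=0)
second = seeded_sample(["a", "b", "c"], [0.5, 0.3, 0.2], seed=0)
# Identical seeds produce identical draws, which is what makes
# a TTS run with do_sample=True reproducible.
```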
Spark TTS Run Output Parameters:
waveform
The waveform output parameter is a tensor representing the audio waveform generated from the input text. This tensor contains the raw audio data that can be played back or further processed. The waveform is crucial for understanding the quality and characteristics of the generated speech, as it directly reflects the TTS model's performance.
sample_rate
The sample_rate output parameter is an integer that indicates the number of samples per second in the generated audio waveform. For SparkTTSRun, the sample rate is set to 16000 Hz, which is a standard rate for high-quality audio. This parameter is important for ensuring compatibility with audio playback systems and for maintaining the fidelity of the generated speech.
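If you need to save the output outside of ComfyUI, a 16 kHz mono waveform can be written to a WAV file with the Python standard library alone. This is a sketch assuming the waveform tensor has already been converted to a flat list of floats in [-1.0, 1.0]; the `write_waveform` helper is illustrative, not part of the node:

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # matches SparkTTSRun's output rate

def write_waveform(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1.0, 1.0] to a 16-bit mono WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 16-bit PCM
        wf.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wf.writeframes(frames)

# Example: one second of a 440 Hz test tone standing in for TTS output.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
write_waveform("tone.wav", tone)
```

Using the correct sample rate when writing the file is what preserves pitch and timing; writing 16 kHz audio with a 44.1 kHz header would play back fast and high-pitched.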
Spark TTS Run Usage Tips:
- To achieve more natural-sounding speech, experiment with the `temperature` and `top_p` parameters to find a balance between randomness and determinism that suits your needs.
- Use the `gender` parameter to match the voice characteristics with the intended audience or context, enhancing the overall impact of the generated speech.
- If you encounter memory issues, consider setting `unload_model` to `True` to free up resources after each run, especially when processing multiple requests.
Spark TTS Run Common Errors and Solutions:
"CUDA out of memory"
- Explanation: This error occurs when the GPU does not have enough memory to load the TTS model and process the input.
- Solution: Try reducing the size of the input text or adjust the `max_new_tokens` parameter to a lower value. Additionally, ensure that `unload_model` is set to `True` to free up memory after each run.
"Invalid input text"
- Explanation: This error may arise if the input text is not properly formatted or contains unsupported characters.
- Solution: Verify that the input text is correctly formatted and does not include any special characters that might not be supported by the tokenizer.
"Model not loaded"
- Explanation: This error indicates that the TTS model has not been successfully loaded into memory.
- Solution: Ensure that the model files are correctly installed and accessible. If the problem persists, try restarting the application or reloading the model.
