Spark TTS Run:
SparkTTSRun is a node designed to facilitate the generation of audio waveforms from text inputs using text-to-speech (TTS) technology. This node leverages advanced machine learning models to convert written text into spoken words, providing a seamless and efficient way to produce high-quality audio outputs. The primary goal of SparkTTSRun is to enable users to create natural-sounding speech from text, with customizable parameters that allow for fine-tuning of the speech characteristics such as pitch, speed, and sampling methods. This node is particularly beneficial for AI artists and developers who wish to integrate TTS capabilities into their projects, offering a robust solution for generating audio content dynamically.
Spark TTS Run Input Parameters:
text
The text parameter is the core input for the SparkTTSRun node, representing the written content that you wish to convert into speech. This parameter directly influences the audio output, as it determines the words and phrases that will be spoken. There are no specific constraints on the length or content of the text, but longer texts may result in longer processing times.
gender
The gender parameter allows you to specify the gender of the voice that will be used for the TTS output. This can be set to either "male" or "female," depending on the desired voice characteristics. The choice of gender can impact the tone and style of the generated speech, providing flexibility in tailoring the audio to suit different contexts or preferences.
top_k
The top_k parameter controls the number of highest-probability vocabulary tokens to keep for top-k filtering during the sampling process. It helps manage the diversity of the generated speech by limiting selection to the k most likely candidates. No default value is specified, but it is effectively an integer count: higher values allow more variation in the output, while very low values make the speech more predictable.
top_p
The top_p parameter, also known as nucleus sampling, is a float value that sets a cumulative probability threshold for token selection. During sampling, only the smallest set of most probable tokens whose cumulative probability reaches the threshold is considered. The default value is 0.95, with a range from 0 to 1; lower values result in more deterministic outputs, while higher values increase variability.
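As an illustrative sketch only (not the node's actual implementation), top-k and top-p filtering can be combined over a toy token distribution like this; the `top_k_top_p_filter` helper and the example vocabulary are assumptions for demonstration:

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Keep only the top-k most likely tokens, then apply nucleus (top-p)
    filtering, and renormalize. `probs` maps token -> probability."""
    # Sort tokens by descending probability.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # Top-k: keep at most k candidates.
    ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability
    # reaches the threshold.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Toy vocabulary distribution: "d" is dropped by top_k=3,
# and the remaining mass is renormalized.
probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
filtered = top_k_top_p_filter(probs, top_k=3, top_p=0.9)
```

With a smaller top_p, the candidate pool shrinks further, which is why lower top_p values produce more deterministic speech.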
temperature
The temperature parameter is a float value that influences the randomness of the sampling process. A lower temperature results in more deterministic outputs, while a higher temperature introduces more randomness and creativity in the generated speech. The default value is not specified; values commonly fall between 0 and 1, though values above 1 are possible and flatten the distribution further.
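The effect of temperature can be seen in a small softmax sketch (illustrative only; the `softmax_with_temperature` helper and the toy logits are not part of the node):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the
    distribution (more deterministic), higher flattens it (more random)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cool = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)
# The top token's share grows as temperature drops, so sampling
# at low temperature almost always picks the most likely token.
```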
max_new_tokens
The max_new_tokens parameter is an integer that sets the maximum number of tokens to generate in the output. This parameter helps control the length of the generated speech, with a default value of 3000 and a minimum of 500. Adjusting this parameter allows you to manage the verbosity of the TTS output.
do_sample
The do_sample parameter is a boolean that determines whether sampling should be used during the generation process. When set to True, the node will use sampling to generate more varied and creative outputs. The default value is True, which encourages diversity in the generated speech.
unload_model
The unload_model parameter is a boolean that specifies whether the TTS model should be unloaded from memory after processing. This is useful for managing system resources, especially when working with large models. The default value is True, which helps free up memory after the node has completed its task.
seed
The seed parameter is an integer used to initialize the random number generator for reproducibility. By setting a specific seed value, you can ensure that the same input will produce the same output across different runs. The default value is 0, with a range from 0 to a large positive integer.
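A minimal sketch of why seeding matters (illustrative; `seeded_sample` is a hypothetical helper, not part of the node):

```python
import random

def seeded_sample(candidates, weights, seed=0):
    """Sampling with a fixed seed yields the same choice on every run."""
    rng = random.Random(seed)  # local generator, independent of global state
    return rng.choices(candidates, weights=weights, k=1)[0]

first = seeded_sample(["a", "b", "c"], [0.5, 0.3, 0.2], seed=0)
second = seeded_sample(["a", "b", "c"], [0.5, 0.3, 0.2], seed=0)
# Identical seeds produce identical draws, which is what makes
# a TTS run with do_sample=True reproducible.
```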
Spark TTS Run Output Parameters:
waveform
The waveform output parameter is a tensor representing the audio waveform generated from the input text. This tensor contains the raw audio data that can be played back or further processed. The waveform is crucial for understanding the quality and characteristics of the generated speech, as it directly reflects the TTS model's performance.
sample_rate
The sample_rate output parameter is an integer that indicates the number of samples per second in the generated audio waveform. For SparkTTSRun, the sample rate is set to 16000 Hz, which is a standard rate for high-quality audio. This parameter is important for ensuring compatibility with audio playback systems and for maintaining the fidelity of the generated speech.
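If you need to save the output outside of ComfyUI, a 16 kHz mono waveform can be written to a WAV file with the Python standard library alone. This is a sketch assuming the waveform tensor has already been converted to a flat list of floats in [-1.0, 1.0]; the `write_waveform` helper is illustrative, not part of the node:

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # matches SparkTTSRun's output rate

def write_waveform(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1.0, 1.0] to a 16-bit mono WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 16-bit PCM
        wf.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wf.writeframes(frames)

# Example: one second of a 440 Hz test tone standing in for TTS output.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
write_waveform("tone.wav", tone)
```

Using the correct sample rate when writing the file is what preserves pitch and timing; writing 16 kHz audio with a 44.1 kHz header would play back fast and high-pitched.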
Spark TTS Run Usage Tips:
- To achieve more natural-sounding speech, experiment with the `temperature` and `top_p` parameters to find a balance between randomness and determinism that suits your needs.
- Use the `gender` parameter to match the voice characteristics with the intended audience or context, enhancing the overall impact of the generated speech.
- If you encounter memory issues, consider setting `unload_model` to `True` to free up resources after each run, especially when processing multiple requests.
Spark TTS Run Common Errors and Solutions:
"CUDA out of memory"
- Explanation: This error occurs when the GPU does not have enough memory to load the TTS model and process the input.
- Solution: Try reducing the size of the input text or adjust the `max_new_tokens` parameter to a lower value. Additionally, ensure that `unload_model` is set to `True` to free up memory after each run.
"Invalid input text"
- Explanation: This error may arise if the input text is not properly formatted or contains unsupported characters.
- Solution: Verify that the input text is correctly formatted and does not include any special characters that might not be supported by the tokenizer.
"Model not loaded"
- Explanation: This error indicates that the TTS model has not been successfully loaded into memory.
- Solution: Ensure that the model files are correctly installed and accessible. If the problem persists, try restarting the application or reloading the model.
