Custom Voice (QwenTTS) Advanced:
The AILab_Qwen3TTSCustomVoice_Advanced node provides advanced capabilities for creating custom voice models with the QwenTTS framework. It generates personalized, nuanced speech through advanced text-to-speech synthesis and is particularly useful for AI artists and developers who want to build unique voice profiles tailored to specific artistic or functional requirements. Its advanced options allow fine-tuning of voice characteristics, offering a high degree of control over the final audio output, which makes it well suited to projects that demand high-quality, customized voice synthesis.
Custom Voice (QwenTTS) Advanced Input Parameters:
target_text
The target_text parameter specifies the text that you want to convert into speech. This is the primary input for the text-to-speech synthesis process. The quality and clarity of the generated voice will depend on the complexity and length of the text provided. There are no strict length limits, but shorter texts may yield more precise results.
model_size
The model_size parameter determines the size of the model used for voice synthesis. Larger models typically provide better quality and more natural-sounding voices but require more computational resources. Options may include small, medium, and large, with the default being medium.
device
The device parameter specifies the hardware on which the model will run, such as "cpu" or "gpu". Using a GPU can significantly speed up the processing time, especially for larger models.
precision
The precision parameter defines the numerical precision used during computation, such as "fp32" or "bf16". Higher precision can improve the quality of the output but may require more memory and processing power.
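To see what a lower precision like bf16 trades away, a bfloat16 value can be emulated by keeping only the top 16 bits of a float32 bit pattern. This is a minimal illustrative sketch, not part of the node: to_bf16 is a hypothetical helper, and real bf16 conversion rounds to nearest rather than truncating.

```python
import struct

def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the
    float32 representation (sign, exponent, and 7 mantissa bits).
    Real conversions round to nearest; this sketch truncates."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]
```

Values with short binary expansions (1.0, 2.5) survive exactly, while most others pick up a relative error of up to about 0.4%, which is the quality/memory trade-off the precision parameter exposes.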
language
The language parameter sets the language of the input text and the desired output speech. This ensures that the voice synthesis is optimized for the specific phonetic and linguistic characteristics of the chosen language.
reference_audio
The reference_audio parameter allows you to provide an audio sample that the model can use as a reference for voice characteristics. This can help in creating a voice output that closely matches the tone and style of the reference.
reference_text
The reference_text parameter is used in conjunction with reference_audio to provide context for the reference audio. This helps the model better understand the nuances of the reference voice.
x_vector_only
The x_vector_only parameter, when set to true, restricts the model to use only the x-vector for voice synthesis, which can be useful for certain types of voice cloning tasks.
voice
The voice parameter allows you to specify a pre-existing voice model to use as a base for synthesis. This can be useful for maintaining consistency across different outputs.
max_new_tokens
The max_new_tokens parameter sets the maximum number of tokens that can be generated in the output. This controls the length of the synthesized speech.
do_sample
The do_sample parameter, when enabled, allows the model to sample from the distribution of possible outputs, which can introduce variability and creativity in the generated speech.
top_p
The top_p parameter is used in nucleus sampling to control the diversity of the output. A lower value results in more conservative outputs, while a higher value allows for more variation.
top_k
The top_k parameter limits the number of highest probability vocabulary tokens to consider during sampling, which can help in generating more focused and relevant speech outputs.
temperature
The temperature parameter controls the randomness of the sampling process. A lower temperature results in more deterministic outputs, while a higher temperature increases variability.
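The interaction of temperature, top_k, and top_p can be sketched in plain Python. This is an illustrative sketch of a typical sampling pipeline, not this node's actual implementation; sample_token and its signature are hypothetical.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Sample one token index from raw logits, applying temperature
    scaling, then top-k filtering, then nucleus (top-p) filtering."""
    rng = rng or random.Random()
    # Temperature: lower values sharpen the distribution (more deterministic).
    scaled = [l / temperature for l in logits]
    # Softmax over the scaled logits (shifted by the max for stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [(i, e / total) for i, e in enumerate(exps)]
    # Top-k: keep only the k most probable tokens (0 disables the filter).
    probs.sort(key=lambda p: p[1], reverse=True)
    if top_k > 0:
        probs = probs[:top_k]
    # Top-p: keep the smallest set whose cumulative probability >= top_p.
    kept, cum = [], 0.0
    for i, p in probs:
        kept.append((i, p))
        cum += p
        if cum >= top_p:
            break
    # Renormalize the surviving tokens and draw one.
    z = sum(p for _, p in kept)
    r = rng.random() * z
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]
```

With top_k=1 or a very low temperature, the sketch always returns the most likely token; raising temperature or top_p widens the pool of candidates and therefore the variability of the speech.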
repetition_penalty
The repetition_penalty parameter discourages the model from repeating the same phrases or words, ensuring more varied and natural-sounding speech.
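One common formulation of this penalty (the approach popularized by the CTRL paper and used in Hugging Face transformers; whether this node uses the same formula is an assumption) divides the positive logits of already-generated tokens by the penalty and multiplies the negative ones, so repeated tokens always become less likely:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Penalize tokens that already appear in the generated output.
    Positive logits are divided by the penalty, negative ones are
    multiplied, so a penalty > 1 always lowers a repeated token's score."""
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out
```

A penalty of 1.0 leaves the logits unchanged; values above 1.0 progressively discourage repetition.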
attention
The attention parameter specifies the attention mechanism to use, which can affect the quality and coherence of the generated speech.
unload_models
The unload_models parameter, when set to true, unloads the models from memory after processing, which can be useful for managing memory usage in resource-constrained environments.
seed
The seed parameter sets the random seed for the generation process, allowing for reproducibility of results. A value of -1 indicates that no specific seed is set.
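A minimal sketch of how a -1 seed is commonly resolved (resolve_seed is a hypothetical helper, not this node's API): a fixed seed is used as-is for reproducibility, while -1 draws a fresh random seed for each run.

```python
import random

def resolve_seed(seed: int) -> int:
    """Treat -1 as 'no fixed seed' by drawing a fresh one; otherwise
    seed the generator with the given value so runs are reproducible.
    Returns the seed actually used."""
    if seed == -1:
        seed = random.randrange(2**32)
    random.seed(seed)
    return seed
```

Reusing the returned seed on a later run reproduces the same generation, which is useful when iterating on other parameters.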
Custom Voice (QwenTTS) Advanced Output Parameters:
audio
The audio output parameter provides the synthesized speech audio as a result of the text-to-speech process. This audio output is the final product of the node's operation, reflecting all the input parameters and settings applied during synthesis. It is typically in a standard audio format that can be easily played back or further processed.
Custom Voice (QwenTTS) Advanced Usage Tips:
- Experiment with different model_size and precision settings to find the best balance between quality and performance for your specific use case.
- Use reference_audio and reference_text to closely match the voice characteristics of a specific speaker or style.
- Adjust temperature, top_p, and top_k to control the creativity and variability of the generated speech, especially for artistic projects.
Custom Voice (QwenTTS) Advanced Common Errors and Solutions:
"Model loading failed"
- Explanation: This error occurs when the specified model cannot be loaded, possibly due to an incorrect model_size or insufficient resources.
- Solution: Verify that the model_size is correct and ensure that your system has enough resources to load the model. Consider using a smaller model if necessary.
"Invalid input text"
- Explanation: This error indicates that the target_text provided is not valid, possibly due to unsupported characters or formatting issues.
- Solution: Check the target_text for any unsupported characters or formatting issues and correct them before retrying.
"Device not supported"
- Explanation: This error occurs when the specified device is not available or supported by the system.
- Solution: Ensure that the device is correctly set to either "cpu" or "gpu" and that the necessary hardware is available.
