
ComfyUI Node: Custom Voice (QwenTTS) Advanced

Class Name

AILab_Qwen3TTSCustomVoice_Advanced

Category
🧪AILab/🎙️QwenTTS
Author
1038lab (Account age: 0 days)
Extension
ComfyUI-QwenTTS
Last Updated
2026-03-18
Github Stars
0.2K

How to Install ComfyUI-QwenTTS

Install this extension via the ComfyUI Manager by searching for ComfyUI-QwenTTS:
  • 1. Click the Manager button in the main menu
  • 2. Select the Custom Nodes Manager button
  • 3. Enter ComfyUI-QwenTTS in the search bar
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated list of nodes.


Custom Voice (QwenTTS) Advanced Description

Advanced node for creating custom voice models with QwenTTS, enabling nuanced voice synthesis.

Custom Voice (QwenTTS) Advanced:

The AILab_Qwen3TTSCustomVoice_Advanced node provides advanced controls for creating custom voices with the QwenTTS framework. It generates highly personalized, nuanced voice output through modern text-to-speech synthesis, and is particularly useful for AI artists and developers who want unique voice profiles tailored to specific artistic or functional requirements. Its options allow fine-tuning of voice characteristics, offering a high degree of control over the final audio, which makes it well suited to projects that demand high-quality, customized voice synthesis.

Custom Voice (QwenTTS) Advanced Input Parameters:

target_text

The target_text parameter specifies the text you want to convert into speech. It is the primary input to the text-to-speech synthesis process, and the quality and clarity of the generated voice depend on the complexity and length of the text. There is no strict length limit, but shorter texts tend to yield more precise results.

model_size

The model_size parameter determines the size of the model used for voice synthesis. Larger models typically provide better quality and more natural-sounding voices but require more computational resources. Options may include small, medium, and large, with the default being medium.

device

The device parameter specifies the hardware on which the model will run, such as "cpu" or "gpu". Using a GPU can significantly speed up the processing time, especially for larger models.

precision

The precision parameter defines the numerical precision used during computation, such as "fp32" or "bf16". Higher precision can improve the quality of the output but may require more memory and processing power.
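
To get a feel for how the precision setting affects memory, you can estimate the footprint of the model weights from bytes per parameter. This is a rough back-of-the-envelope sketch; the parameter count below is a hypothetical example, not a confirmed QwenTTS model size:

```python
# Approximate bytes per parameter for common precisions.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}

def weight_memory_gb(num_params: int, precision: str) -> float:
    """Memory for the weights alone, in GiB (ignores activations and caches)."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3

params = 1_000_000_000  # hypothetical 1B-parameter model
print(f"fp32: {weight_memory_gb(params, 'fp32'):.2f} GiB")
print(f"bf16: {weight_memory_gb(params, 'bf16'):.2f} GiB")
```

Switching from "fp32" to "bf16" halves the weight memory, which is why lowering precision is usually the first lever to try on memory-constrained GPUs.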

language

The language parameter sets the language of the input text and the desired output speech. This ensures that the voice synthesis is optimized for the specific phonetic and linguistic characteristics of the chosen language.

reference_audio

The reference_audio parameter allows you to provide an audio sample that the model can use as a reference for voice characteristics. This can help in creating a voice output that closely matches the tone and style of the reference.

reference_text

The reference_text parameter is used in conjunction with reference_audio to provide context for the reference audio. This helps the model better understand the nuances of the reference voice.

x_vector_only

The x_vector_only parameter, when set to true, restricts the model to use only the x-vector for voice synthesis, which can be useful for certain types of voice cloning tasks.

voice

The voice parameter allows you to specify a pre-existing voice model to use as a base for synthesis. This can be useful for maintaining consistency across different outputs.

max_new_tokens

The max_new_tokens parameter sets the maximum number of tokens that can be generated in the output. This controls the length of the synthesized speech.

do_sample

The do_sample parameter, when enabled, allows the model to sample from the distribution of possible outputs, which can introduce variability and creativity in the generated speech.

top_p

The top_p parameter is used in nucleus sampling to control the diversity of the output. A lower value results in more conservative outputs, while a higher value allows for more variation.

top_k

The top_k parameter limits the number of highest probability vocabulary tokens to consider during sampling, which can help in generating more focused and relevant speech outputs.

temperature

The temperature parameter controls the randomness of the sampling process. A lower temperature results in more deterministic outputs, while a higher temperature increases variability.
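
Since do_sample, top_k, top_p, and temperature all interact, a minimal toy sampler over a handful of logits can make their roles concrete. This is plain Python for illustration, independent of the node's actual implementation:

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Toy next-token sampler: temperature scaling, then top-k,
    then nucleus (top-p) filtering, then a weighted draw."""
    # Temperature: <1 sharpens the distribution, >1 flattens it.
    scaled = [l / temperature for l in logits]
    probs = [math.exp(l) for l in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]

    # Rank token indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:  # top-k: keep only the k most likely tokens
        ranked = ranked[:top_k]

    # Nucleus filtering: keep the smallest prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Weighted draw over the surviving tokens.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]
print(sample_token(logits, temperature=0.7, top_k=3, top_p=0.9))
```

With do_sample disabled, generation typically reduces to greedy decoding (always the argmax token), which is equivalent here to a very small top_p or top_k of 1.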

repetition_penalty

The repetition_penalty parameter discourages the model from repeating the same phrases or words, ensuring more varied and natural-sounding speech.
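
One common formulation of repetition penalty (the one used by Hugging Face transformers; the node's exact scheme is not documented, so treat this as an illustration) divides positive logits and multiplies negative logits of already-generated tokens by the penalty factor:

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Penalize tokens that already appeared in the output. Positive logits
    are divided by the penalty and negative logits multiplied, so previously
    used tokens become less likely either way (for penalty > 1)."""
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

# Token 0 was already generated, so its logit drops from 2.0 to 1.0.
print(apply_repetition_penalty([2.0, 1.0, -0.5], [0], penalty=2.0))
```

A penalty of 1.0 is a no-op; values modestly above 1.0 discourage loops without distorting the distribution too much.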

attention

The attention parameter specifies the attention mechanism to use, which can affect the quality and coherence of the generated speech.

unload_models

The unload_models parameter, when set to true, unloads the models from memory after processing, which can be useful for managing memory usage in resource-constrained environments.

seed

The seed parameter sets the random seed for the generation process, allowing for reproducibility of results. A value of -1 indicates that no specific seed is set.
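
Seeding exists for reproducibility: the same seed reproduces the same random draws, so a run you like can be regenerated exactly. A plain-Python illustration of the principle (the node seeds its own internal RNGs):

```python
import random

def draw(seed: int, n: int = 3) -> list:
    """Draw n pseudo-random numbers from a dedicated, explicitly seeded RNG."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

assert draw(42) == draw(42)   # same seed -> identical sequence
assert draw(42) != draw(43)   # different seed -> different sequence
```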

Custom Voice (QwenTTS) Advanced Output Parameters:

audio

The audio output parameter provides the synthesized speech audio as a result of the text-to-speech process. This audio output is the final product of the node's operation, reflecting all the input parameters and settings applied during synthesis. It is typically in a standard audio format that can be easily played back or further processed.
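
If you want to post-process the result outside ComfyUI, a waveform of float samples plus a sample rate is enough to write a standard WAV file with Python's standard library. The sketch below assumes mono float samples in [-1, 1]; the node's actual output uses ComfyUI's AUDIO container, whose field layout may differ:

```python
import math
import struct
import wave

def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)          # 16-bit samples
        wf.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wf.writeframes(frames)

# 0.5 s of a 440 Hz test tone at 24 kHz, in place of real synthesized audio.
sr = 24000
tone = [0.3 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 2)]
write_wav("test_tone.wav", tone, sr)
```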

Custom Voice (QwenTTS) Advanced Usage Tips:

  • Experiment with different model_size and precision settings to find the best balance between quality and performance for your specific use case.
  • Use reference_audio and reference_text to closely match the voice characteristics of a specific speaker or style.
  • Adjust temperature, top_p, and top_k to control the creativity and variability of the generated speech, especially for artistic projects.

Custom Voice (QwenTTS) Advanced Common Errors and Solutions:

"Model loading failed"

  • Explanation: This error occurs when the specified model cannot be loaded, possibly due to incorrect model_size or insufficient resources.
  • Solution: Verify that the model_size is correct and ensure that your system has enough resources to load the model. Consider using a smaller model if necessary.

"Invalid input text"

  • Explanation: This error indicates that the target_text provided is not valid, possibly due to unsupported characters or formatting issues.
  • Solution: Check the target_text for any unsupported characters or formatting issues and correct them before retrying.

"Device not supported"

  • Explanation: This error occurs when the specified device is not available or supported by the system.
  • Solution: Ensure that the specified device is correctly set to either "cpu" or "gpu" and that the necessary hardware is available.

Custom Voice (QwenTTS) Advanced Related Nodes

Go back to the extension to check out more related nodes.
ComfyUI-QwenTTS