ComfyUI Node: Voice Clone (QwenTTS) Advanced

Class Name

AILab_Qwen3TTSVoiceClone_Advanced

Category
🧪AILab/🎙️QwenTTS
Author
1038lab
Extension
ComfyUI-QwenTTS
Last Updated
2026-03-18
GitHub Stars
0.2K

How to Install ComfyUI-QwenTTS

Install this extension through the ComfyUI Manager:
  • 1. Click the Manager button in the main menu.
  • 2. Select the Custom Nodes Manager button.
  • 3. Enter ComfyUI-QwenTTS in the search bar and install the extension.
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Voice Clone (QwenTTS) Advanced Description

Advanced node for precise voice cloning using QwenTTS, ideal for voice-over and virtual assistants.

Voice Clone (QwenTTS) Advanced:

AILab_Qwen3TTSVoiceClone_Advanced generates speech in the voice of a reference audio sample using the QwenTTS framework. It suits applications that need a specific vocal identity, such as voice-over work, personalized virtual assistants, or creative projects. Beyond the basic inputs (reference audio and target text), the node exposes generation settings such as sampling parameters, seed, precision, device, attention implementation, and memory handling, so you can trade off output quality, variability, speed, and resource use.

Voice Clone (QwenTTS) Advanced Input Parameters:

reference_audio

The reference audio (for example, from a Load Audio node) whose voice the node clones. Its quality and characteristics strongly influence the output, because the node tries to reproduce the timbre and tone of this sample. There is no strict minimum or maximum length, but a clean, high-quality recording gives the best results.

target_text

The text the cloned voice should speak; it defines the content of the synthesized speech. No specific length limit is documented, but longer texts take more processing time and may need a higher max_new_tokens value.

model_size

Selects the size of the model used for synthesis. Larger models may reproduce the reference voice more accurately and with more nuance, but they need more memory and compute. The available sizes depend on the extension version and are listed in the node's model_size dropdown.

device

Specifies the hardware used for processing, such as the CPU or a GPU. A GPU can greatly reduce processing time, especially with larger models.

precision

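Defines the numerical precision used during processing, with options such as "fp32" or "bf16". fp32 stores each value in 4 bytes and bf16 in 2, so bf16 roughly halves the memory needed for the model weights and is usually faster on GPUs that support it. fp32 offers more numerical headroom at a higher memory and compute cost.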
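
As a rough, back-of-the-envelope sketch, weight memory is the parameter count times the bytes per value. The 1-billion-parameter figure below is hypothetical, not the size of any specific QwenTTS model, and the estimate ignores activations and framework overhead.

```python
# Bytes used to store one value in each precision option.
BYTES_PER_VALUE = {"fp32": 4, "bf16": 2}


def weight_memory_gib(num_params: int, precision: str) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Activations, caches and framework overhead are not included,
    so real usage will be higher.
    """
    return num_params * BYTES_PER_VALUE[precision] / 1024**3


# Hypothetical parameter count, for illustration only.
NUM_PARAMS = 1_000_000_000

for precision in ("fp32", "bf16"):
    print(f"{precision}: ~{weight_memory_gib(NUM_PARAMS, precision):.2f} GiB")
```

Halving the bytes per value halves this estimate, which is why switching to bf16 is often the first step when memory is tight.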

language

Indicates the language of the target text, so that the synthesized speech uses appropriate pronunciation and intonation. Setting it correctly is especially important in multilingual contexts.

reference_text

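Optional text that typically serves as the transcript of reference_audio, that is, what is actually said in the reference clip. An accurate transcript usually helps the model align the reference audio with its content, which can improve how closely the output matches the reference speaker.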

x_vector_only

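A boolean that, when true, conditions cloning only on an x-vector: a compact, fixed-length embedding of the speaker's voice characteristics extracted from the reference audio. This is a lighter form of conditioning that typically does not rely on reference_text, but it may capture less of the reference's detail, such as prosody, than full cloning.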

voice

Allows you to specify a pre-existing voice model to be used as a base for cloning. This can be useful for building upon previously developed voice profiles.

unload_models

A boolean parameter that determines whether models should be unloaded from memory after processing. Setting this to true can help manage memory usage, especially in environments with limited resources.

seed

An integer that initializes the random number generator, so results can be reproduced when the inputs and other settings stay the same. The default, -1, means no fixed seed is set, so each run can produce a different result.
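
The seed convention can be sketched with Python's standard random module. resolve_seed and sample_tokens are hypothetical helpers, not functions from ComfyUI-QwenTTS; the point is that a dedicated generator seeded with the same value yields the same sequence, while -1 draws a fresh seed on each run.

```python
import random


def resolve_seed(seed: int) -> int:
    """Hypothetical helper: -1 requests a fresh random seed; any other value is used as-is."""
    if seed == -1:
        return random.SystemRandom().randrange(2**32)
    return seed


def sample_tokens(seed: int, n: int = 5) -> list[int]:
    """Hypothetical stand-in for sampling n tokens with a generator seeded once per run."""
    rng = random.Random(resolve_seed(seed))
    return [rng.randrange(1000) for _ in range(n)]


print(sample_tokens(42) == sample_tokens(42))  # True: same seed, same sequence
print(sample_tokens(-1))                       # differs from run to run
```

With a real model, the same seed only reproduces a result when the inputs and settings also match, and some GPU kernels are not bit-for-bit deterministic.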

max_new_tokens

The maximum number of tokens the model may generate. It acts as an upper bound on the length of the synthesized speech: if it is too low for the target text, generation stops and the audio is cut off before the text is finished.
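
Assuming each generated token corresponds to a fixed slice of audio, max_new_tokens translates directly into a maximum duration. The 12-tokens-per-second rate below is a hypothetical placeholder; the real rate depends on the model's audio tokenizer, so check the model documentation before relying on these numbers.

```python
import math


def max_duration_s(max_new_tokens: int, tokens_per_second: float) -> float:
    """Longest audio that fits within max_new_tokens at the given token rate."""
    return max_new_tokens / tokens_per_second


def tokens_needed(duration_s: float, tokens_per_second: float) -> int:
    """Smallest max_new_tokens that leaves room for duration_s seconds of speech."""
    return math.ceil(duration_s * tokens_per_second)


RATE = 12.0  # hypothetical tokens per second of audio

print(max_duration_s(2048, RATE))  # about 170.7 s
print(tokens_needed(45, RATE))     # 540
```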

do_sample

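A boolean that, when true, enables sampling during generation, which produces more varied outputs. When it is false, generation typically falls back to greedy decoding, which always picks the most likely token, so top_p, top_k, and temperature have no effect.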
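
The difference between the two modes can be sketched on a toy probability distribution. pick_token is an illustrative helper, not the node's code: with do_sample false it performs greedy decoding, and with do_sample true it draws an index in proportion to the probabilities.

```python
import random


def pick_token(probs: list[float], do_sample: bool, rng: random.Random) -> int:
    """Illustrative helper: choose the next token index from a probability distribution."""
    if not do_sample:
        # Greedy decoding: always the single most likely token.
        return max(range(len(probs)), key=lambda i: probs[i])
    # Sampling: draw an index with probability proportional to its weight.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = [0.1, 0.6, 0.3]
rng = random.Random(0)
print([pick_token(probs, False, rng) for _ in range(5)])  # [1, 1, 1, 1, 1]
print([pick_token(probs, True, rng) for _ in range(5)])   # varies with the seed
```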

top_p

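A float that sets the cumulative probability threshold for nucleus sampling: at each step, only the smallest set of most likely tokens whose probabilities add up to top_p is considered. Lower values make the output more conservative; higher values increase diversity. The default is 0.9.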
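
Nucleus filtering can be sketched on a toy distribution. top_p_filter is an illustrative helper, not the node's implementation; real implementations work on logit tensors but follow the same idea.

```python
def top_p_filter(probs: list[float], top_p: float) -> list[float]:
    """Illustrative nucleus filter: keep the smallest set of most likely tokens
    whose cumulative probability reaches top_p, zero the rest, and renormalise."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: set[int] = set()
    cumulative = 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [p / total if i in kept else 0.0 for i, p in enumerate(probs)]


probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.9))  # the 0.05 token is dropped
print(top_p_filter(probs, 0.5))  # [1.0, 0.0, 0.0, 0.0]
```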

top_k

An integer that limits sampling to the top_k most likely tokens at each step. Lower values make the output more predictable; higher values allow more variety. The default is 50.
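
Top-k filtering can be sketched the same way. top_k_filter is an illustrative helper that works on a plain list of probabilities rather than the model's logits.

```python
def top_k_filter(probs: list[float], top_k: int) -> list[float]:
    """Illustrative helper: keep only the top_k most likely tokens, zero the rest, renormalise."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1 in this sketch")
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(ranked[:top_k])
    total = sum(probs[i] for i in kept)
    return [p / total if i in kept else 0.0 for i, p in enumerate(probs)]


probs = [0.5, 0.3, 0.15, 0.05]
print(top_k_filter(probs, 2))  # roughly [0.625, 0.375, 0.0, 0.0]
```

Some libraries, such as Hugging Face Transformers, treat top_k=0 as disabling the filter; this sketch simply rejects values below 1.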

temperature

A float that controls the randomness of predictions by dividing the logits by the temperature before applying softmax. Higher values flatten the distribution and make outputs more random; lower values sharpen it and make outputs more deterministic. The default is 0.9.
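
The effect of temperature can be shown with a small, numerically stable softmax over toy logits. softmax_with_temperature is an illustrative helper, not code from the node.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Illustrative helper: divide logits by temperature, then apply a stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
for t in (0.5, 0.9, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.5 most of the probability mass sits on the first token; at 1.5 it spreads out across all three.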

repetition_penalty

A float that penalizes tokens that have already been generated, discouraging repetitive output and helping keep the synthesized speech natural. The default, 1.0, applies no penalty; values above 1.0 strengthen it.
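
One common formulation is the CTRL-style penalty that Hugging Face Transformers implements in RepetitionPenaltyLogitsProcessor; whether this node applies exactly the same rule is an assumption. The toy version below operates on a plain list of logits.

```python
def apply_repetition_penalty(
    logits: list[float], previous_tokens: list[int], penalty: float
) -> list[float]:
    """Illustrative CTRL-style penalty that makes already generated tokens less likely.

    Positive logits are divided by the penalty and negative ones multiplied by it,
    so for penalty > 1.0 every nonzero score of a repeated token moves down;
    penalty == 1.0 changes nothing.
    """
    out = list(logits)
    for tok in set(previous_tokens):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


logits = [2.0, -1.0, 0.5]
print(apply_repetition_penalty(logits, [0, 1], 1.0))  # [2.0, -1.0, 0.5]
print(apply_repetition_penalty(logits, [0, 1], 1.5))  # tokens 0 and 1 lowered
```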

attention

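Selects the attention implementation used by the model; the default, "auto", lets the node choose. In Hugging Face Transformers-based models, typical choices include "eager", "sdpa", and "flash_attention_2". These mainly affect speed and memory use rather than the content of the generated speech. The exact options are listed in the node's dropdown.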

Voice Clone (QwenTTS) Advanced Output Parameters:

audio

The synthesized speech, generated from target_text in the voice of reference_audio. Connect it to downstream audio nodes, for example Preview Audio or Save Audio, to listen to the result and judge how well the voice was cloned.

Voice Clone (QwenTTS) Advanced Usage Tips:

  • Ensure that the reference audio is of high quality and free from background noise to achieve the best cloning results.
  • Experiment with different model sizes and precision settings to balance between performance and resource usage.
  • Use the seed parameter to reproduce specific outputs, which can be useful for iterative testing and refinement.
  • Adjust the temperature and top_p parameters to fine-tune the creativity and variability of the synthesized speech.

Voice Clone (QwenTTS) Advanced Common Errors and Solutions:

"Invalid reference audio format"

  • Explanation: The provided reference audio is not in a supported format.
  • Solution: Convert the audio file to a compatible format, such as WAV or MP3, and try again.

"Model size not supported"

  • Explanation: The specified model size is not available or recognized.
  • Solution: Choose one of the options listed in the node's model_size dropdown.

"Insufficient memory for model loading"

  • Explanation: The selected model size requires more memory than is available on the device.
  • Solution: Select a smaller model size, use bf16 instead of fp32, enable unload_models to free memory between runs, or run on a device with more available memory.

"Language not supported"

  • Explanation: The specified language is not supported by the current model.
  • Solution: Verify the list of supported languages and select an appropriate one for your target text.
