Voice Clone (QwenTTS):
The AILab_Qwen3TTSVoiceClone node is designed to facilitate the creation of synthetic voices that closely mimic a reference audio sample. This node leverages advanced voice cloning technology to generate speech that retains the unique characteristics and nuances of the original speaker's voice. By inputting a reference audio and target text, the node synthesizes a new audio output that sounds as if the original speaker is delivering the new content. This capability is particularly beneficial for applications requiring personalized voice synthesis, such as virtual assistants, audiobooks, and other multimedia content where maintaining a consistent voice identity is crucial. The node's functionality is enhanced by its ability to handle various languages and adjust parameters like temperature and repetition penalty to fine-tune the output's naturalness and variability.
Voice Clone (QwenTTS) Input Parameters:
reference_audio
The reference_audio parameter is used to provide the audio sample that the node will use as a reference for cloning the voice. This audio should be a clear recording of the voice you wish to replicate. The quality and clarity of this audio directly impact the accuracy and quality of the cloned voice. There is no specific minimum or maximum length for the audio, but longer samples may provide better results.
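Before wiring a clip into reference_audio, it can help to sanity-check its basic properties with the Python standard library. This is a minimal illustrative sketch; the node itself does not expose such a helper, and no particular sample rate or channel count is documented as a requirement:

```python
import wave

def describe_wav(path):
    """Return (channels, sample_rate, duration_seconds) for a WAV file."""
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        rate = wf.getframerate()
        duration = wf.getnframes() / rate
    return channels, rate, duration
```

A clean, mono recording of a few seconds or more is a reasonable starting point; inspecting the file first avoids feeding the node a clip that is silent, truncated, or in an unexpected layout.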
target_text
The target_text parameter specifies the text that you want the cloned voice to speak. This text will be synthesized into speech using the voice characteristics extracted from the reference_audio. There are no restrictions on the length of the text, but longer texts may require more processing time.
model_size
The model_size parameter determines the size of the model used for voice cloning. Larger models may provide more accurate and natural-sounding results but require more computational resources; typical options include "small", "medium", and "large".
device
The device parameter specifies the hardware on which the model will run. Options typically include "cpu" or "gpu", with "auto" allowing the system to choose the best available option. Using a GPU can significantly speed up processing times.
precision
The precision parameter controls the numerical precision used during processing, with options like "bf16" (bfloat16) offering a balance between performance and accuracy. This setting can affect the speed and memory usage of the node.
language
The language parameter indicates the language of the target_text. This ensures that the synthesized speech uses appropriate phonetic and linguistic rules for the specified language. It is important to match this parameter with the language of the text for optimal results.
reference_text
The reference_text parameter is an optional input that provides the text content of the reference_audio. This can help improve the accuracy of the voice cloning process, especially if the reference audio is not entirely clear.
x_vector_only
The x_vector_only parameter is a boolean flag that, when set to true, limits the processing to extracting the x-vector from the reference audio. This is useful for scenarios where only the voice characteristics are needed without generating new speech.
voice
The voice parameter allows you to specify a pre-existing voice model to use as a base for cloning. This can be useful if you have a specific voice model that you want to adapt or modify.
unload_models
The unload_models parameter is a boolean flag that, when set to true, unloads the models from memory after processing. This can help manage memory usage, especially when working with large models or limited resources.
seed
The seed parameter is used to set the random seed for the generation process, ensuring reproducibility of results. A value of -1 indicates that no specific seed is set, allowing for variability in the output.
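The -1 convention can be implemented by drawing a fresh seed whenever none is supplied. The sketch below uses Python's stdlib random module for illustration; the node itself presumably seeds its own framework's RNG, so these function names are hypothetical:

```python
import random

def resolve_seed(seed: int) -> int:
    """Return a concrete seed: -1 means 'pick one at random'."""
    if seed == -1:
        return random.randrange(2**32)
    return seed

def generate_with_seed(seed: int, n: int = 4):
    """Draw n values from an RNG seeded via resolve_seed."""
    rng = random.Random(resolve_seed(seed))
    return [rng.random() for _ in range(n)]
```

With a fixed seed the same inputs reproduce the same output; with -1 each run may vary.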
max_new_tokens
The max_new_tokens parameter defines the maximum number of new tokens (subword units, not whole words) that can be generated in the output. This caps the length of the synthesized speech and can be adjusted based on the desired output length.
do_sample
The do_sample parameter is a boolean flag that, when set to true, enables sampling during generation, allowing for more varied and creative outputs. When false, the output is more deterministic.
top_p
The top_p parameter is used in nucleus sampling to control the diversity of the output. It specifies the cumulative probability threshold for token selection, with lower values leading to more conservative outputs.
top_k
The top_k parameter limits the number of tokens considered at each step during generation. A lower value results in more focused outputs, while a higher value allows for more diversity.
temperature
The temperature parameter controls the randomness of the output. Higher values result in more varied and creative outputs, while lower values produce more deterministic results.
repetition_penalty
The repetition_penalty parameter discourages the model from repeating the same phrases or words, enhancing the naturalness of the output. A value of 1.0 means no penalty, while higher values increase the penalty.
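The sampling knobs above compose in a standard order: penalize repeats, scale by temperature, filter with top-k, then keep the top-p nucleus. The following self-contained sketch mirrors common Transformers-style generation code over a toy logit vector; it illustrates the mechanics, not the node's exact internals:

```python
import math

def filter_logits(logits, generated, temperature=1.0,
                  repetition_penalty=1.0, top_k=0, top_p=1.0):
    """Return per-token probabilities after applying the sampling knobs."""
    logits = list(logits)
    # Repetition penalty: push down tokens that were already generated.
    for tok in set(generated):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty
    # Temperature: values < 1 sharpen the distribution, > 1 flatten it.
    logits = [l / temperature for l in logits]
    # Top-k: keep only the k highest logits.
    if top_k > 0:
        kth = sorted(logits, reverse=True)[min(top_k, len(logits)) - 1]
        logits = [l if l >= kth else float("-inf") for l in logits]
    # Softmax to probabilities.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p (nucleus): keep the smallest set whose mass reaches top_p.
    if top_p < 1.0:
        order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
        kept, cum = set(), 0.0
        for i in order:
            kept.add(i)
            cum += probs[i]
            if cum >= top_p:
                break
        probs = [p if i in kept else 0.0 for i, p in enumerate(probs)]
        total = sum(probs)
        probs = [p / total for p in probs]
    return probs
```

For example, a low top_p concentrates all probability on the strongest candidates, while repetition_penalty > 1.0 demotes tokens that have already appeared, which is why raising it tends to reduce stuttering or looping in the synthesized speech.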
attention
The attention parameter specifies the attention mechanism used during processing. The "auto" setting allows the system to choose the best option based on the available resources and model configuration.
Voice Clone (QwenTTS) Output Parameters:
audio
The audio output parameter provides the synthesized speech audio that mimics the voice characteristics of the reference_audio while delivering the content of the target_text. This output is crucial for applications requiring personalized and consistent voice synthesis, as it allows you to generate new speech content that sounds as if it were spoken by the original speaker.
Voice Clone (QwenTTS) Usage Tips:
- Ensure that the reference_audio is of high quality and free from background noise to achieve the best voice cloning results.
- Experiment with the temperature and top_p parameters to find the right balance between creativity and naturalness in the synthesized speech.
- Use the language parameter to match the language of the target_text for accurate phonetic rendering.
- Consider using a GPU by setting the device parameter to "gpu" for faster processing times, especially with larger models.
Voice Clone (QwenTTS) Common Errors and Solutions:
"Invalid reference audio format"
- Explanation: The provided reference_audio is not in a supported format or is corrupted.
- Solution: Ensure that the audio file is in a compatible format such as WAV or MP3 and is not corrupted. Re-record the audio if necessary.
"Model size not supported"
- Explanation: The specified model_size is not available or supported by the system.
- Solution: Check the available model sizes and select one that is supported. Common options might include "small", "medium", or "large".
"Language not recognized"
- Explanation: The language parameter is set to a language that is not supported by the model.
- Solution: Verify the list of supported languages and ensure that the language parameter matches one of them.
"Out of memory error"
- Explanation: The system ran out of memory while processing the request, possibly due to large model size or insufficient resources.
- Solution: Try reducing the model_size, or ensure that the device is set to "gpu" if available. Additionally, consider closing other applications to free up memory.
