RunComfy

ComfyUI Node: Qwen3-TTS Voice Clone

Class Name: Qwen3VoiceClone
Category: Qwen3-TTS
Author: wanaigc (Account age: 0 days)
Extension: ComfyUI-Qwen3-TTS
Latest Updated: 2026-03-21
Github Stars: 0.09K

Table of Contents

Description
Qwen3VoiceClone:
Qwen3VoiceClone Input Parameters:
Qwen3VoiceClone Output Parameters:
Qwen3VoiceClone Usage Tips:
Qwen3VoiceClone Common Errors and Solutions:
Related Nodes

How to Install ComfyUI-Qwen3-TTS

Install this extension via the ComfyUI Manager by searching for ComfyUI-Qwen3-TTS:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-Qwen3-TTS in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Qwen3-TTS Voice Clone Description

Qwen3VoiceClone enables realistic voice cloning using Qwen3-TTS by analyzing audio and text.

Qwen3-TTS Voice Clone:

Qwen3VoiceClone is a node for creating voice clones with the Qwen3-TTS framework. It replicates a specific voice either by analyzing reference audio together with its matching text, or by using a prompt. The goal is to generate synthetic speech that closely mimics the characteristics of a target voice, which is useful for voice synthesis, entertainment, and personalized audio content.

Qwen3-TTS Voice Clone Input Parameters:

prompt

The prompt parameter is used to provide a textual input that guides the voice cloning process. It serves as a script or dialogue that the cloned voice will articulate. This parameter is crucial when you want to generate a voice clone based solely on text without reference audio. The prompt should be clear and concise to ensure accurate voice synthesis. There are no specific minimum or maximum values, but the content should be relevant to the intended voice output.

ref_audio

The ref_audio parameter is an audio file that serves as a reference for the voice cloning process. It captures the unique characteristics of the target voice, such as tone, pitch, and style. This parameter is essential when you want to create a voice clone that closely resembles a specific voice. The quality and clarity of the reference audio significantly impact the accuracy of the voice clone. There are no explicit constraints on the audio file format, but it should be compatible with the node's processing capabilities.

ref_text

The ref_text parameter accompanies the ref_audio and provides the textual content of the reference audio. It helps the node align the audio with the corresponding text, ensuring that the voice clone accurately reflects the intended speech. This parameter is necessary when using reference audio to guide the cloning process. The text should match the spoken content in the reference audio for optimal results.

Qwen3-TTS Voice Clone Output Parameters:

cloned_voice

The cloned_voice output is the synthesized audio produced by the voice cloning process. It mimics the characteristics of the target voice, reflecting the nuances and style of the reference voice or prompt.

Qwen3-TTS Voice Clone Usage Tips:

Ensure that the reference audio is of high quality and free from background noise to achieve the best voice cloning results.
When using a prompt, make sure the text is clear and well-structured to facilitate accurate voice synthesis.
Experiment with different combinations of reference audio and text to fine-tune the voice clone to your specific needs.

Qwen3-TTS Voice Clone Common Errors and Solutions:

Model Type Error: You are trying to use 'Voice Clone' with an incompatible model. Please load a 'Base' model (e.g. Qwen3-TTS-12Hz-1.7B-Base).

Explanation: This error occurs when the loaded model does not support the voice cloning feature.
Solution: Ensure that you are using a compatible 'Base' model that supports voice cloning, such as Qwen3-TTS-12Hz-1.7B-Base.
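
As a rough illustration of the check behind this error, the sketch below rejects any model whose name does not end in "-Base". The function name require_base_model and the name-suffix heuristic are assumptions made for this example; the extension may identify model types differently.

```python
def require_base_model(model_name: str) -> None:
    """Raise ValueError unless the model name looks like a 'Base' model.

    Assumption: a 'Base' model is recognized by a '-Base' suffix in its
    name, as in the example name from the error message. This is a
    hypothetical helper, not the extension's own code.
    """
    if not model_name.strip().lower().endswith("-base"):
        raise ValueError(
            "Model Type Error: You are trying to use 'Voice Clone' with an "
            "incompatible model. Please load a 'Base' model "
            "(e.g. Qwen3-TTS-12Hz-1.7B-Base)."
        )


# The example model name from the error message passes the check.
require_base_model("Qwen3-TTS-12Hz-1.7B-Base")
```

Running a check like this before synthesis surfaces the problem at load time rather than partway through a workflow.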

For Voice Clone, you must provide either 'prompt' OR ('ref_audio' AND 'ref_text').

Explanation: This error indicates that the necessary input parameters for voice cloning are not provided.
Solution: Provide either a prompt or both ref_audio and ref_text to proceed with the voice cloning process.
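
The input rule in this error message can be sketched as a small validation function. This is a minimal illustration, assuming that a missing input arrives as None and that blank text counts as missing; the function name validate_voice_clone_inputs and the returned mode labels are hypothetical and not part of the extension.

```python
from typing import Optional


def validate_voice_clone_inputs(
    prompt: Optional[str] = None,
    ref_audio: Optional[object] = None,
    ref_text: Optional[str] = None,
) -> str:
    """Return which input mode applies, or raise ValueError if none does.

    Assumptions (not taken from the extension): missing inputs are None,
    and text that is empty or only whitespace is treated as missing.
    """
    has_prompt = prompt is not None and prompt.strip() != ""
    has_ref_text = ref_text is not None and ref_text.strip() != ""

    if has_prompt:
        return "prompt"
    if ref_audio is not None and has_ref_text:
        return "reference"
    raise ValueError(
        "For Voice Clone, you must provide either 'prompt' OR "
        "('ref_audio' AND 'ref_text')."
    )


# Either mode on its own satisfies the rule.
print(validate_voice_clone_inputs(prompt="Hello there."))
print(validate_voice_clone_inputs(ref_audio=b"\x00\x01", ref_text="Hello there."))
```

Supplying ref_audio without ref_text (or the reverse) fails the check, which matches the "AND" in the error message.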

Qwen3-TTS Voice Clone Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI-Qwen3-TTS
