
ComfyUI Node: FL Qwen3 TTS Voice Clone

Class Name: FL_Qwen3TTS_VoiceClone
Category: FL/Qwen3TTS
Author: filliptm (Account age: 2372 days)
Extension: ComfyUI-FL-Qwen3TTS
Last Updated: 2026-03-18
GitHub Stars: 0.12K

How to Install ComfyUI-FL-Qwen3TTS

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-FL-Qwen3TTS in the search bar and install it from the results.
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated list of nodes.

FL Qwen3 TTS Voice Clone Description

Facilitates voice cloning with Qwen3-TTS, generating realistic speech from reference audio.

FL Qwen3 TTS Voice Clone:

The FL_Qwen3TTS_VoiceClone node performs voice cloning with the Qwen3-TTS model. Given a reference audio sample and a text, it generates speech that reads the text in a voice mimicking the reference speaker. Sampling parameters (top_k, top_p, temperature, repetition_penalty), a length cap (max_new_tokens), and a seed let you tune the variability, length, and reproducibility of the result.

FL Qwen3 TTS Voice Clone Input Parameters:

model

The Qwen3-TTS model used for synthesis, supplied by a Model Loader node. Voice cloning requires the Qwen3-TTS-12Hz-1.7B-Base model; connecting a different model type raises an error (see Common Errors and Solutions). The model interprets the input text and the reference audio to generate the cloned voice output.

text

The text to be spoken in the cloned voice. It is a string that supports multiline input and defaults to "Hello, this is a test of voice cloning." The text is synthesized into speech using the voice characteristics derived from the reference audio.

ref_audio

An audio input that serves as the reference for cloning. It supplies the vocal characteristics that the model mimics in the generated speech. If no reference audio is provided, the node raises an error and voice cloning cannot proceed.

language

The language in which the text is synthesized. It is chosen from a predefined set of languages and defaults to "English". Set it to match the language of the input text.

x_vector_only_mode

A boolean, defaulting to False, that determines whether the model uses only the x-vector (a speaker embedding extracted from the reference audio) for voice cloning. When enabled, cloning relies on the speaker identity captured in that embedding alone, which can affect the naturalness and variability of the output.

top_k

An integer that sets how many of the highest-probability vocabulary tokens are kept as candidates at each sampling step. It ranges from 1 to 200, with a default of 50. Lower values restrict sampling to the most likely tokens and make the output more predictable; higher values allow more diverse speech.
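
To make the top_k description concrete, here is a minimal sketch of top-k filtering over a toy probability distribution. It illustrates the general technique only; the function name top_k_filter is hypothetical, and how the node or Qwen3-TTS applies the filter internally is not documented on this page.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest, and renormalize."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Indices of the k largest probabilities.
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    keep_set = set(keep)
    kept_mass = sum(probs[i] for i in keep)
    return [p / kept_mass if i in keep_set else 0.0 for i, p in enumerate(probs)]


# Toy distribution over four tokens; only the two most likely survive.
filtered = top_k_filter([0.5, 0.3, 0.15, 0.05], k=2)
```

With k=2, the two least likely tokens can no longer be sampled, and the remaining probability mass is rescaled to sum to 1.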

top_p

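A float used for nucleus sampling, ranging from 0.1 to 1.0 with a default of 1.0. It sets the cumulative probability threshold for token selection: only the smallest set of most likely tokens whose probabilities add up to top_p is considered. At the default of 1.0 this filter excludes nothing; lower values reduce the randomness and variability of the output.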
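
The cumulative threshold behind top_p can be sketched as follows. This is an illustration of nucleus sampling in general, under the assumption that the node follows the usual definition; top_p_filter is a hypothetical helper, not part of the extension.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # threshold reached; drop every less likely token
    keep_set = set(keep)
    kept_mass = sum(probs[i] for i in keep)
    return [p / kept_mass if i in keep_set else 0.0 for i, p in enumerate(probs)]


probs = [0.5, 0.3, 0.15, 0.05]
nucleus = top_p_filter(probs, top_p=0.7)    # keeps the first two tokens
unfiltered = top_p_filter(probs, top_p=1.0) # keeps every token
```

At top_p=0.7 the first two tokens already cover 0.8 of the probability mass, so the other two are dropped; at 1.0 all tokens remain candidates.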

temperature

A float ranging from 0.1 to 2.0, with a default of 0.9, that controls the randomness of sampling. Lower values concentrate probability on the most likely tokens and make the output more deterministic; higher values flatten the distribution and increase variability.
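
The effect of temperature can be seen in a small softmax sketch. The logits below are made-up numbers, and softmax_with_temperature is a hypothetical helper that shows the standard technique rather than the node's own code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
cool = softmax_with_temperature(logits, 0.1)  # sharply peaked on the first token
warm = softmax_with_temperature(logits, 2.0)  # much flatter distribution
```

At the low end of the range almost all probability goes to the top token, while at the high end the alternatives become far more likely to be sampled.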

repetition_penalty

A float ranging from 1.0 to 2.0, with a default of 1.05. It penalizes tokens that have already been generated, which discourages repeated sounds and phrases and encourages more varied, natural speech. A value of 1.0 applies no penalty.
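
One common way to apply such a penalty is the rule used by Hugging Face Transformers' repetition penalty processor: scores of previously generated tokens are divided by the penalty when positive and multiplied by it when negative. Whether this node uses exactly that rule is an assumption; the sketch below, with the hypothetical helper apply_repetition_penalty, only illustrates the idea.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Lower the scores of tokens that already appear in the generated sequence."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one both reduce it.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


# Tokens 0 and 1 were already generated; token 2 is left untouched.
adjusted = apply_repetition_penalty([2.1, -1.0, 0.5], generated_ids=[0, 1, 0], penalty=1.05)
```

With a penalty of 1.0 the scores are unchanged, which matches the lower bound of the parameter's range.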

max_new_tokens

An integer that sets the maximum number of new tokens to generate, ranging from 128 to 8192, with a default of 2048. It caps the length of the generated speech: generation stops once this many tokens have been produced, so raise it if long texts are cut off.

seed

An integer random seed used for reproducibility. It ranges from -1 to a large positive integer and defaults to -1, which means a random seed is generated for each run. Setting a specific seed makes results repeatable across runs when all other inputs and settings are unchanged.
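
The seed behaviour described above can be sketched as below. The helper resolve_seed and the upper bound max_seed are assumptions for illustration; the page does not state the actual maximum or how the node draws its random seed.

```python
import random


def resolve_seed(seed, max_seed=2**32 - 1):
    """Return the seed to use: a fresh random one for -1, else the given value."""
    # max_seed is a placeholder; the real upper bound is not given in the docs.
    if seed == -1:
        return random.randint(0, max_seed)
    return seed


fixed = resolve_seed(1234)  # an explicit seed is used as-is
fresh = resolve_seed(-1)    # -1 yields a new random seed on every call

# The same seed reproduces the same random draws.
first = random.Random(fixed).random()
second = random.Random(fixed).random()
```

Here first and second are identical because both generators start from the same seed, which is the property that makes a fixed seed useful for iterative testing.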

ref_text

An optional string that provides reference text to guide the voice cloning process, typically a transcript of what is spoken in ref_audio. It supports multiline input and defaults to an empty string.

voice_clone_prompt

An optional pre-computed voice clone prompt. Connecting one can streamline the voice cloning process, presumably because voice information prepared ahead of time is reused instead of being derived again on each run.
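
The inputs above can be summarized in the shape of a ComfyUI INPUT_TYPES declaration. This is a sketch built from the defaults and ranges on this page, not the extension's source code: the class name, the custom type names QWEN3TTS_MODEL and QWEN3TTS_PROMPT, the language list, the seed upper bound, and the split between required and optional inputs are all assumptions.

```python
# Placeholders: the page gives neither the full language list nor the seed maximum.
LANGUAGES = ["English"]
SEED_MAX = 2**32 - 1


class VoiceCloneInputsSketch:
    """Hypothetical stand-in that mirrors the documented inputs of the node."""

    CATEGORY = "FL/Qwen3TTS"
    RETURN_TYPES = ("AUDIO",)
    RETURN_NAMES = ("audio",)

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "model": ("QWEN3TTS_MODEL",),  # hypothetical custom type name
                "text": ("STRING", {
                    "multiline": True,
                    "default": "Hello, this is a test of voice cloning.",
                }),
                "ref_audio": ("AUDIO",),
                "language": (LANGUAGES, {"default": "English"}),
                "x_vector_only_mode": ("BOOLEAN", {"default": False}),
                "top_k": ("INT", {"default": 50, "min": 1, "max": 200}),
                "top_p": ("FLOAT", {"default": 1.0, "min": 0.1, "max": 1.0}),
                "temperature": ("FLOAT", {"default": 0.9, "min": 0.1, "max": 2.0}),
                "repetition_penalty": ("FLOAT", {"default": 1.05, "min": 1.0, "max": 2.0}),
                "max_new_tokens": ("INT", {"default": 2048, "min": 128, "max": 8192}),
                "seed": ("INT", {"default": -1, "min": -1, "max": SEED_MAX}),
            },
            "optional": {
                "ref_text": ("STRING", {"multiline": True, "default": ""}),
                "voice_clone_prompt": ("QWEN3TTS_PROMPT",),  # hypothetical type name
            },
        }


inputs = VoiceCloneInputsSketch.INPUT_TYPES()
```

Reading the declaration this way gives a quick overview of every default and range in one place.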

FL Qwen3 TTS Voice Clone Output Parameters:

audio

The generated speech: the input text spoken with the voice characteristics derived from the reference audio. Connect this output to any node that accepts audio, for example to preview or save the result.

FL Qwen3 TTS Voice Clone Usage Tips:

  • Ensure that the reference audio is clear and of high quality to achieve the best voice cloning results.
  • Experiment with the temperature, top_k, and top_p parameters to find the right balance between creativity and naturalness in the generated speech.
  • Set the seed parameter to a fixed value instead of -1 to reproduce specific results, which is useful for iterative testing and refinement.
  • If the cloned voice varies too much or too little between runs, try toggling x_vector_only_mode to see whether it improves the output.

FL Qwen3 TTS Voice Clone Common Errors and Solutions:

No model provided. Please connect a Model Loader node.

  • Explanation: This error occurs when the required Qwen3-TTS model is not connected to the node.
  • Solution: Ensure that a compatible model, specifically Qwen3-TTS-12Hz-1.7B-Base, is loaded and connected to the node.

Wrong model type for Voice Clone node!

  • Explanation: The model connected is not compatible with the voice cloning process.
  • Solution: Verify that the model type is Qwen3-TTS-12Hz-1.7B-Base and switch to this model if necessary.

Reference audio is required for voice cloning.

  • Explanation: The node requires reference audio to perform voice cloning, and none was provided.
  • Solution: Connect a valid audio input to the ref_audio parameter to enable the voice cloning process.

Voice cloning failed: <error_message>

  • Explanation: An unspecified error occurred during the voice cloning process.
  • Solution: Check the error message for details, ensure all inputs are correctly configured, and consult the logs for more information.
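
The order in which these errors can arise may be easier to follow as code. The sketch below reproduces the four messages from this section; the function names, the "model_type" dictionary key, and the synthesize callback are hypothetical and do not come from the extension.

```python
def check_inputs(model, ref_audio):
    """Raise the documented errors for missing or mismatched inputs."""
    if model is None:
        raise ValueError("No model provided. Please connect a Model Loader node.")
    # Hypothetical: assume the loaded model is a dict with a "model_type" entry.
    if model.get("model_type") != "base":
        raise ValueError("Wrong model type for Voice Clone node!")
    if ref_audio is None:
        raise ValueError("Reference audio is required for voice cloning.")


def run_voice_clone(model, ref_audio, synthesize):
    """Validate inputs, then wrap any synthesis failure in the generic error."""
    check_inputs(model, ref_audio)
    try:
        return synthesize(model, ref_audio)
    except Exception as exc:
        raise RuntimeError(f"Voice cloning failed: {exc}") from exc


def failing_synthesize(model, ref_audio):
    """Stand-in for a synthesis step that fails at run time."""
    raise MemoryError("out of memory")


try:
    run_voice_clone({"model_type": "base"}, object(), failing_synthesize)
except RuntimeError as exc:
    message = str(exc)
```

In this sketch the first three errors are input problems you can fix in the workflow, while the last one passes through whatever went wrong during synthesis, so the text after the colon is the part worth reading.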
