FL Qwen3 TTS Voice Clone:
The FL_Qwen3TTS_VoiceClone node generates speech in a voice cloned from a reference audio sample, using the Qwen3-TTS model. You supply the text to speak and a recording of the target speaker, and the node synthesizes that text with the speaker's vocal characteristics. This makes it useful for AI artists who want custom or personalized voices in their projects. The node also exposes sampling controls (temperature, top_k, top_p, repetition_penalty, seed, and output length) so you can tune the result.
FL Qwen3 TTS Voice Clone Input Parameters:
model
This parameter takes a loaded Qwen3-TTS model. Voice cloning requires the Qwen3-TTS-12Hz-1.7B-Base variant. The model interprets the input text and the reference audio and produces the cloned speech.
text
The text parameter is a string input that represents the content you want to be spoken in the cloned voice. It supports multiline input and defaults to "Hello, this is a test of voice cloning." This text will be synthesized into speech using the voice characteristics derived from the reference audio.
ref_audio
This parameter accepts an audio input that serves as the reference for cloning the voice. The reference audio is crucial as it provides the vocal characteristics that the model will mimic in the generated speech. Without this, the voice cloning process cannot proceed.
language
The language parameter sets the language of the text to be synthesized. It defaults to "English" and must be chosen from a predefined list of supported languages. Selecting the language that matches your text helps the model apply the correct pronunciation and prosody.
x_vector_only_mode
This boolean parameter, defaulting to False, determines whether the model uses only the x-vector (a speaker embedding extracted from the reference audio) for voice cloning. When enabled, the model conditions on the speaker's identity alone, which may change how natural and varied the output sounds compared with the default mode.
top_k
An integer parameter that limits sampling to the top_k most likely tokens at each generation step. It ranges from 1 to 200, with a default of 50. Lower values make the generated speech more conservative and consistent; higher values allow more variety.
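The node's internal sampling code is not shown in this document. As an illustration only, here is a minimal, dependency-free Python sketch of standard top-k filtering, assuming the node applies it to the model's next-token logits in the usual way. The function name top_k_filter is hypothetical.

```python
import math

def top_k_filter(logits, k):
    """Keep the k highest logits and mask the rest with -inf (zero probability).

    Hypothetical helper illustrating standard top-k filtering.
    """
    if k <= 0:
        raise ValueError("k must be >= 1")
    if k >= len(logits):
        return list(logits)
    # The k-th largest value is the cutoff; ties at the cutoff are all kept.
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [x if x >= cutoff else -math.inf for x in logits]

logits = [2.0, 0.5, 1.5, -1.0, 0.0]
print(top_k_filter(logits, 2))  # [2.0, -inf, 1.5, -inf, -inf]
```

With k=2 only the two strongest candidates survive, so the sampler can never pick the other three tokens.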
top_p
This float parameter, ranging from 0.1 to 1.0 with a default of 1.0, controls nucleus sampling: the model samples only from the smallest set of tokens whose cumulative probability reaches top_p. Lower values restrict sampling to the most likely tokens and reduce randomness; the default of 1.0 effectively disables this filter.
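To make the cumulative-threshold idea concrete, here is a minimal Python sketch of standard nucleus (top-p) filtering over a probability list. It assumes the node follows the usual formulation; the function name top_p_filter is hypothetical.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches p, zero out the rest, and renormalize.

    Hypothetical helper illustrating standard nucleus sampling.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

probs = [0.5, 0.25, 0.125, 0.125]
print(top_p_filter(probs, 0.7))  # only the first two tokens keep nonzero probability
```

With p=1.0 every token is kept, which matches the description of the default as effectively disabling the filter.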
temperature
The temperature parameter, a float ranging from 0.1 to 2.0 with a default of 0.9, affects the randomness of the sampling process. Lower values make the output more deterministic, while higher values increase variability and creativity.
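Temperature is conventionally applied by dividing the logits before the softmax. The Python sketch below assumes the node uses that standard scheme; softmax_with_temperature is a hypothetical name. It shows that a lower temperature concentrates probability on the top token, while a higher one spreads it out.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() to avoid overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
for t in (0.5, 0.9, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```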
repetition_penalty
A float parameter that ranges from 1.0 to 2.0, with a default of 1.05. It lowers the probability of tokens that have already been generated, discouraging repetitive or looping output; a value of 1.0 applies no penalty.
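A widely used formulation of the repetition penalty (the one in Hugging Face transformers' RepetitionPenaltyLogitsProcessor) divides positive logits and multiplies negative logits of previously generated tokens by the penalty. Whether this node uses exactly that rule is an assumption; the sketch below, with the hypothetical helper apply_repetition_penalty, only illustrates the idea.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Make already-generated tokens less likely.

    Positive logits are divided by the penalty and negative ones multiplied,
    so with penalty > 1 the logit always moves down. Tokens that have not
    been generated yet are left unchanged.
    """
    out = list(logits)
    for token_id in set(generated_ids):
        score = out[token_id]
        out[token_id] = score * penalty if score < 0 else score / penalty
    return out

logits = [2.1, -0.5, 1.0]
print(apply_repetition_penalty(logits, [0, 1], 1.05))
```

With a penalty of 1.0 the function returns the logits unchanged, consistent with 1.0 meaning no penalty.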
max_new_tokens
This integer parameter sets the maximum number of new tokens to generate, from 128 to 8192, with a default of 2048. It is an upper limit on the length of the generated speech rather than a target: generation normally stops earlier once the model has spoken the full text, but a value that is too low can cut off long passages.
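If you drive the node from a script or want to sanity-check settings before queuing a workflow, the ranges and defaults listed above for the sampling parameters can be collected in one place. This is a hypothetical helper (PARAM_RANGES and resolve_params are not part of the node); the numbers come from this document.

```python
# Ranges and defaults as documented for this node; the structure itself is illustrative.
PARAM_RANGES = {
    "top_k": (1, 200, 50),
    "top_p": (0.1, 1.0, 1.0),
    "temperature": (0.1, 2.0, 0.9),
    "repetition_penalty": (1.0, 2.0, 1.05),
    "max_new_tokens": (128, 8192, 2048),
}

def resolve_params(overrides=None):
    """Fill in defaults and reject values outside the documented ranges."""
    params = {name: default for name, (_, _, default) in PARAM_RANGES.items()}
    params.update(overrides or {})
    for name, value in params.items():
        if name not in PARAM_RANGES:
            raise KeyError(f"unknown parameter: {name}")
        low, high, _ = PARAM_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return params

print(resolve_params({"temperature": 0.7}))
```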
seed
The seed parameter is an integer that sets the random seed for reproducibility. It accepts -1 or a non-negative integer up to a large maximum, with a default of -1, which means a random seed is generated on each run. Setting a specific seed, with all other inputs unchanged, produces consistent results across runs.
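The -1 convention can be sketched as follows. This uses only Python's random module for illustration; the real node presumably seeds its own framework's generators, and the 0 to 2**32 range chosen for random seeds is an assumption, not a documented limit. resolve_seed and sample_indices are hypothetical names.

```python
import random

def resolve_seed(seed):
    """Map the node's convention to a concrete seed: -1 means 'pick one at random'."""
    if seed == -1:
        # Assumed range for generated seeds; the document only says "a large positive integer".
        return random.SystemRandom().randrange(0, 2**32)
    if seed < -1:
        raise ValueError("seed must be -1 or a non-negative integer")
    return seed

def sample_indices(seed, n=5):
    """Stand-in for a sampling run: same seed, same draws."""
    rng = random.Random(resolve_seed(seed))
    return [rng.randrange(100) for _ in range(n)]

print(sample_indices(42) == sample_indices(42))  # True: a fixed seed repeats exactly
```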
ref_text
An optional string parameter for the reference text, typically the transcript of what is spoken in ref_audio, which gives the model additional context for cloning the voice. It supports multiline input and defaults to an empty string.
voice_clone_prompt
This optional parameter accepts a pre-computed voice clone prompt. Supplying one lets the node reuse voice information that was already extracted from a reference, which can save work when you generate many clips in the same voice.
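The benefit of a pre-computed prompt is the usual compute-once, reuse-many pattern. The sketch below is purely illustrative: compute_voice_prompt is a hypothetical stand-in for whatever expensive extraction the real model performs, and the dictionary it returns is invented for the example.

```python
import hashlib

def compute_voice_prompt(audio_bytes, ref_text=""):
    """Hypothetical stand-in for the expensive step that turns reference
    audio into a reusable voice clone prompt."""
    compute_voice_prompt.calls += 1
    return {"speaker_id": hashlib.sha256(audio_bytes).hexdigest()[:12], "ref_text": ref_text}

compute_voice_prompt.calls = 0

_prompt_cache = {}

def get_voice_prompt(audio_bytes, ref_text=""):
    """Compute the prompt once per (audio, ref_text) pair and reuse it afterwards."""
    key = (hashlib.sha256(audio_bytes).hexdigest(), ref_text)
    if key not in _prompt_cache:
        _prompt_cache[key] = compute_voice_prompt(audio_bytes, ref_text)
    return _prompt_cache[key]

ref = b"fake pcm bytes"
for line in ["First sentence.", "Second sentence.", "Third sentence."]:
    prompt = get_voice_prompt(ref)
    # ...pass `prompt` together with `line` to the synthesis step...
print(compute_voice_prompt.calls)  # 1
```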
FL Qwen3 TTS Voice Clone Output Parameters:
audio
The audio output is the generated speech: the input text spoken with the voice characteristics derived from the reference audio. Connect it to downstream nodes, for example to preview or save the result.
FL Qwen3 TTS Voice Clone Usage Tips:
- Ensure that the reference audio is clear and of high quality to achieve the best voice cloning results.
- Experiment with the temperature, top_k, and top_p parameters to find the right balance between variety and naturalness in the generated speech.
- Use the seed parameter to reproduce specific results, which is useful for iterative testing and refinement.
- If the voice variability is not what you want, try toggling x_vector_only_mode and compare the results.
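One way to combine these tips is a small parameter sweep with the seed held fixed, so differences between clips come from the sampling settings rather than from chance. The sketch below only builds the list of settings; feeding each one into the node (for example through a scripted workflow) is left out, and the specific values are arbitrary examples within the documented ranges.

```python
import itertools

# Fixed seed so that only the sampling settings change between runs.
SEED = 1234
temperatures = [0.6, 0.9, 1.2]
top_ks = [20, 50]
top_ps = [0.8, 1.0]

configs = [
    {"seed": SEED, "temperature": t, "top_k": k, "top_p": p}
    for t, k, p in itertools.product(temperatures, top_ks, top_ps)
]
print(len(configs))  # 12
```

Listening to the twelve resulting clips side by side makes it easier to pick settings than changing one value at a time with a random seed.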
FL Qwen3 TTS Voice Clone Common Errors and Solutions:
No model provided. Please connect a Model Loader node.
- Explanation: This error occurs when the required Qwen3-TTS model is not connected to the node.
- Solution: Ensure that a compatible model, specifically Qwen3-TTS-12Hz-1.7B-Base, is loaded and connected to the node.
Wrong model type for Voice Clone node!
- Explanation: The model connected is not compatible with the voice cloning process.
- Solution: Verify that the connected model is Qwen3-TTS-12Hz-1.7B-Base and switch to this model if necessary.
Reference audio is required for voice cloning.
- Explanation: The node requires reference audio to perform voice cloning, and none was provided.
- Solution: Provide a valid reference audio file to enable the voice cloning process.
Voice cloning failed: <error_message>
- Explanation: An unspecified error occurred during the voice cloning process.
- Solution: Check the error message for details, ensure all inputs are correctly configured, and consult the logs for more information.
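The checks behind these messages can be pictured as a set of pre-flight validations followed by a wrapper around the actual synthesis. This is a hypothetical sketch, not the node's source: it represents the model by its name string and the synthesis step by a callable, whereas the real node receives a model object and audio tensors.

```python
class VoiceCloneError(RuntimeError):
    """Hypothetical error type for this sketch."""

EXPECTED_MODEL = "Qwen3-TTS-12Hz-1.7B-Base"

def check_inputs(model_name, ref_audio):
    """Pre-flight checks mirroring the error messages listed above."""
    if model_name is None:
        raise VoiceCloneError("No model provided. Please connect a Model Loader node.")
    if model_name != EXPECTED_MODEL:
        raise VoiceCloneError("Wrong model type for Voice Clone node!")
    if ref_audio is None:
        raise VoiceCloneError("Reference audio is required for voice cloning.")

def run_clone(model_name, ref_audio, synthesize):
    check_inputs(model_name, ref_audio)  # input problems are reported as-is
    try:
        return synthesize()
    except Exception as exc:
        # Any unexpected failure is wrapped in the generic message.
        raise VoiceCloneError(f"Voice cloning failed: {exc}") from exc

try:
    run_clone("some-other-model", b"...", lambda: b"audio")
except VoiceCloneError as err:
    print(err)  # Wrong model type for Voice Clone node!
```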
