ComfyUI Node: FL CosyVoice3 Speaker Clone

Class Name: FL_CosyVoice3_SpeakerClone
Category: 🔊FL CosyVoice3/Synthesis
Author: filliptm (Account age: 2386 days)
Extension: ComfyUI_FL-CosyVoice3
Last Updated: 2026-03-21
GitHub Stars: 0.11K

How to Install ComfyUI_FL-CosyVoice3

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI_FL-CosyVoice3 in the search bar, then install it from the results.
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated list of nodes.

FL CosyVoice3 Speaker Clone Description

Synthesizes speech using a preset for zero-shot voice cloning without reference audio.

FL CosyVoice3 Speaker Clone:

The FL_CosyVoice3_SpeakerClone node synthesizes speech from a saved speaker preset, so a voice can be cloned without supplying the original reference audio again. The preset file stores the speaker's voice characteristics, and the node uses it to perform zero-shot voice cloning: it generates speech in that voice from the preset alone, with no additional audio samples. This suits applications that need a specific, consistent voice, such as virtual assistants or audiobooks.

FL CosyVoice3 Speaker Clone Input Parameters:

model

The model parameter is a dictionary containing the components the node needs, including the text-to-speech model itself. The model must be loaded before this node runs and must be compatible with FL CosyVoice3.

text

The text parameter is a string containing the text to convert into speech in the specified speaker's voice. Provide clear, well-structured text, because the input directly affects the quality and clarity of the synthesized speech.

speaker_preset

The speaker_preset parameter is a string that specifies the name of the speaker preset file (without the .pt extension) to be used for voice cloning. This file contains the voice characteristics of the desired speaker and is essential for the node to generate speech that closely resembles the target voice. The preset must be pre-saved using the FL CosyVoice3 Save Speaker functionality.
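The naming rule above can be sketched as follows. This is a minimal illustration, not the extension's actual code: the `resolve_preset_path` helper and the `presets_dir` location are hypothetical, and only the `.pt` suffix and the "Speaker preset file not found" message come from this page.

```python
from pathlib import Path


def resolve_preset_path(speaker_preset: str, presets_dir: Path) -> Path:
    """Map a preset name (given without '.pt') to a file inside presets_dir.

    Hypothetical helper: the real node may store presets elsewhere.
    """
    name = speaker_preset.strip()
    # Tolerate a name that was typed with the extension by mistake.
    if name.endswith(".pt"):
        name = name[: -len(".pt")]
    path = presets_dir / f"{name}.pt"
    if not path.is_file():
        # Mirrors the error text listed under Common Errors on this page.
        raise FileNotFoundError(f"Speaker preset file not found: {path}")
    return path
```

Passing `"narrator"` or `"narrator.pt"` resolves to the same file in this sketch, which avoids the most common cause of the "not found" error.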

speed

The speed parameter is a float that determines the speed at which the synthesized speech is delivered. A value of 1.0 represents the normal speed, while values greater than 1.0 will increase the speed, and values less than 1.0 will decrease it. This parameter allows for customization of the speech tempo to suit different applications or preferences.
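As a rough guide to what a given speed value means for clip length, the sketch below assumes that duration scales inversely with speed. That linear relationship is an assumption made for illustration, not documented behavior of the node, and `estimated_duration` is a hypothetical helper.

```python
def estimated_duration(base_seconds: float, speed: float) -> float:
    """Estimate clip length at a given speed, assuming linear time scaling."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return base_seconds / speed


# A clip that lasts 6 seconds at speed 1.0, under the linear assumption.
for speed in (0.5, 1.0, 2.0):
    print(speed, estimated_duration(6.0, speed))
```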

seed

The seed parameter is an integer used to initialize the random number generators for reproducibility. By setting a specific seed value, you can ensure that the speech synthesis process produces the same output each time it is run with the same inputs. A value of -1 indicates that no specific seed is set, allowing for variability in the output.
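The -1 convention can be illustrated with a small sketch. It seeds only Python's standard `random` module. The node presumably seeds the generators of its own framework, which is not shown here, and `apply_seed` is a hypothetical name.

```python
import random


def apply_seed(seed: int) -> bool:
    """Seed Python's RNG unless seed is -1; return True if a seed was applied."""
    if seed == -1:
        # No fixed seed: leave the generator state alone so runs can differ.
        return False
    random.seed(seed)
    return True


# The same seed yields the same draw, which is what makes runs repeatable.
apply_seed(1234)
first = random.random()
apply_seed(1234)
second = random.random()
print(first == second)
```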

text_frontend

The text_frontend parameter is a boolean that indicates whether to use the text frontend processing. When set to True, the node will apply additional text processing to enhance the quality of the synthesized speech. This can be particularly useful for handling complex text inputs or ensuring better pronunciation and intonation.
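For scripted use, the inputs above map onto a node entry in a ComfyUI API-format workflow. The sketch below is an illustration under stated assumptions: the class name and input names come from this page, but the node IDs, the upstream model loader referenced as node "1", and all values are hypothetical placeholders.

```python
import json


def speaker_clone_node(model_link, text, speaker_preset, speed, seed, text_frontend):
    """Build one API-format node entry for FL_CosyVoice3_SpeakerClone."""
    return {
        "class_type": "FL_CosyVoice3_SpeakerClone",
        "inputs": {
            "model": model_link,  # link form: [source_node_id, output_index]
            "text": text,
            "speaker_preset": speaker_preset,  # preset name without ".pt"
            "speed": speed,
            "seed": seed,
            "text_frontend": text_frontend,
        },
    }


workflow_fragment = {
    "2": speaker_clone_node(
        model_link=["1", 0],  # hypothetical: node "1" is a model loader
        text="Hello from a saved voice preset.",
        speaker_preset="narrator",  # hypothetical preset name
        speed=1.0,
        seed=-1,
        text_frontend=True,
    )
}
print(json.dumps(workflow_fragment, indent=2))
```

Building the entry through a function keeps the six input names in one place, which helps when generating many variations of the same node.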

FL CosyVoice3 Speaker Clone Output Parameters:

audio

The audio output parameter is a dictionary containing the synthesized speech waveform and its sample rate. This output represents the final audio result of the text-to-speech process, encapsulating the voice characteristics of the specified speaker preset. The waveform can be used directly in applications requiring audio playback, and the sample rate ensures compatibility with various audio systems.
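A short sketch of how such a dictionary might be consumed downstream. The key names `waveform` and `sample_rate` and the `[batch, channels, samples]` layout follow the usual ComfyUI AUDIO convention and are assumed here rather than confirmed for this node. The 24000 Hz rate and the stand-in waveform object are purely illustrative.

```python
from types import SimpleNamespace


def audio_duration_seconds(audio: dict) -> float:
    """Duration from the last waveform dimension (samples) and the sample rate."""
    num_samples = audio["waveform"].shape[-1]
    return num_samples / audio["sample_rate"]


# Stand-in for a tensor shaped [batch, channels, samples]; the real output
# would carry an actual tensor instead of this placeholder object.
fake_audio = {"waveform": SimpleNamespace(shape=(1, 1, 48000)), "sample_rate": 24000}
print(audio_duration_seconds(fake_audio))  # 2.0
```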

FL CosyVoice3 Speaker Clone Usage Tips:

  • Ensure that the speaker preset file is correctly saved and accessible in the specified directory to avoid errors during the synthesis process.
  • Experiment with different speed values to find the optimal speech tempo for your specific application, keeping in mind that extreme values may affect the naturalness of the speech.
  • Use the seed parameter to achieve consistent results across multiple runs, especially when testing or comparing different configurations.

FL CosyVoice3 Speaker Clone Common Errors and Solutions:

Speaker preset file not found: <file_path>

  • Explanation: This error occurs when the specified speaker preset file cannot be located in the expected directory.
  • Solution: Verify that the preset file exists in the correct directory and that the filename is correctly specified without the .pt extension. Ensure that the FL CosyVoice3 Save Speaker functionality has been used to create the preset.

No audio was generated. Check model and preset.

  • Explanation: This error indicates that the node was unable to produce any audio output, possibly due to issues with the model or the speaker preset.
  • Solution: Double-check that the model is correctly loaded and compatible with the FL CosyVoice3 framework. Ensure that the speaker preset is valid and properly injected into the model's frontend.

Error in speaker clone: <error_message>

  • Explanation: A general error occurred during the speaker cloning process, which could be due to various reasons such as incorrect input parameters or model issues.
  • Solution: Review the error message for specific details and ensure that all input parameters are correctly set. Check the model and preset configurations for compatibility and correctness.

Copyright 2025 RunComfy. All Rights Reserved.