ComfyUI Node: FL CosyVoice3 Instruct2

Class Name

FL_CosyVoice3_Instruct2

Category
🔊FL CosyVoice3/Synthesis
Author
filliptm (Account age: 2386days)
Extension
ComfyUI_FL-CosyVoice3
Latest Updated
2026-03-21
Github Stars
0.11K

How to Install ComfyUI_FL-CosyVoice3

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Click the Custom Nodes Manager button.
  3. Enter ComfyUI_FL-CosyVoice3 in the search bar and install the extension.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

FL CosyVoice3 Instruct2 Description

FL_CosyVoice3_Instruct2 enables zero-shot voice cloning with customizable style and tone.

FL CosyVoice3 Instruct2:

FL_CosyVoice3_Instruct2 performs zero-shot voice cloning: it synthesizes speech in the voice of a short reference recording, while an instruction text controls the speaking style, such as emotion, tone, and pace. The node works with CosyVoice2 and CosyVoice3 models. It is aimed at AI artists and developers who want expressive, personalized voice output without deep expertise in voice synthesis.

FL CosyVoice3 Instruct2 Input Parameters:

model

A CosyVoice model, loaded with the Model Loader. The model must be a CosyVoice2 or CosyVoice3 model, because the node relies on the inference_instruct2 function that these models provide.

text

This is the text that you want to synthesize in the cloned voice. It serves as the primary content for the voice synthesis process. The default value is "Hello, this is my cloned voice speaking." and it supports multiline input, allowing for more complex and lengthy speech synthesis.

instruct_text

This parameter allows you to provide specific instructions to control the speaking style, emotion, and tone of the synthesized voice. Examples include "Speak slowly and gently" or "Use an excited and energetic tone." The default value is "Speak in a warm and friendly tone." and it supports multiline input for detailed instructions.

reference_audio

The audio clip whose voice is cloned. A clip of 3 to 10 seconds is recommended, and the maximum duration is 30 seconds. The clip gives the model the vocal characteristics it needs to imitate the voice.
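The duration limits above can be checked before a clip is passed to the node. The following is a minimal sketch, not the node's own validation code: the function name and the returned labels are hypothetical, and only the 3, 10, and 30 second figures come from this page.

```python
# Hypothetical helper; only the thresholds are taken from the node description.
RECOMMENDED_RANGE_S = (3.0, 10.0)  # recommended reference length in seconds
MAX_DURATION_S = 30.0              # documented maximum in seconds


def check_reference_duration(num_samples: int, sample_rate: int) -> str:
    """Classify a reference clip by its duration in seconds."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    duration = num_samples / sample_rate
    if duration > MAX_DURATION_S:
        return "too_long"
    low, high = RECOMMENDED_RANGE_S
    if low <= duration <= high:
        return "recommended"
    # Under the maximum, but outside the recommended window.
    return "outside_recommended"
```

For example, a 5-second clip at 16 kHz (80,000 samples) falls in the recommended window, while a 31-second clip is classified as too long.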

speed

This parameter controls the speed of the synthesized speech. It is a float value with a default of 1.0, a minimum of 0.5, and a maximum of 2.0. Adjusting this value allows you to speed up or slow down the speech, providing flexibility in how the final output is delivered.
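When the speed value is computed by a script rather than typed into the widget, it can help to keep it inside the documented range. The sketch below assumes clamping is the desired behavior; this page does not say whether the node itself clamps or rejects out-of-range values, and the helper name is hypothetical.

```python
# Range and default as documented for the speed input.
SPEED_MIN, SPEED_DEFAULT, SPEED_MAX = 0.5, 1.0, 2.0


def clamp_speed(speed: float = SPEED_DEFAULT) -> float:
    """Return speed limited to the documented [0.5, 2.0] range."""
    return max(SPEED_MIN, min(SPEED_MAX, float(speed)))
```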

FL CosyVoice3 Instruct2 Output Parameters:

all_speech

all_speech contains the synthesized audio: the input text spoken in the cloned voice of the reference audio, in the style described by instruct_text and at the requested speed.
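For scripted workflows, the inputs described above map onto one node entry in a ComfyUI API-format workflow. The sketch below builds such an entry as a Python dictionary. The class name and input names are taken from this page; the upstream node ids "1" and "2" and the output index 0 are placeholders, since the loader nodes that feed model and reference_audio are not described here.

```python
import json


def build_instruct2_node(model_link, audio_link, text, instruct_text, speed=1.0):
    """Build one API-format node entry for FL_CosyVoice3_Instruct2.

    model_link and audio_link are [node_id, output_index] pairs that point
    at upstream nodes; their values depend on the rest of the workflow.
    """
    return {
        "class_type": "FL_CosyVoice3_Instruct2",
        "inputs": {
            "model": model_link,
            "text": text,
            "instruct_text": instruct_text,
            "reference_audio": audio_link,
            "speed": speed,
        },
    }


# Placeholder links: node "1" stands for a model loader, node "2" for an audio loader.
node = build_instruct2_node(
    ["1", 0],
    ["2", 0],
    "Hello, this is my cloned voice speaking.",
    "Speak in a warm and friendly tone.",
)
print(json.dumps(node, indent=2))
```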

FL CosyVoice3 Instruct2 Usage Tips:

  • Ensure that the reference audio is clear and of good quality to achieve the best voice cloning results. A duration of 3 to 10 seconds is recommended for optimal performance.
  • Experiment with different instructive texts to explore various speaking styles and tones. This can help you achieve the desired emotional impact and engagement in your audio content.
  • Adjust the speed parameter to match the context of your project. For example, a slower speed might be suitable for a calm and professional tone, while a faster speed could enhance an energetic and lively delivery.

FL CosyVoice3 Instruct2 Common Errors and Solutions:

"inference_instruct2 is not available on this model."

  • Explanation: This error occurs when the loaded model does not support the inference_instruct2 function, which is necessary for the node's operation.
  • Solution: Ensure that you are using a compatible CosyVoice2 or CosyVoice3 model. Load the appropriate model through the Model Loader to resolve this issue.

"Reference audio duration exceeds the maximum limit."

  • Explanation: The reference audio provided exceeds the maximum allowed duration of 30 seconds.
  • Solution: Trim the reference audio to 30 seconds or less to clear this error. For the best cloning results, aim for the recommended 3 to 10 seconds.
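Trimming amounts to keeping only the first few seconds of samples. The sketch below works on a plain sequence of mono samples and is independent of any audio library; the function name and the 10-second default are illustrative choices based on the recommended range, not part of the extension.

```python
def trim_samples(samples, sample_rate: int, max_seconds: float = 10.0):
    """Return the leading max_seconds of a mono sample sequence."""
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    max_len = int(sample_rate * max_seconds)
    # Slicing past the end is safe, so short clips are returned unchanged.
    return samples[:max_len]
```

A 40-second clip at 16 kHz is cut to 160,000 samples, while a clip already shorter than the limit passes through as is. Cutting at a pause in the speech, rather than mid-word, is likely to give a cleaner reference.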
