RunComfy


ComfyUI Node: FL CosyVoice3 Instruct2

Class Name: FL_CosyVoice3_Instruct2

Category: 🔊FL CosyVoice3/Synthesis

Author: filliptm (Account age: 2386 days)

Extension: ComfyUI_FL-CosyVoice3

Last Updated: 2026-03-21

GitHub Stars: 0.11K


Table of Content

Description
FL_CosyVoice3_Instruct2:
FL_CosyVoice3_Instruct2 Input Parameters:
FL_CosyVoice3_Instruct2 Output Parameters:
FL_CosyVoice3_Instruct2 Usage Tips:
FL_CosyVoice3_Instruct2 Common Errors and Solutions:
Related Nodes

How to Install ComfyUI_FL-CosyVoice3

Install this extension through the ComfyUI Manager by searching for ComfyUI_FL-CosyVoice3:

1. Click the Manager button in the main menu
2. Select Custom Nodes Manager button
3. Enter ComfyUI_FL-CosyVoice3 in the search bar

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


FL CosyVoice3 Instruct2 Description

FL_CosyVoice3_Instruct2 enables zero-shot voice cloning with customizable style and tone.

FL CosyVoice3 Instruct2:

FL_CosyVoice3_Instruct2 performs zero-shot voice cloning: it synthesizes speech in the voice of a short reference recording, while an instruction text controls how that speech is delivered. The node relies on the inference_instruct2 function of a CosyVoice2 or CosyVoice3 model, so you can describe the desired emotion, tone, and pace in plain language (for example, "Speak slowly and gently") instead of training a voice model or tuning low-level synthesis settings. This makes it useful for AI artists and developers who want expressive, personalized voice output without deep expertise in speech synthesis.

FL CosyVoice3 Instruct2 Input Parameters:

model

The CosyVoice model that processes the inputs and generates the synthesized speech. Load it with the Model Loader. It must be a CosyVoice2 or CosyVoice3 model, because the node depends on the inference_instruct2 function those versions provide.

text

The text to speak in the cloned voice; this is the content of the synthesized speech. Multiline input is supported, so longer passages can be entered. Default: "Hello, this is my cloned voice speaking."

instruct_text

Instructions that control the speaking style, emotion, and tone of the synthesized voice, such as "Speak slowly and gently" or "Use an excited and energetic tone." Multiline input is supported for detailed instructions. Default: "Speak in a warm and friendly tone."

reference_audio

The audio clip containing the voice to clone. It supplies the vocal characteristics the model imitates, so it is essential to the cloning process. A length of 3 to 10 seconds is recommended; the maximum allowed length is 30 seconds.
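As a rough illustration of these length rules, the sketch below classifies a clip by its duration in seconds (sample count divided by sample rate). The function name and its return labels are hypothetical, not taken from the extension. It works on plain integers so it runs without ComfyUI; in ComfyUI an AUDIO value is typically a dict holding a "waveform" tensor shaped [batch, channels, samples] plus a "sample_rate", so the sample count would be the waveform's last dimension.

```python
# Hypothetical helper illustrating the reference-audio length rules:
# hard limit 30 s, recommended window 3-10 s.
MAX_SECONDS = 30.0
RECOMMENDED_MIN_SECONDS = 3.0
RECOMMENDED_MAX_SECONDS = 10.0


def check_reference_duration(num_samples: int, sample_rate: int) -> str:
    """Return "too_long", "ok", or "outside_recommended" for a clip."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    seconds = num_samples / sample_rate
    if seconds > MAX_SECONDS:
        return "too_long"  # over the 30 s limit; the page says the node errors here
    if RECOMMENDED_MIN_SECONDS <= seconds <= RECOMMENDED_MAX_SECONDS:
        return "ok"  # inside the recommended 3-10 s window
    return "outside_recommended"  # allowed, but not the recommended length


if __name__ == "__main__":
    print(check_reference_duration(num_samples=6 * 16000, sample_rate=16000))  # ok
```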

speed

The speaking speed of the synthesized speech, as a float. Default: 1.0; minimum: 0.5; maximum: 2.0. Values below 1.0 slow the speech down and values above 1.0 speed it up.
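The inputs above, together with the all_speech output, map naturally onto ComfyUI's usual custom-node declaration. The class below is a hypothetical reconstruction for illustration, not the extension's source: the defaults, the speed range, the category, and the output name come from this page, while the "COSYVOICE_MODEL" type name, the "generate" method name, and treating every input as required are assumptions.

```python
class FL_CosyVoice3_Instruct2_Sketch:
    """Hypothetical declaration mirroring the documented inputs and output."""

    CATEGORY = "🔊FL CosyVoice3/Synthesis"
    RETURN_TYPES = ("AUDIO",)
    RETURN_NAMES = ("all_speech",)
    FUNCTION = "generate"  # assumed method name

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "model": ("COSYVOICE_MODEL",),  # assumed type name
                "text": ("STRING", {
                    "default": "Hello, this is my cloned voice speaking.",
                    "multiline": True,
                }),
                "instruct_text": ("STRING", {
                    "default": "Speak in a warm and friendly tone.",
                    "multiline": True,
                }),
                "reference_audio": ("AUDIO",),
                "speed": ("FLOAT", {"default": 1.0, "min": 0.5, "max": 2.0}),
            }
        }

    def generate(self, model, text, instruct_text, reference_audio, speed):
        # Synthesis is intentionally omitted; a real node would call the model here.
        raise NotImplementedError
```

ComfyUI reads INPUT_TYPES to build the node's widgets, so the min and max entries are what keep the speed widget within 0.5 to 2.0.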

FL CosyVoice3 Instruct2 Output Parameters:

all_speech

all_speech contains the synthesized audio produced by the node: the input text spoken in the voice of the reference audio, delivered in the style described by instruct_text and at the rate set by speed.
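The page does not say how all_speech is assembled. One plausible reading of the name, offered only as an assumption, is that the node gathers every chunk the model yields and joins them into a single clip. CosyVoice's own usage examples iterate over the results of inference_instruct2 and read each chunk's "tts_speech" entry; the sketch below mimics that pattern with plain lists and a hypothetical fake generator so it runs without the model. With real tensors, the join would be a concatenation along the sample axis instead of list.extend.

```python
from typing import Dict, Iterable, List


def collect_speech(chunks: Iterable[Dict[str, List[float]]]) -> List[float]:
    """Join every chunk's "tts_speech" samples into one continuous clip."""
    all_speech: List[float] = []
    for chunk in chunks:
        all_speech.extend(chunk["tts_speech"])
    return all_speech


def fake_inference():
    # Hypothetical stand-in for a model call that yields audio in pieces.
    yield {"tts_speech": [0.0, 0.1]}
    yield {"tts_speech": [0.2]}


if __name__ == "__main__":
    print(collect_speech(fake_inference()))  # [0.0, 0.1, 0.2]
```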

FL CosyVoice3 Instruct2 Usage Tips:

Ensure that the reference audio is clear and of good quality to achieve the best voice cloning results. A duration of 3 to 10 seconds is recommended for optimal performance.
Experiment with different instruction texts to explore various speaking styles and tones. This can help you achieve the desired emotional impact and engagement in your audio content.
Adjust the speed parameter to match the context of your project. For example, a slower speed might be suitable for a calm and professional tone, while a faster speed could enhance an energetic and lively delivery.

FL CosyVoice3 Instruct2 Common Errors and Solutions:

"inference_instruct2 is not available on this model."

Explanation: This error occurs when the loaded model does not support the inference_instruct2 function, which is necessary for the node's operation.
Solution: Ensure that you are using a compatible CosyVoice2 or CosyVoice3 model. Load the appropriate model through the Model Loader to resolve this issue.
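The compatibility check behind this error can be pictured as a simple attribute test. The sketch below is an assumption about how such a guard might look, not the extension's code: it raises RuntimeError (the exception type is a guess) with the message quoted above whenever the model object lacks a callable inference_instruct2. OldModel and NewModel are hypothetical stand-ins.

```python
def require_instruct2(model) -> None:
    """Raise if the model does not expose a callable inference_instruct2."""
    method = getattr(model, "inference_instruct2", None)
    if not callable(method):
        raise RuntimeError("inference_instruct2 is not available on this model.")


class OldModel:
    """Hypothetical model without instruct2 support."""


class NewModel:
    """Hypothetical CosyVoice2/3-style model exposing inference_instruct2."""

    def inference_instruct2(self, *args, **kwargs):
        return iter(())  # a real model would yield audio chunks


if __name__ == "__main__":
    require_instruct2(NewModel())  # passes silently
    try:
        require_instruct2(OldModel())
    except RuntimeError as err:
        print(err)  # inference_instruct2 is not available on this model.
```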

"Reference audio duration exceeds the maximum limit."

Explanation: The reference audio provided exceeds the maximum allowed duration of 30 seconds.
Solution: Trim the reference audio to 30 seconds or less. Trimming it to the recommended 3 to 10 seconds also gives the best cloning results.
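Trimming can be done in any audio editor. As a sketch, the hypothetical function below keeps the first max_seconds of a one-dimensional sample sequence; it assumes the start of the clip is the part worth keeping, whereas in practice you would pick the cleanest stretch of speech. For a ComfyUI AUDIO waveform shaped [batch, channels, samples], the same cut would apply to the last axis, for example waveform[..., :keep].

```python
def trim_to_seconds(samples, sample_rate: int, max_seconds: float = 10.0):
    """Return at most the first max_seconds of a 1-D sample sequence."""
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    keep = int(max_seconds * sample_rate)  # number of samples to retain
    return samples[:keep]


if __name__ == "__main__":
    clip = [0.0] * (40 * 16000)  # 40 s of silence at 16 kHz, over the 30 s limit
    trimmed = trim_to_seconds(clip, sample_rate=16000)
    print(len(trimmed) / 16000)  # 10.0
```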

FL CosyVoice3 Instruct2 Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI_FL-CosyVoice3


RunComfy

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.
