ComfyUI Node: FL CosyVoice3 Dialog

Class Name

FL_CosyVoice3_Dialog

Category
🔊FL CosyVoice3/Synthesis

Author
filliptm (Account age: 2386 days)

Extension
ComfyUI_FL-CosyVoice3

Latest Updated
2026-03-21

Github Stars
0.11K


Table of Contents

Description
FL_CosyVoice3_Dialog:
FL_CosyVoice3_Dialog Input Parameters:
FL_CosyVoice3_Dialog Output Parameters:
FL_CosyVoice3_Dialog Usage Tips:
FL_CosyVoice3_Dialog Common Errors and Solutions:
Related Nodes

How to Install ComfyUI_FL-CosyVoice3

Install this extension via the ComfyUI Manager by searching for ComfyUI_FL-CosyVoice3:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI_FL-CosyVoice3 in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

FL CosyVoice3 Dialog Description

Facilitates multi-speaker dialog synthesis with voice cloning for realistic conversations.

FL CosyVoice3 Dialog:

The FL_CosyVoice3_Dialog node synthesizes multi-speaker dialog with voice cloning, using the CosyVoice model. It supports up to four speakers (A through D), each with its own voice reference, and reads dialog text with speaker labels to assign the correct voice to each line. It is aimed at AI artists and developers who need natural-sounding conversations with distinct speaker voices.

FL CosyVoice3 Dialog Input Parameters:

model

The model parameter takes a CosyVoice model instance, which generates the audio output from the dialog text and the speaker audio references. The model must be compatible with this node.

dialog_text

The dialog_text parameter is a string containing the dialog script with speaker labels, such as "SPEAKER A: Hello, how are you?". The node uses the labels to assign the correct voice to each line of dialog. The default value is a sample dialog, and the field supports multiline input for longer conversations.
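To make the label format concrete, here is a minimal sketch of how a labeled script could be split into (speaker, text) pairs. The regular expression and the `parse_dialog` helper are assumptions for illustration; the node's actual parser may accept other variations.

```python
import re

# Assumed label pattern: "SPEAKER", one letter from A to D, then a colon.
# This is a hypothetical pattern, not taken from the extension's source.
LINE_PATTERN = re.compile(r"^\s*SPEAKER\s+([A-D])\s*:\s*(.+?)\s*$", re.IGNORECASE)


def parse_dialog(dialog_text):
    """Return a list of (speaker_letter, text) tuples, skipping unlabeled lines."""
    lines = []
    for raw in dialog_text.splitlines():
        match = LINE_PATTERN.match(raw)
        if match:
            lines.append((match.group(1).upper(), match.group(2)))
    return lines


script = """SPEAKER A: Hello, how are you?
SPEAKER B: Fine, thanks.
(stage direction without a label)
SPEAKER A: Glad to hear it."""

parsed = parse_dialog(script)
```

In this sketch, lines without a recognized label are dropped silently, which is one plausible way a script could end up with no valid dialog lines.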

speaker_A_Audio

The speaker_A_Audio parameter is an audio input that provides the voice reference for Speaker A. The clip should be at most 30 seconds long. The node clones its voice characteristics for Speaker A's dialog lines.
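If a reference clip is longer than 30 seconds, it can be shortened before it reaches the node. The sketch below trims a mono clip held as a plain list of samples; `trim_reference` is a hypothetical helper, and whether the node itself trims or rejects longer clips is not stated here.

```python
def trim_reference(samples, sample_rate, max_seconds=30.0):
    """Return at most max_seconds of audio from the start of a mono sample list."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    max_samples = int(sample_rate * max_seconds)
    return samples[:max_samples]


# 45 seconds of silence at an assumed 16 kHz sample rate
clip = [0.0] * (16000 * 45)
trimmed = trim_reference(clip, 16000)
```

Clips already under the limit are returned unchanged in length.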

speaker_B_Audio

Like speaker_A_Audio, the speaker_B_Audio parameter provides the voice reference for Speaker B. It is also limited to 30 seconds and is used to clone Speaker B's voice.

speed

The speed parameter is a float multiplier for the speech speed of the synthesized dialog. The default is 1.0, the minimum is 0.5, and the maximum is 2.0, with a step increment of 0.05. Use it to match the desired pacing of the dialog.
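When the speed value comes from a script rather than the UI widget, it may help to bring it into the documented range first. The following sketch clamps a value to 0.5 through 2.0 and snaps it to the 0.05 step; `snap_speed` is a hypothetical helper, not part of the extension.

```python
def snap_speed(value, minimum=0.5, maximum=2.0, step=0.05):
    """Clamp value to [minimum, maximum] and snap it to the nearest step."""
    clamped = min(max(value, minimum), maximum)
    steps = round((clamped - minimum) / step)
    # Round to two decimals to hide floating-point noise from the multiplication.
    return round(minimum + steps * step, 2)


fast = snap_speed(1.23)      # snapped to the nearest 0.05 step
too_fast = snap_speed(3.0)   # clamped to the maximum
too_slow = snap_speed(0.1)   # clamped to the minimum
```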

speaker_C_Audio

The speaker_C_Audio parameter is an optional audio input that provides the voice reference for Speaker C. Like the other speaker audio inputs, it should not exceed 30 seconds. It is used to clone Speaker C's voice if Speaker C appears in the dialog.

speaker_D_Audio

The speaker_D_Audio parameter is an optional audio input that provides the voice reference for Speaker D. It follows the same guidelines as the other speaker audio inputs and is used to clone Speaker D's voice if Speaker D appears in the dialog.

seed

The seed parameter is an integer that sets the random seed for the synthesis process, so results can be reproduced. The default value is 42, and the range is -1 to 2147483647. A value of -1 requests random generation, which is useful for exploring different synthesis outcomes.
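The seed convention can be sketched as follows. `resolve_seed` is a hypothetical helper that mirrors the documented range; how the node handles -1 internally is an assumption.

```python
import random

MAX_SEED = 2147483647  # documented upper bound


def resolve_seed(seed):
    """Return a concrete seed: a random one for -1, otherwise the given value."""
    if seed == -1:
        return random.randint(0, MAX_SEED)
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be -1 or between 0 and {MAX_SEED}")
    return seed


fixed = resolve_seed(42)     # reproducible runs
varied = resolve_seed(-1)    # a different value on most runs
```

Logging the resolved value makes it possible to rerun a result that was first produced with -1.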

FL CosyVoice3 Dialog Output Parameters:

speaker_waveforms

The speaker_waveforms output is a dictionary containing the generated audio waveforms for each speaker, that is, the synthesized audio for each speaker's dialog lines.

combined_waveforms

The combined_waveforms output is a list of audio waveforms representing the entire dialog sequence combined. Use it when you need the whole conversation as a single piece of audio.
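The relationship between the two outputs can be illustrated with plain lists. This is only a sketch: the node's real data layout (tensor shapes, dictionary keys) is not documented here, and `collect_outputs` and the segment format are assumptions.

```python
def collect_outputs(segments):
    """Group synthesized segments per speaker and also join them in dialog order.

    segments: list of (speaker_letter, samples) in the order the lines were spoken.
    """
    per_speaker = {}
    combined = []
    for speaker, samples in segments:
        per_speaker.setdefault(speaker, []).extend(samples)
        combined.extend(samples)
    return per_speaker, combined


# Three short hypothetical segments: A speaks, then B, then A again.
segments = [("A", [0.1, 0.2]), ("B", [0.3]), ("A", [0.4])]
speaker_waveforms, combined = collect_outputs(segments)
```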

FL CosyVoice3 Dialog Usage Tips:

Ensure that the audio references for each speaker are clear and of high quality to achieve the best voice cloning results.
Use the speed parameter to adjust the pacing of the dialog to match the desired conversational flow.
Experiment with different seed values to explore variations in the synthesized audio output.

FL CosyVoice3 Dialog Common Errors and Solutions:

No valid dialog lines found. Use format: SPEAKER A: text

Explanation: This error occurs when the dialog text does not contain any valid lines with recognized speaker labels or when no audio references are provided for the speakers.
Solution: Ensure that the dialog text is formatted correctly with speaker labels and that audio references are provided for each speaker involved in the dialog.

Skipping line for Speaker X: no audio reference provided

Explanation: This message indicates that a line of dialog was skipped because there was no audio reference available for the specified speaker.
Solution: Provide an audio reference for the speaker mentioned in the dialog line to ensure that their voice can be synthesized.
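Both messages above can be anticipated before running the node. The sketch below takes already-parsed (speaker, text) pairs and the set of speakers that have an audio reference, and reports which lines would be skipped. `check_dialog` is a hypothetical pre-flight helper; only the message texts come from the node.

```python
def check_dialog(lines, available_speakers):
    """Split lines into those that can be synthesized and warnings for the rest.

    lines: list of (speaker_letter, text) tuples.
    available_speakers: set of speaker letters with an audio reference connected.
    """
    kept, warnings = [], []
    for speaker, text in lines:
        if speaker in available_speakers:
            kept.append((speaker, text))
        else:
            warnings.append(
                f"Skipping line for Speaker {speaker}: no audio reference provided"
            )
    if not kept:
        raise ValueError("No valid dialog lines found. Use format: SPEAKER A: text")
    return kept, warnings


kept, warnings = check_dialog([("A", "Hi"), ("C", "Hey")], {"A", "B"})
```

Here Speaker C has no reference, so that line is reported and only Speaker A's line remains. If every line were skipped, the first error would be raised instead.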

FL CosyVoice3 Dialog Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI_FL-CosyVoice3
