ComfyUI Node: KugelAudio Multi-Speaker

Class Name: KugelAudioMultiSpeakerNode
Category: KugelAudio
Author: Saganaki22
Extension: ComfyUI-KugelAudio
Last Updated: 2026-02-28
GitHub Stars: 0.03K

How to Install ComfyUI-KugelAudio

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-KugelAudio in the search bar and install the extension.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

KugelAudio Multi-Speaker Description

Facilitates multi-speaker audio creation with distinct voices for dynamic, engaging narratives.

KugelAudio Multi-Speaker:

The KugelAudioMultiSpeakerNode generates multi-speaker audio with the KugelAudio framework. It accepts a script for up to six speakers, each optionally paired with its own voice sample, and renders the lines into a single audio narrative. This makes it well suited to dialogue and multi-character projects. The node also exposes generation controls, including a sampling temperature for more varied output and a configurable pause between speakers for natural pacing.

KugelAudio Multi-Speaker Input Parameters:

speaker1_voice

This parameter accepts an optional voice sample for Speaker 1. The sample can use any sample rate and can be mono or stereo. Providing one helps generate a more personalized and distinct voice for the speaker.

speaker2_voice

Similar to speaker1_voice, this parameter is for Speaker 2's voice sample. It accepts any sample rate and mono/stereo formats. Providing a sample is optional but recommended for better voice differentiation.

speaker3_voice

This parameter is for Speaker 3's voice sample, accepting any sample rate and mono/stereo formats. It is optional but can enhance the uniqueness of the speaker's voice.

speaker4_voice

For Speaker 4, this parameter allows you to input a voice sample. It supports any sample rate and mono/stereo formats. While optional, it can contribute to a more personalized audio output.

speaker5_voice

This parameter is for Speaker 5's voice sample, accepting any sample rate and mono/stereo formats. Providing a sample is optional but can improve the distinctiveness of the speaker's voice.

speaker6_voice

For Speaker 6, this parameter allows you to input a voice sample. It supports any sample rate and mono/stereo formats. While optional, it can enhance the uniqueness of the speaker's voice.
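
The voice inputs are documented as accepting mono or stereo samples, so no preprocessing is required. For readers who prefer to prepare reference clips themselves, the following minimal sketch shows what a mono downmix means in practice. The helper name `downmix_to_mono` and the plain-list audio representation are assumptions made for illustration; they do not describe how the node handles samples internally.

```python
def downmix_to_mono(channels):
    """Average per-channel sample lists into one mono list.

    `channels` is a list of equal-length lists of floats, one per channel.
    This helper is hypothetical and not part of ComfyUI-KugelAudio.
    """
    if not channels:
        raise ValueError("expected at least one channel")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    # zip(*channels) yields one tuple per frame (left, right, ...).
    return [sum(frame) / len(channels) for frame in zip(*channels)]


stereo = [[0.5, -0.5, 1.0], [0.5, 0.5, 0.0]]
mono = downmix_to_mono(stereo)
print(mono)  # [0.5, 0.0, 0.5]
```

A mono input passes through unchanged, because averaging a single channel returns the same values.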

seed

This integer parameter sets the random seed for reproducible audio generation. It defaults to 42 and ranges from 0 to 2^32 - 1. Keeping the seed fixed helps produce consistent results across runs; changing it varies the output when sampling is enabled.

text

This string parameter is where you input the dialogue for the speakers. It supports multiline text and follows the format Speaker N: text (N=1-6). The default text provides an example of how to structure the input. This parameter is crucial for defining what each speaker will say.
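
The `Speaker N: text` format can be illustrated with a small parser. This is a sketch of one plausible way to read such a script, not the node's actual parsing code: the function name `parse_script`, the case-insensitive match, and the rule that unprefixed lines continue the previous turn are all assumptions.

```python
import re

# Matches lines such as "Speaker 2: Hello there" (N limited to 1-6).
SPEAKER_LINE = re.compile(r"^\s*Speaker\s+([1-6])\s*:\s*(.*)$", re.IGNORECASE)


def parse_script(text):
    """Split a multi-speaker script into (speaker_number, line) tuples.

    Lines without a valid "Speaker N:" prefix are appended to the previous
    turn; this continuation rule is an assumption of the sketch.
    """
    turns = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between turns
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append((int(match.group(1)), match.group(2).strip()))
        elif turns:
            speaker, spoken = turns[-1]
            turns[-1] = (speaker, f"{spoken} {line}".strip())
        else:
            raise ValueError(f"line has no 'Speaker N:' prefix: {line!r}")
    return turns


script = """Speaker 1: Welcome to the show.
Speaker 2: Thanks for having me.
It is great to be here.
Speaker 1: Let's begin."""

turns = parse_script(script)
for speaker, spoken in turns:
    print(speaker, spoken)
```

Checking a script this way before queueing a workflow can catch a mistyped prefix, such as a speaker number outside 1-6, before generation starts.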

model

This parameter allows you to select the audio model from the available options in ComfyUI/models/kugelaudio/. The model is auto-downloaded on the first run. Choosing the right model can significantly impact the quality and style of the generated audio.
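
To check what is currently present in the model directory, a short script can list its entries. The sketch below assumes the ComfyUI root folder is passed in by the caller; the function name `list_kugelaudio_models` is hypothetical, and the node's own model discovery may differ.

```python
from pathlib import Path


def list_kugelaudio_models(comfyui_root):
    """Return sorted entry names found in <comfyui_root>/models/kugelaudio/.

    Returns an empty list when the directory does not exist yet, for
    example before the first run has triggered the auto-download.
    """
    model_dir = Path(comfyui_root) / "models" / "kugelaudio"
    if not model_dir.is_dir():
        return []
    # Hidden entries (names starting with ".") are skipped.
    return sorted(p.name for p in model_dir.iterdir() if not p.name.startswith("."))


# "ComfyUI" is a placeholder path; point it at your own installation.
print(list_kugelaudio_models("ComfyUI"))
```

An empty result is expected on a fresh install, since the model is downloaded on the first run.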

attention_type

This parameter lets you choose the attention implementation, with options like SageAttention and FlashAttention, which require CUDA. The default is auto, which detects the best available option. This setting can affect the performance and quality of the audio processing.

use_4bit

This boolean parameter determines whether to quantize the language model to 4-bit using bitsandbytes, reducing VRAM usage from approximately 19GB to 8GB. It requires a CUDA GPU and is automatically disabled for CPU/MPS devices. This option can be useful for optimizing resource usage.
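
The device rule described above can be expressed as a small check. This is a sketch of the documented behavior only (4-bit stays on for CUDA and is turned off for CPU/MPS); the function name `resolve_use_4bit` and the device-string convention are assumptions, not the extension's code.

```python
def resolve_use_4bit(requested, device):
    """Return whether 4-bit quantization should stay enabled for `device`.

    `device` is a string such as "cuda", "cuda:0", "mps", or "cpu".
    Hypothetical helper mirroring the documented rule.
    """
    # Strip an optional device index such as ":0" before comparing.
    return bool(requested) and device.split(":")[0] == "cuda"


for device in ("cuda:0", "mps", "cpu"):
    print(device, resolve_use_4bit(True, device))
```

With the setting enabled on a CUDA GPU, the documentation above quotes a drop from approximately 19GB to 8GB of VRAM.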

do_sample

This boolean parameter enables sampling for more varied audio outputs. When disabled, the output is deterministic. This setting can be useful for creating more dynamic and less predictable audio content.

temperature

This float parameter sets the sampling temperature, with a default value of 1.0 and a range from 0.1 to 2.0. It is only used if do_sample is set to True. Adjusting the temperature can influence the creativity and variability of the generated audio.
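
The interaction between do_sample, temperature, and seed can be illustrated with generic temperature sampling over a list of scores. This shows the general technique rather than the model's internals; `pick_index`, `sample_run`, and the example scores are hypothetical.

```python
import math
import random


def pick_index(logits, do_sample, temperature, rng):
    """Choose an index from raw scores.

    Greedy (argmax) when do_sample is False; otherwise sample from a
    temperature-scaled softmax. `temperature` is ignored in the greedy case.
    """
    if not do_sample:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def sample_run(logits, seed, temperature, n=5):
    """Draw n sampled indices from one seeded generator."""
    rng = random.Random(seed)
    return [pick_index(logits, True, temperature, rng) for _ in range(n)]


logits = [2.0, 1.0, 0.5]

# Greedy decoding always returns the highest-scoring index.
print(pick_index(logits, do_sample=False, temperature=1.0, rng=random.Random(42)))  # 0

# The same seed reproduces the same sampled sequence.
print(sample_run(logits, seed=42, temperature=1.0) == sample_run(logits, seed=42, temperature=1.0))  # True
```

Lower temperatures concentrate probability on the top-scoring choice, while higher temperatures flatten the distribution and increase variety.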

pause_between_speakers

This float parameter adds a pause between each speaker, with a default value of 0.2 seconds and a range from 0.0 to 2.0 seconds. This setting helps in creating a more natural pacing in the audio output.
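
As a worked example of what the pause means in samples, the sketch below inserts silence between consecutive segments. The 24000 Hz sample rate, the plain-list audio representation, and the choice to pause between every pair of turns are assumptions for illustration; `join_with_pauses` is a hypothetical helper.

```python
def join_with_pauses(segments, pause_seconds, sample_rate):
    """Concatenate mono sample lists, inserting silence between them."""
    if not 0.0 <= pause_seconds <= 2.0:
        raise ValueError("pause_seconds must be within 0.0 to 2.0")
    silence = [0.0] * round(pause_seconds * sample_rate)
    joined = []
    for index, segment in enumerate(segments):
        if index > 0:
            joined.extend(silence)  # pause only between segments, not at the ends
        joined.extend(segment)
    return joined


# 24000 Hz is an illustrative rate, not a documented property of the node.
SAMPLE_RATE = 24000
first = [0.1] * 3
second = [0.2] * 2
audio = join_with_pauses([first, second], pause_seconds=0.2, sample_rate=SAMPLE_RATE)
print(len(audio))  # 3 + 4800 + 2 = 4805
```

At this assumed rate, the default 0.2 second pause corresponds to 4800 silent samples between two turns.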

disable_watermark

This boolean parameter allows you to disable audio watermarking, which can be useful if you experience stuttering or micro-freezes in the generated audio. Disabling the watermark can improve playback smoothness.

KugelAudio Multi-Speaker Output Parameters:

audio_output

The node's output is the generated audio, which contains the dialogue of all specified speakers combined into one multi-speaker narrative based on the input parameters.

KugelAudio Multi-Speaker Usage Tips:

  • To achieve the best results, provide distinct voice samples for each speaker to enhance the uniqueness of their voices.
  • Experiment with different models and attention types to find the best combination for your specific audio project.
  • Use the pause_between_speakers parameter to adjust the pacing of the dialogue, making it sound more natural and engaging.

KugelAudio Multi-Speaker Common Errors and Solutions:

"Model not found"

  • Explanation: This error occurs when the specified model is not available in the ComfyUI/models/kugelaudio/ directory.
  • Solution: Ensure that the model is correctly downloaded and placed in the specified directory. You may need to run the node once to trigger the auto-download feature.

"CUDA device not available"

  • Explanation: This error indicates that the node is attempting to use a CUDA-specific feature on a non-CUDA device.
  • Solution: Check your device compatibility and ensure that CUDA is properly installed and configured. Alternatively, disable CUDA-specific features like use_4bit if running on a CPU/MPS device.

"Invalid voice sample format"

  • Explanation: This error occurs when the provided voice sample is in an unsupported format.
  • Solution: Ensure that each voice sample is a valid audio input. Any sample rate is accepted, in mono or stereo.

RunComfy
Copyright 2025 RunComfy. All Rights Reserved.