KugelAudio Multi-Speaker:
The KugelAudioMultiSpeakerNode generates multi-speaker audio with the KugelAudio framework. It accepts dialogue text for up to six speakers, each optionally paired with its own voice sample, and renders the lines as a single audio narrative. This makes it well suited to projects that need dialogue or multi-character interactions. Configuration options include the sampling temperature, which controls how varied the output is, and an adjustable pause between speakers for more natural pacing.
KugelAudio Multi-Speaker Input Parameters:
speaker1_voice
This parameter allows you to provide a voice sample for Speaker 1. The sample can use any sample rate and can be either mono or stereo. This input is optional, but providing a sample can help in generating a more personalized and distinct voice for the speaker.
speaker2_voice
Similar to speaker1_voice, this parameter is for Speaker 2's voice sample. It accepts any sample rate and mono/stereo formats. Providing a sample is optional but recommended for better voice differentiation.
speaker3_voice
This parameter is for Speaker 3's voice sample, accepting any sample rate and mono/stereo formats. It is optional but can enhance the uniqueness of the speaker's voice.
speaker4_voice
For Speaker 4, this parameter allows you to input a voice sample. It supports any sample rate and mono/stereo formats. While optional, it can contribute to a more personalized audio output.
speaker5_voice
This parameter is for Speaker 5's voice sample, accepting any sample rate and mono/stereo formats. Providing a sample is optional but can improve the distinctiveness of the speaker's voice.
speaker6_voice
For Speaker 6, this parameter allows you to input a voice sample. It supports any sample rate and mono/stereo formats. While optional, it can enhance the uniqueness of the speaker's voice.
seed
This integer parameter sets the random seed for reproducible audio generation. It has a default value of 42, with a range from 0 to 2^32 - 1. Reusing the same seed with the same inputs and settings helps produce consistent results across runs, while changing the seed yields a different variation.
text
This string parameter is where you input the dialogue for the speakers. It supports multiline text and follows the format Speaker N: text (N=1-6). The default text provides an example of how to structure the input. This parameter is crucial for defining what each speaker will say.
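As an illustration of the Speaker N: text format, here is a minimal Python sketch that splits a multiline script into speaker turns. The helper name parse_script, the regular expression, and the rule that an unlabeled line continues the previous turn are assumptions made for this example; the node's own parser is not shown in this document and may behave differently.

```python
import re

# Hypothetical pattern: "Speaker N: text" with N limited to 1-6, as described above.
SPEAKER_LINE = re.compile(r"^\s*Speaker\s+([1-6])\s*:\s*(.*)$")


def parse_script(text):
    """Split a multiline script into (speaker_number, utterance) pairs."""
    turns = []
    for line in text.splitlines():
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append((int(match.group(1)), match.group(2).strip()))
        elif line.strip() and turns:
            # Assumption: an unlabeled, non-empty line continues the previous turn.
            speaker, spoken = turns[-1]
            turns[-1] = (speaker, f"{spoken} {line.strip()}".strip())
    return turns


script = (
    "Speaker 1: Hello there.\n"
    "Speaker 2: Hi! How are you?\n"
    "Speaker 1: Doing well."
)
for speaker, utterance in parse_script(script):
    print(speaker, utterance)
```

In this sketch, a label outside the 1-6 range (for example Speaker 7:) is not treated as a new turn, so checking the labels before running the node may save a confusing result.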
model
This parameter allows you to select the audio model from the available options in ComfyUI/models/kugelaudio/. The model is auto-downloaded on the first run. Choosing the right model can significantly impact the quality and style of the generated audio.
attention_type
This parameter lets you choose the attention implementation, with options like SageAttention and FlashAttention, which require CUDA. The default is auto, which detects the best available option. This setting can affect the performance and quality of the audio processing.
use_4bit
This boolean parameter determines whether to quantize the language model to 4-bit using bitsandbytes, reducing VRAM usage from approximately 19GB to 8GB. It requires a CUDA GPU and is automatically disabled for CPU/MPS devices. This option can be useful for optimizing resource usage.
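The automatic disabling on CPU/MPS devices can be pictured with a small sketch. The function name resolve_use_4bit and the device strings ("cuda", "cuda:0", "cpu", "mps", following PyTorch-style naming) are assumptions for illustration, not the node's actual code.

```python
def resolve_use_4bit(requested, device):
    """Return True only when 4-bit quantization was requested on a CUDA device.

    Hypothetical helper: mirrors the documented behaviour that the option
    is switched off automatically on CPU and MPS devices.
    """
    return bool(requested) and device.startswith("cuda")


for device in ("cuda:0", "cpu", "mps"):
    print(device, resolve_use_4bit(True, device))
```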
do_sample
This boolean parameter enables sampling for more varied audio outputs. When disabled, the output is deterministic. This setting can be useful for creating more dynamic and less predictable audio content.
temperature
This float parameter sets the sampling temperature, with a default value of 1.0 and a range from 0.1 to 2.0. It is only used if do_sample is set to True. Adjusting the temperature can influence the creativity and variability of the generated audio.
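To show how temperature and do_sample interact in general, the following sketch applies generic temperature sampling to a list of scores. It is not KugelAudio's implementation: the function sample_index and its arguments are hypothetical, and only the 0.1 to 2.0 range and the "ignored when do_sample is off" rule come from the description above.

```python
import math
import random


def sample_index(logits, temperature=1.0, do_sample=True, seed=42):
    """Pick an index from raw scores, greedily or by temperature sampling."""
    if not do_sample:
        # Deterministic path: temperature is ignored and the top score wins.
        return max(range(len(logits)), key=lambda i: logits[i])
    if not 0.1 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.1 and 2.0")
    # Lower temperatures sharpen the distribution; higher ones flatten it.
    scaled = [score / temperature for score in logits]
    peak = max(scaled)
    weights = [math.exp(score - peak) for score in scaled]  # stable softmax weights
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


scores = [0.1, 2.0, 0.5]
print(sample_index(scores, do_sample=False))
print(sample_index(scores, temperature=0.7, seed=42))
```

With sampling enabled, repeating a call with the same seed and temperature returns the same index, which is the same idea behind the node's seed parameter.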
pause_between_speakers
This float parameter adds a pause between each speaker, with a default value of 0.2 seconds and a range from 0.0 to 2.0 seconds. This setting helps in creating a more natural pacing in the audio output.
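One way to picture the pause is as a run of silent samples placed between consecutive speaker segments. The sketch below assumes audio is held as plain lists of float samples and that a pause goes between every pair of adjacent segments; the function join_with_pauses and the 24000 Hz sample rate are illustrative choices, not details taken from the node.

```python
def join_with_pauses(segments, sample_rate, pause_seconds=0.2):
    """Concatenate audio segments, inserting silence between consecutive ones."""
    if not 0.0 <= pause_seconds <= 2.0:
        raise ValueError("pause_seconds must be between 0.0 and 2.0")
    # Number of zero-valued samples that corresponds to the requested pause.
    silence = [0.0] * int(round(pause_seconds * sample_rate))
    output = []
    for index, samples in enumerate(segments):
        if index > 0:
            output.extend(silence)  # no pause before the first segment
        output.extend(samples)
    return output


speaker_1 = [0.3, -0.2, 0.1]
speaker_2 = [0.5, 0.4]
mixed = join_with_pauses([speaker_1, speaker_2], sample_rate=24000, pause_seconds=0.2)
print(len(mixed))
```

Setting pause_seconds to 0.0 in this sketch simply joins the segments back to back.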
disable_watermark
This boolean parameter allows you to disable audio watermarking, which can be useful if you experience stuttering or micro-freezes in the generated audio. Disabling the watermark can improve playback smoothness.
KugelAudio Multi-Speaker Output Parameters:
audio_output
The primary output of the node is the generated audio file, which contains the dialogue of all specified speakers. This output is crucial as it represents the final product of the node's processing, providing a multi-speaker audio narrative based on the input parameters.
KugelAudio Multi-Speaker Usage Tips:
- To achieve the best results, provide distinct voice samples for each speaker to enhance the uniqueness of their voices.
- Experiment with different models and attention types to find the best combination for your specific audio project.
- Use the pause_between_speakers parameter to adjust the pacing of the dialogue, making it sound more natural and engaging.
KugelAudio Multi-Speaker Common Errors and Solutions:
"Model not found"
- Explanation: This error occurs when the specified model is not available in the ComfyUI/models/kugelaudio/ directory.
- Solution: Ensure that the model is correctly downloaded and placed in the specified directory. You may need to run the node once to trigger the auto-download feature.
"CUDA device not available"
- Explanation: This error indicates that the node is attempting to use a CUDA-specific feature on a non-CUDA device.
- Solution: Check your device compatibility and ensure that CUDA is properly installed and configured. Alternatively, disable CUDA-specific features like use_4bit if running on a CPU/MPS device.
"Invalid voice sample format"
- Explanation: This error occurs when the provided voice sample is in an unsupported format.
- Solution: Ensure that each voice sample is a valid audio input. Any sample rate is accepted, and the sample can be mono or stereo.
