ComfyUI Extension: ComfyUI-KugelAudio

Repo Name: ComfyUI-KugelAudio
Author: Saganaki22
Nodes: 4
Latest Updated: 2026-02-28
GitHub Stars: 0.03K

How to Install ComfyUI-KugelAudio

Install this extension via the ComfyUI Manager by searching for ComfyUI-KugelAudio:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-KugelAudio in the search bar and install the matching entry.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.


ComfyUI-KugelAudio Description

ComfyUI-KugelAudio is an extension for ComfyUI that adds text-to-speech (TTS) generation with voice cloning, so users can synthesize speech directly inside ComfyUI workflows.

ComfyUI-KugelAudio Introduction

ComfyUI-KugelAudio adds text-to-speech (TTS) to ComfyUI. It uses a combined autoregressive (AR) and diffusion architecture to provide open-source TTS with voice cloning across 24 European languages. Whether you are an AI artist adding voiceovers to a project or exploring new creative directions, the extension generates natural-sounding speech from text.

How ComfyUI-KugelAudio Works

ComfyUI-KugelAudio converts written text into speech using a model that combines AR and diffusion techniques. The AR component generates the output sequentially, predicting each next element of the sequence from the ones before it, while the diffusion component refines the audio output for clarity and naturalness. Together they aim to produce speech that closely follows human intonation and rhythm. Given a reference audio sample, the extension can also clone a voice, reproducing its vocal characteristics in the TTS output.
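
The interplay between a sequential (AR) stage and an iterative refinement (diffusion-style) stage can be illustrated with a toy sketch. This is not the model's code: the functions toy_ar_step, toy_diffusion_refine, and generate are hypothetical names, the "frames" are single numbers rather than audio, and the update rules are invented purely to show the control flow.

```python
import random


def toy_ar_step(history: list) -> float:
    """Toy autoregressive step: the next conditioning value depends on the previous frame."""
    prev = history[-1] if history else 0.0
    return 0.5 * prev + 1.0


def toy_diffusion_refine(condition: float, steps: int, rng: random.Random) -> float:
    """Toy refinement: start from random noise and move toward the conditioned value."""
    x = rng.gauss(0.0, 1.0)  # pure noise as the starting point
    for _ in range(steps):
        x = x + 0.5 * (condition - x)  # each step removes half of the remaining error
    return x


def generate(num_frames: int, steps: int = 10, seed: int = 0) -> list:
    """Generate frames one at a time; each refined frame feeds the next AR step."""
    rng = random.Random(seed)
    history = []
    for _ in range(num_frames):
        condition = toy_ar_step(history)
        frame = toy_diffusion_refine(condition, steps, rng)
        history.append(frame)
    return history


if __name__ == "__main__":
    print(generate(5))
```

The point of the sketch is the loop structure: each new frame is conditioned on what was already generated, and each frame is produced by several refinement steps rather than in a single pass.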

ComfyUI-KugelAudio Features

  • Single Speaker TTS: Converts text into speech with a single voice, suited to narrations or monologues.
  • Voice Cloning: Clones a voice from a short audio sample (5-30 seconds), so the TTS output can carry that voice's characteristics.
  • Multi-Speaker Conversations: Supports up to 6 speakers, with configurable pauses between speakers for natural pacing in dialogues.
  • Watermark Detection: All generated audio contains an inaudible watermark, which makes it possible to check whether a clip was produced by the model.
  • Language Support: Offers TTS in 24 European languages, including English, German, French, and Spanish.
  • 4-bit Quantization: Reduces VRAM usage from approximately 19GB to approximately 8GB, making the model usable on more modest hardware.
  • Multiple Attention Types: Offers several attention options, such as Auto, SageAttention, FlashAttention, SDPA, and Eager, to balance performance, memory use, and quality.
  • Progress Tracking: Displays real-time progress bars during long text generations.
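
To make the multi-speaker pause setting concrete, here is a minimal sketch of how per-speaker audio segments could be joined with silence in between. The function name join_with_pauses and the MAX_SPEAKERS constant are hypothetical and not taken from the extension; audio is represented as plain lists of float samples, and the extension's actual implementation may differ.

```python
MAX_SPEAKERS = 6  # limit stated in the feature list above


def join_with_pauses(segments, sample_rate, pause_seconds):
    """Concatenate (speaker, samples) segments, inserting silence between them.

    segments: list of (speaker_name, list_of_float_samples) tuples, in order.
    sample_rate: samples per second of the audio.
    pause_seconds: length of the silence inserted between consecutive segments.
    """
    speakers = {speaker for speaker, _ in segments}
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(f"at most {MAX_SPEAKERS} speakers are supported")

    # Silence is just a run of zero-valued samples.
    silence = [0.0] * int(round(sample_rate * pause_seconds))

    output = []
    for index, (_, samples) in enumerate(segments):
        if index > 0:
            output.extend(silence)  # pause only between segments, not before the first
        output.extend(samples)
    return output


if __name__ == "__main__":
    dialogue = [("Alice", [0.1, 0.2]), ("Bob", [0.3])]
    print(join_with_pauses(dialogue, sample_rate=4, pause_seconds=0.5))
```

With a sample rate of 4 and a 0.5 second pause, two zero samples are inserted between the two segments.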

ComfyUI-KugelAudio Models

ComfyUI-KugelAudio uses the kugelaudio-0-open model, which has 7 billion parameters. The model downloads automatically on first use, so no manual setup is required.
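
A rough back-of-the-envelope estimate shows why 4-bit quantization cuts memory use for a 7 billion parameter model. The sketch below counts weight storage only and assumes the unquantized weights are stored in 16 bits, which the source does not state. The helper name weight_memory_gb is hypothetical.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7_000_000_000  # 7 billion parameters, as stated above

if __name__ == "__main__":
    print(weight_memory_gb(NUM_PARAMS, 16))  # 14.0 (assumed 16-bit weights)
    print(weight_memory_gb(NUM_PARAMS, 4))   # 3.5  (4-bit weights)
```

These weight-only figures are lower than the approximately 19GB and 8GB quoted for the extension. The difference would plausibly be taken up by activations, caches, and any components that are not quantized, though that breakdown is a guess rather than something the source documents.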

What's New with ComfyUI-KugelAudio

Recent updates added multi-speaker support, which allows more complex audio productions, and 4-bit quantization, which reduces VRAM requirements. Together these make the extension usable on a wider range of hardware and for a wider range of projects.

Troubleshooting ComfyUI-KugelAudio

Common Issues and Solutions

  • Voice Cloning Errors: If you encounter an error related to 'Qwen2Config', run the install_portable.bat script in the ComfyUI-KugelAudio directory.
  • Out of Memory (OOM) Errors: Enable 4-bit quantization to reduce VRAM usage, switch the attention type to SDPA or Eager, and consider lowering the max_words_per_chunk setting.
  • Model Download Failures: Verify your internet connection and try downloading the model manually using the Hugging Face CLI.
  • Audio Quality Issues: Adjust the cfg_scale setting to improve clarity and reduce distortion. If the output contains static or noise, disable 4-bit quantization.
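
To show what lowering max_words_per_chunk does in principle, here is a minimal sketch of splitting long text into chunks with a word limit, preferring sentence boundaries. Only the setting name comes from the text above; the function chunk_text and its splitting rules are assumptions for illustration, and the extension may chunk text differently.

```python
import re


def chunk_text(text: str, max_words_per_chunk: int) -> list:
    """Split text into chunks of at most max_words_per_chunk words.

    Sentences are kept together when they fit; a sentence longer than the
    limit is cut into fixed-size pieces.
    """
    if max_words_per_chunk < 1:
        raise ValueError("max_words_per_chunk must be at least 1")

    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks = []
    current = []  # words collected for the chunk being built
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue
        # Hard-split sentences that exceed the limit on their own.
        while len(words) > max_words_per_chunk:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words_per_chunk]))
            words = words[max_words_per_chunk:]
        # Start a new chunk if this sentence would overflow the current one.
        if current and len(current) + len(words) > max_words_per_chunk:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five. Six seven eight nine."
    for chunk in chunk_text(sample, 5):
        print(chunk)
```

Smaller chunks mean each generation pass handles less text at once, which is the usual reason a lower limit helps with memory pressure.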

Learn More about ComfyUI-KugelAudio

For detailed documentation and updates, see the extension's GitHub Repository. The Hugging Face Model Page provides access to the model and related resources. Community forums and tutorials can also offer useful guidance as you integrate this extension into your projects.
