ComfyUI Extension: VibeVoice ComfyUI

Repo Name: VibeVoice-ComfyUI
Author: Fabio Sarracino (Account age: 110 days)
Nodes: 5
Latest Updated: 2025-10-02
GitHub Stars: 1.25K

How to Install VibeVoice ComfyUI

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter VibeVoice ComfyUI in the search bar and install the extension from the results.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

VibeVoice ComfyUI Description

VibeVoice ComfyUI is a ComfyUI wrapper for Microsoft's VibeVoice TTS model. It supports single-speaker and multi-speaker text-to-speech generation and can load input text from files.

VibeVoice-ComfyUI Introduction

VibeVoice-ComfyUI is an extension designed to integrate Microsoft's VibeVoice text-to-speech (TTS) model into ComfyUI workflows. This extension allows you to generate high-quality, expressive speech from text, supporting both single and multi-speaker scenarios. Whether you're creating podcasts, voiceovers, or any other audio content, VibeVoice-ComfyUI provides a seamless way to synthesize natural-sounding speech directly within your creative projects. It addresses common challenges in TTS, such as maintaining speaker consistency and handling long-form content, making it an invaluable tool for AI artists looking to enhance their audio productions.

How VibeVoice-ComfyUI Works

At its core, VibeVoice-ComfyUI leverages the VibeVoice model, which uses advanced speech tokenizers and a diffusion framework to generate speech. Think of it as a sophisticated storyteller that can read your text and bring it to life with voices that sound natural and engaging. The model understands the context and flow of dialogue, allowing it to produce speech that feels coherent and dynamic. By using a combination of language models and acoustic processing, VibeVoice-ComfyUI can handle complex tasks like multi-speaker conversations and voice cloning, where it mimics the characteristics of a given voice sample.

VibeVoice-ComfyUI Features

Core Functionality

  • Single Speaker TTS: Generate speech from text using a single voice, with the option to clone a specific voice from an audio sample.
  • Multi-Speaker Conversations: Create dialogues with up to four distinct speakers, each with their own voice.
  • Voice Cloning: Capture the essence of a voice from an audio sample and use it to generate new speech.
  • LoRA Support: Fine-tune voices with custom Low-Rank Adaptation (LoRA) adapters for personalized voice characteristics.
  • Voice Speed Control: Adjust the speaking rate to match your desired pace.
  • Text File Loading: Easily load scripts from text files for processing.
  • Automatic Text Chunking: Seamlessly handle long texts by breaking them into manageable chunks.
  • Custom Pause Tags: Insert pauses in speech to control pacing and emphasis.
  • Node Chaining: Connect multiple nodes to create complex workflows.
  • Interruption Support: Cancel operations at any point during the generation process.
  • Flexible Configuration: Customize parameters like temperature, sampling, and guidance scale to suit your needs.
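
To make the pause tag and text chunking features above more concrete, here is a minimal Python sketch of how such preprocessing could work. It is not the extension's code. The tag syntax (`[pause]` and `[pause:1500]`, in milliseconds), the default pause length, the 250-character limit, and the function names `split_on_pauses` and `chunk_text` are all assumptions made for illustration; check the extension's own documentation for the syntax and limits it actually uses.

```python
import re

# Assumed tag syntax (hypothetical): "[pause]" or "[pause:1500]" in milliseconds.
PAUSE_TAG = re.compile(r"\[pause(?::(\d+))?\]")
DEFAULT_PAUSE_MS = 1000  # assumed default, not taken from the extension


def split_on_pauses(text):
    """Return ("text", str) and ("pause", milliseconds) items in reading order."""
    items = []
    pos = 0
    for match in PAUSE_TAG.finditer(text):
        spoken = text[pos:match.start()].strip()
        if spoken:
            items.append(("text", spoken))
        ms = int(match.group(1)) if match.group(1) else DEFAULT_PAUSE_MS
        items.append(("pause", ms))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        items.append(("text", tail))
    return items


def chunk_text(text, max_chars=250):
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

In a pipeline of this shape, each "text" item would be chunked and synthesized separately, and each "pause" item would be rendered as silence of the given length between the generated audio segments. Splitting at sentence boundaries rather than at a fixed character count avoids cutting a sentence in the middle, which tends to produce audible seams.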

Performance & Optimization

  • Attention Mechanisms: Choose from various attention types to optimize performance.
  • Diffusion Steps: Balance quality and speed by adjusting the number of processing steps.
  • Memory Management: Efficiently manage VRAM usage with automatic cleanup options.
  • Apple Silicon Support: Enjoy native GPU acceleration on Apple devices with M1/M2/M3 chips.
  • Quantization Options: Reduce VRAM usage with 8-bit and 4-bit quantization, with little loss in audio quality.

VibeVoice-ComfyUI Models

VibeVoice-ComfyUI supports several models, each suited for different use cases:

  • VibeVoice-1.5B: A smaller model ideal for quick prototyping and single-speaker tasks, requiring around 6GB of VRAM.
  • VibeVoice-Large: Offers the highest quality for multi-speaker conversations, but requires more VRAM (~20GB).
  • VibeVoice-Large-Q8: Provides production-quality audio with reduced VRAM usage (~12GB), perfect for GPUs with 12GB VRAM.
  • VibeVoice-Large-Q4: Maximizes VRAM savings with minimal quality loss, suitable for lower-end GPUs.

Each model can be downloaded from Hugging Face. Once downloaded, models are automatically detected and managed within the ComfyUI environment.
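
As a rough aid for choosing among the models listed above, the following Python sketch filters them by the approximate VRAM figures given in this section. The function name `models_that_fit` is hypothetical, and VibeVoice-Large-Q4 is left out because no VRAM figure is stated for it here. Actual memory use also depends on text length, attention type, and other settings, so treat the result as a starting point only.

```python
# Approximate VRAM figures in GB, copied from the model list above.
# VibeVoice-Large-Q4 is omitted because no figure is given for it.
MODEL_VRAM_GB = {
    "VibeVoice-Large": 20,
    "VibeVoice-Large-Q8": 12,
    "VibeVoice-1.5B": 6,
}


def models_that_fit(available_vram_gb):
    """Return the models whose listed VRAM figure fits, largest first."""
    fitting = [
        (need, name)
        for name, need in MODEL_VRAM_GB.items()
        if need <= available_vram_gb
    ]
    return [name for need, name in sorted(fitting, reverse=True)]


if __name__ == "__main__":
    for vram in (8, 12, 24):
        print(vram, "GB:", models_that_fit(vram))
```

With these figures, a 12GB card would get VibeVoice-Large-Q8 and VibeVoice-1.5B as candidates, and a card below 6GB would get none, which is the case where VibeVoice-Large-Q4 may still be worth trying.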

What's New with VibeVoice-ComfyUI

Recent updates have introduced several enhancements:

  • Version 1.8.1: Fixed a critical bug in the bitsandbytes library affecting the Q8 model.
  • Version 1.8.0: Introduced the VibeVoice-Large-Q8 model, offering production-quality audio with significant VRAM savings.
  • Version 1.7.0: Added dynamic 4-bit quantization for language models, improving speed and reducing VRAM usage.
  • Version 1.6.0: Removed automatic model downloading, giving users more control over model management.

These updates improve the extension's performance and flexibility, making it easier for AI artists to create high-quality audio content.

Troubleshooting VibeVoice-ComfyUI

If you encounter issues while using VibeVoice-ComfyUI, here are some common solutions:

  • Installation Issues: Ensure you're using the correct Python environment and restart ComfyUI after installation.
  • Generation Problems: For unstable voices, try using deterministic mode. Ensure multi-speaker text is formatted correctly with sequential speaker numbers.
  • Memory Constraints: Use smaller models like VibeVoice-1.5B for systems with limited VRAM.

For more detailed troubleshooting, refer to the ComfyUI logs and ensure all dependencies are correctly installed.
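
The advice above about sequential speaker numbers can be checked before generation with a small script. The sketch below is an illustration under stated assumptions, not part of the extension: it assumes each line of a multi-speaker script has the form `[N]: text`, reads "sequential" as meaning speakers are numbered from 1 upward with no gaps, and uses the four-speaker limit from the feature list. The name `check_script` is hypothetical.

```python
import re

# Assumed line format (hypothetical): "[N]: spoken text"
SPEAKER_LINE = re.compile(r"^\[(\d+)\]:\s*(.+)$")
MAX_SPEAKERS = 4  # the feature list mentions up to four distinct speakers


def check_script(script):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    seen = set()
    for line_no, line in enumerate(script.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        match = SPEAKER_LINE.match(line)
        if not match:
            problems.append(f"line {line_no}: missing '[N]:' speaker prefix")
            continue
        seen.add(int(match.group(1)))

    if any(n < 1 for n in seen):
        problems.append("speaker numbers must start at 1")
    valid = {n for n in seen if n >= 1}
    if valid:
        missing = sorted(set(range(1, max(valid) + 1)) - valid)
        if missing:
            problems.append(f"speaker numbers skip {missing}")
        if max(valid) > MAX_SPEAKERS:
            problems.append(f"more than {MAX_SPEAKERS} speakers used")
    return problems
```

For example, a script that uses speakers 1 and 3 but never 2 would be reported as skipping `[2]`, which is the kind of formatting slip that can otherwise surface only as a confusing generation error.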

Learn More about VibeVoice-ComfyUI

To further explore VibeVoice-ComfyUI, consider the following resources:

  • Video Demo: Watch a demonstration of the extension in action.
  • Project Page: Visit the VibeVoice Project Page for more examples and technical details.
  • Community Forums: Join discussions and seek support from other AI artists and developers.

These resources provide valuable insights and support to help you make the most of VibeVoice-ComfyUI in your creative projects.
