ComfyUI-OmniVoice-TTS Introduction
ComfyUI-OmniVoice-TTS is an advanced extension designed to bring the power of text-to-speech (TTS) technology to AI artists. This extension allows you to generate high-quality speech from text in over 600 languages, making it one of the most versatile TTS tools available. Whether you're looking to clone a voice from a short audio sample or design a completely new voice using textual descriptions, ComfyUI-OmniVoice-TTS has you covered. It supports voice cloning, voice design, and multi-speaker dialogues, providing a comprehensive solution for creating diverse and expressive audio content.
How ComfyUI-OmniVoice-TTS Works
At its core, ComfyUI-OmniVoice-TTS uses a diffusion language model architecture to convert text into speech. This model works by iteratively refining the audio output, similar to how an artist might start with a rough sketch and gradually add details to create a finished piece. The extension can clone voices by analyzing a short reference audio clip and then using that analysis to generate new speech in the same voice. For voice design, it allows you to specify attributes like gender, age, and accent to create a custom voice without needing a reference audio. The extension also supports non-verbal expressions and pronunciation adjustments, making it a flexible tool for creating nuanced audio content.
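The iterative refinement described above can be illustrated with a toy numerical sketch (purely conceptual: a real diffusion model predicts each update with a neural network conditioned on the text and reference voice, rather than blending toward a known target):

```python
def refine(audio, target, steps=10):
    """Toy 'denoising' loop: each pass moves the rough sketch a fraction
    closer to the target signal, mirroring how a diffusion model
    iteratively refines its output over many small steps."""
    for _ in range(steps):
        audio = [a + 0.5 * (t - a) for a, t in zip(audio, target)]
    return audio

noise = [1.0, -1.0, 0.3]        # initial rough "sketch"
target = [0.2, 0.1, -0.4]       # stand-in for the desired speech signal
result = refine(noise, target)  # converges close to target after 10 steps
```

After ten passes the residual error shrinks by a factor of about 2^10, which is why the final output is nearly indistinguishable from the target; real diffusion samplers trade step count against quality in the same way.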
ComfyUI-OmniVoice-TTS Features
- Multilingual Support: Generate speech in over 600 languages, making it ideal for global projects.
- Voice Cloning: Clone any voice using a 3-15 second audio sample, perfect for creating consistent character voices.
- Voice Design: Create unique voices by specifying attributes such as gender, age, pitch, and accent.
- Multi-Speaker Dialogues: Use `[Speaker_N]:` tags to generate conversations between multiple speakers.
- Fast Inference: Achieve real-time performance with a real-time factor as low as 0.025.
- Non-Verbal Expressions: Add expressions like laughter or sighs directly into the text for more dynamic audio.
- Automatic Model Download: Models are automatically downloaded from HuggingFace when first used, simplifying setup.
- Efficient Memory Usage: Features like automatic CPU offloading and smart caching help manage memory effectively.
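The `[Speaker_N]:` tag convention from the feature list can be sketched with a small parser (illustrative only; the exact input format expected by the extension's dialogue nodes may differ):

```python
import re

# A tagged script using the [Speaker_N]: convention, including a
# non-verbal expression ([laughs]) embedded directly in the text.
DIALOGUE = """\
[Speaker_1]: Hi there, welcome to the show.
[Speaker_2]: Thanks! Happy to be here. [laughs]
[Speaker_1]: Let's get started."""

def split_dialogue(text):
    """Split a tagged script into (speaker, line) pairs."""
    turns = []
    for line in text.splitlines():
        m = re.match(r"\[(Speaker_\d+)\]:\s*(.*)", line)
        if m:
            turns.append((m.group(1), m.group(2)))
    return turns

turns = split_dialogue(DIALOGUE)
```

Each turn can then be routed to the voice assigned to that speaker, which is how a single script yields a multi-voice conversation.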
ComfyUI-OmniVoice-TTS Models
ComfyUI-OmniVoice-TTS offers different models to suit various needs:
- OmniVoice: A full precision model (~4GB) supporting over 600 languages, ideal for high-quality output.
- OmniVoice-bf16: A bfloat16 quantized model (~2GB) that uses less memory, suitable for environments with limited resources.

Additionally, Whisper models are available for automatic speech recognition, which can be used to transcribe reference audio for voice cloning.
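A back-of-envelope calculation shows why the bf16 checkpoint is roughly half the size of the full-precision one (the parameter count below is inferred from the ~4GB fp32 size, not an official figure):

```python
BYTES_FP32 = 4  # float32: 4 bytes per parameter
BYTES_BF16 = 2  # bfloat16: 2 bytes per parameter

# A ~4 GB fp32 checkpoint implies roughly one billion parameters.
params = 4 * 1024**3 // BYTES_FP32

fp32_gb = params * BYTES_FP32 / 1024**3
bf16_gb = params * BYTES_BF16 / 1024**3
print(fp32_gb, bf16_gb)  # 4.0 2.0
```

The same halving applies to GPU memory at load time, which is why the bf16 model is the better fit for resource-limited environments.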
Troubleshooting ComfyUI-OmniVoice-TTS
Here are some common issues and solutions:
- Model Download Failures: If you're in China, set the HuggingFace mirror before starting ComfyUI: `export HF_ENDPOINT="https://hf-mirror.com"`.
- Whisper Model Re-downloads: Connect the `OmniVoice Whisper Loader` to the `whisper_model` input to cache the model.
- CUDA Memory Errors: Try setting `keep_model_loaded = False`, using `dtype = fp16` or `bf16`, or switching to `device = cpu`.
- Import Errors After Installation: Restart ComfyUI to reload Python modules.
- Transformers Version Issues: Ensure you have `transformers>=5.3.0`. Upgrade if necessary, but be cautious as it may affect other nodes.

For more detailed troubleshooting, refer to the troubleshooting guide.
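To check an installed version against that requirement, a minimal comparison helper might look like this (illustrative; it ignores pre-release suffixes such as `.dev0` and will fail on versions like `5.3rc1`):

```python
def parse_version(v):
    """Turn a dotted version string into a comparable tuple, e.g. '5.3.0' -> (5, 3, 0)."""
    return tuple(int(x) for x in v.split(".")[:3])

def meets_requirement(installed, required="5.3.0"):
    """True if the installed version satisfies transformers>=5.3.0."""
    return parse_version(installed) >= parse_version(required)

print(meets_requirement("5.4.1"))   # True
print(meets_requirement("4.44.2"))  # False
```

In practice `importlib.metadata.version("transformers")` supplies the installed version string; the `packaging` library offers a more robust comparison if it is available.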
Learn More about ComfyUI-OmniVoice-TTS
To further explore the capabilities of ComfyUI-OmniVoice-TTS, you can visit the Hugging Face Space for demos and additional resources. The GitHub repository is also a valuable resource for updates and community support. For a deeper dive into the technical aspects, the arXiv paper provides an in-depth look at the underlying technology.
