Install this extension via the ComfyUI Manager by searching for ComfyUI-MegaTTS:
1. Click the Manager button in the main menu
2. Select the Custom Nodes Manager button
3. Enter ComfyUI-MegaTTS in the search bar
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.
ComfyUI-MegaTTS is a custom node for ComfyUI that uses ByteDance's MegaTTS3 to deliver high-quality text-to-speech synthesis with voice cloning in Chinese and English.
ComfyUI-MegaTTS Introduction
ComfyUI-MegaTTS is an innovative extension designed to bring high-quality text-to-speech (TTS) capabilities to AI artists. Built on ByteDance's MegaTTS3, this extension allows you to convert text into natural-sounding speech in both English and Chinese. It also offers voice cloning features, enabling you to replicate any voice using just a short audio sample. This tool is particularly useful for artists looking to add a vocal element to their projects, whether it's for creating voiceovers, character voices, or any other creative audio content.
How ComfyUI-MegaTTS Works
At its core, ComfyUI-MegaTTS uses machine learning models to turn written text into spoken words. The main model is a diffusion transformer, a type of neural network that excels at generating high-quality audio and at capturing the nuances of human speech. The extension also includes a voice cloning feature, which analyzes a short audio sample to capture the unique characteristics of a voice and then reproduces that voice in new speech output.
ComfyUI-MegaTTS Features
High-Quality Speech Synthesis: Converts text into smooth, natural-sounding speech.
Voice Cloning: Clone a voice from a short sample. Each voice requires both a WAV file and an NPY file.
Bilingual Support: Seamlessly switch between English and Chinese, with code-switching capabilities.
Advanced Parameter Control: Fine-tune the quality, pronunciation accuracy, and voice similarity to suit your needs.
Memory Management: Optimizes GPU resource usage to prevent memory shortages, especially for users with limited GPU memory.
Automatic Model Download: Automatically downloads necessary models when needed, simplifying the setup process.
ComfyUI-MegaTTS Models
ComfyUI-MegaTTS utilizes a modified version of the MegaTTS3 model, which is organized into several components:
Diffusion Transformer: Handles the main TTS process.
WavVAE: Compresses and reconstructs audio, though currently unavailable for direct use.
Duration and Aligner Models: Ensure accurate timing and alignment of speech.
G2P (Grapheme-to-Phoneme): Converts written text into phonetic representations.
Each model plays a crucial role in ensuring the generated speech is both accurate and natural.
What's New with ComfyUI-MegaTTS
Version 1.0.2
Code and custom nodes have been restructured for better performance and GPU resource management.
Enhanced memory management to prevent memory shortages for users with lower GPU memory.
Added internationalization support for English and Chinese.
Version 1.0.1
Bug fixes to improve stability and performance.
Troubleshooting ComfyUI-MegaTTS
If you encounter issues while using ComfyUI-MegaTTS, here are some common problems and solutions:
Model Download Issues: Ensure you have a stable internet connection. If automatic downloads fail, manually download models from Hugging Face.
Voice Cloning Errors: Make sure your WAV and NPY files are correctly placed in the Voices folder and named consistently.
Memory Errors: Try reducing the generation quality or using a GPU with more memory.
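For the voice cloning check above, a small script can confirm that every voice has both files before you start ComfyUI. This is a minimal sketch, not part of the extension: the function name `find_unpaired_voices` is hypothetical, and it assumes that "named consistently" means the WAV and NPY files share the same base name (for example `alice.wav` and `alice.npy`). Adjust the folder path to wherever your Voices folder actually lives.

```python
from pathlib import Path


def find_unpaired_voices(voices_dir):
    """Return (missing_npy, missing_wav) as sorted lists of base names.

    Assumption: a voice is a pair of files with the same base name,
    one ending in .wav and one ending in .npy, in the same folder.
    """
    folder = Path(voices_dir)
    wav_stems = set()
    npy_stems = set()
    for path in folder.iterdir():
        suffix = path.suffix.lower()
        if suffix == ".wav":
            wav_stems.add(path.stem)
        elif suffix == ".npy":
            npy_stems.add(path.stem)
    # WAV files without an NPY partner, and NPY files without a WAV partner.
    missing_npy = sorted(wav_stems - npy_stems)
    missing_wav = sorted(npy_stems - wav_stems)
    return missing_npy, missing_wav


# Example usage (the path is a placeholder, not a documented location):
# missing_npy, missing_wav = find_unpaired_voices("path/to/Voices")
# print("WAV files without NPY:", missing_npy)
# print("NPY files without WAV:", missing_wav)
```

If both returned lists are empty, every voice in the folder has a matching pair under this naming assumption. Any name that appears in either list is a likely cause of a voice cloning error.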
Frequently Asked Questions
How do I improve voice similarity? Adjust the voice_similarity parameter to a higher value for closer resemblance to the reference voice.
Can I use my own voice samples? Yes, you can submit your samples to the Voice Submission Queue for processing.
Learn More about ComfyUI-MegaTTS
For further learning and support, consider exploring the following resources:
Community Forums: Engage with other AI artists and developers to share tips and solutions.
These resources provide a wealth of information to help you make the most of ComfyUI-MegaTTS in your creative projects.