ComfyUI Extension: ComfyUI-MegaTTS

Repo Name: ComfyUI-MegaTTS
Author: 1038lab (Account age: 774 days)
Nodes: 3
Latest Updated: 2025-04-13
GitHub Stars: 0.03K

How to Install ComfyUI-MegaTTS

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-MegaTTS in the search bar and install the extension from the results.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

ComfyUI-MegaTTS Description

ComfyUI-MegaTTS is a custom node for ComfyUI that uses ByteDance MegaTTS3 to deliver high-quality text-to-speech synthesis with voice cloning in Chinese and English.

ComfyUI-MegaTTS Introduction

ComfyUI-MegaTTS is an extension that brings high-quality text-to-speech (TTS) to ComfyUI workflows. Built on ByteDance's MegaTTS3, it converts text into natural-sounding speech in both English and Chinese. It also offers voice cloning, which replicates a voice from a short audio sample. This is useful for artists who want to add voiceovers, character voices, or other spoken audio to their projects.

How ComfyUI-MegaTTS Works

At its core, ComfyUI-MegaTTS turns written text into speech with a diffusion transformer, a type of neural network well suited to generating high-quality audio that captures the nuances of human speech. The voice cloning feature analyzes a short audio sample to capture the distinctive characteristics of a voice, then applies them to newly generated speech.

ComfyUI-MegaTTS Features

  • High-Quality Speech Synthesis: Converts text into smooth, natural-sounding speech.
  • Voice Cloning: Clone a voice from a short sample. Each voice requires both a WAV file and an NPY file.
  • Bilingual Support: Seamlessly switch between English and Chinese, with code-switching capabilities.
  • Advanced Parameter Control: Fine-tune the quality, pronunciation accuracy, and voice similarity to suit your needs.
  • Memory Management: Optimizes GPU resource usage to prevent memory shortages, especially for users with limited GPU memory.
  • Automatic Model Download: Automatically downloads necessary models when needed, simplifying the setup process.

ComfyUI-MegaTTS Models

ComfyUI-MegaTTS utilizes a modified version of the MegaTTS3 model, which is organized into several components:

  • Diffusion Transformer: Handles the main TTS process.
  • WavVAE: Compresses and reconstructs audio, though currently unavailable for direct use.
  • Duration and Aligner Models: Ensure accurate timing and alignment of speech.
  • G2P (Grapheme-to-Phoneme): Converts written text into phonetic representations.

Each model plays a crucial role in ensuring the generated speech is both accurate and natural.
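
These components are normally fetched automatically. When that is not possible, a manual fetch from Hugging Face could be sketched as below. The repository id `ByteDance/MegaTTS3` is the one named on this page. The target folder returned by `default_model_dir` is a hypothetical layout chosen for illustration, not a path confirmed by the extension, and both helper functions are illustrative names. The sketch relies on `snapshot_download` from the `huggingface_hub` package, which must be installed separately.

```python
from pathlib import Path

# Repository id as listed on the Hugging Face model page for MegaTTS3.
REPO_ID = "ByteDance/MegaTTS3"


def default_model_dir(comfyui_root: str) -> Path:
    """Build a target folder for the models.

    ASSUMPTION: the models/TTS/MegaTTS3 layout is hypothetical. Check the
    extension's own documentation for the folder it actually reads from.
    """
    return Path(comfyui_root) / "models" / "TTS" / "MegaTTS3"


def download_models(target_dir: Path) -> str:
    """Download every file in the repository into target_dir.

    Returns the local folder path reported by huggingface_hub.
    """
    # Imported here so the module still loads when huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    target_dir.mkdir(parents=True, exist_ok=True)
    return snapshot_download(repo_id=REPO_ID, local_dir=str(target_dir))
```

Calling `download_models(default_model_dir("/path/to/ComfyUI"))` would then place the files in that folder. Adjust the path if the extension expects a different location.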

What's New with ComfyUI-MegaTTS

Version 1.0.2

  • Code and custom nodes have been restructured for better performance and GPU resource management.
  • Enhanced memory management to prevent memory shortages for users with lower GPU memory.
  • Added internationalization support for English and Chinese.

Version 1.0.1

  • Bug fixes to improve stability and performance.

Troubleshooting ComfyUI-MegaTTS

If you encounter issues while using ComfyUI-MegaTTS, here are some common problems and solutions:

  • Model Download Issues: Ensure you have a stable internet connection. If automatic downloads fail, manually download models from Hugging Face.
  • Voice Cloning Errors: Make sure your WAV and NPY files are correctly placed in the Voices folder and named consistently.
  • Memory Errors: Try reducing the generation quality or using a GPU with more memory.
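
For the voice cloning check above, a small script can confirm that every sample has its partner file. This is a minimal sketch, assuming that "named consistently" means the WAV and NPY files share a base name (for example `speaker.wav` and `speaker.npy`). That reading, and the function name `find_unpaired_voices`, are assumptions made for illustration rather than part of the extension.

```python
from pathlib import Path


def find_unpaired_voices(voices_dir):
    """Return (wav_without_npy, npy_without_wav) as sorted lists of base names.

    ASSUMPTION: a voice is a pair of files with the same base name,
    one ending in .wav and one ending in .npy.
    """
    folder = Path(voices_dir)
    files = [p for p in folder.iterdir() if p.is_file()]
    wav_names = {p.stem for p in files if p.suffix.lower() == ".wav"}
    npy_names = {p.stem for p in files if p.suffix.lower() == ".npy"}
    # Set differences give the samples that lack a partner file.
    return sorted(wav_names - npy_names), sorted(npy_names - wav_names)
```

Pass the path of your Voices folder to the function. Two empty lists mean every sample is paired. Any name in the first list is a WAV file that still needs an NPY file, and any name in the second list is an NPY file without its WAV.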

Frequently Asked Questions

  • How do I improve voice similarity? Adjust the voice_similarity parameter to a higher value for closer resemblance to the reference voice.
  • Can I use my own voice samples? Yes, you can submit your samples to the Voice Submission Queue for processing.

Learn More about ComfyUI-MegaTTS

For further learning and support, consider exploring the following resources:

  • MegaTTS3 GitHub Repository: ByteDance/MegaTTS3
  • Hugging Face Model Page: ByteDance/MegaTTS3
  • Community Forums: Engage with other AI artists and developers to share tips and solutions.

These resources provide a wealth of information to help you make the most of ComfyUI-MegaTTS in your creative projects.

RunComfy
Copyright 2025 RunComfy. All Rights Reserved.
