Install this extension via the ComfyUI Manager by searching for ComfyUI-EdgeTTS:
1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-EdgeTTS in the search bar.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.
ComfyUI-EdgeTTS is a text-to-speech node for ComfyUI that uses Microsoft's Edge TTS to convert text into natural-sounding speech. It supports multiple languages and voices and is easy to integrate into workflows and customize.
ComfyUI-EdgeTTS Introduction
ComfyUI-EdgeTTS is an extension that turns text into natural-sounding speech using Microsoft's Edge Text-to-Speech (TTS) technology. It is aimed at AI artists who want to add voice to their projects, and it produces high-quality audio output. Because it supports multiple languages and many voices and is easy to customize, it fits a wide range of applications, from digital art installations to interactive storytelling.
How ComfyUI-EdgeTTS Works
At its core, ComfyUI-EdgeTTS uses Microsoft's Edge TTS service to convert written text into spoken words. Think of it as a digital narrator that can read your text aloud in a variety of voices and languages. Inside ComfyUI, you input text, select a voice, and adjust parameters such as speech rate and pitch to suit your artistic vision. It works much like directing a voice actor: you control how the final audio sounds.
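As a rough illustration of this flow, the sketch below uses the open-source edge-tts Python package, which talks to the same Microsoft service. It is an assumption that ComfyUI-EdgeTTS works this way internally, and the helper names format_rate, format_pitch, and synthesize are hypothetical. edge-tts expects the rate as a signed percentage string such as "+10%" and the pitch as a signed hertz string such as "-5Hz", so the helpers convert plain integers into that form.

```python
def format_rate(percent: int) -> str:
    """Convert a relative speed change (e.g. 10 for 10% faster) to edge-tts's "+10%" form."""
    return f"{percent:+d}%"


def format_pitch(hz: int) -> str:
    """Convert a relative pitch shift in hertz (e.g. -5) to edge-tts's "-5Hz" form."""
    return f"{hz:+d}Hz"


async def synthesize(text: str, voice: str, out_path: str,
                     rate_percent: int = 0, pitch_hz: int = 0) -> None:
    """Synthesize `text` with Edge TTS and write the resulting MP3 audio to `out_path`.

    Hypothetical helper; assumes `pip install edge-tts` and network access.
    """
    # Imported lazily so the formatting helpers above work without edge-tts installed.
    import edge_tts

    communicate = edge_tts.Communicate(
        text,
        voice,
        rate=format_rate(rate_percent),
        pitch=format_pitch(pitch_hz),
    )
    await communicate.save(out_path)
```

For example, asyncio.run(synthesize("Hello from ComfyUI.", "en-US-AriaNeural", "hello.mp3", rate_percent=10)) would write a slightly faster narration, provided edge-tts is installed and the machine can reach the service.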
ComfyUI-EdgeTTS Features
Edge TTS Node
Text-to-Speech Conversion: Converts text into speech using Microsoft Edge TTS, supporting a wide array of languages and voices.
Customizable Speech Parameters: Adjust the speech rate and pitch to create the desired vocal effect, whether it's a fast-paced narration or a slow, dramatic reading.
High-Quality Synthesis: Produces clear and natural-sounding audio, enhancing the auditory experience of your projects.
Configuration Options: Easily customize settings through a config.json file to tailor the extension to your specific needs.
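The extension's actual config.json schema is not described here. The following is therefore only a sketch of the usual pattern, in which built-in defaults are overlaid with whatever the user's file specifies. The keys default_voice, rate_percent, and pitch_hz, and the function load_config, are hypothetical placeholders.

```python
from __future__ import annotations

import json
from pathlib import Path

# Hypothetical defaults; the extension's real config.json keys may differ.
DEFAULTS = {
    "default_voice": "en-US-AriaNeural",
    "rate_percent": 0,
    "pitch_hz": 0,
}


def load_config(path: str | Path) -> dict:
    """Return DEFAULTS overlaid with any keys found in the JSON file at `path`.

    A missing file simply yields the defaults.
    """
    config = dict(DEFAULTS)
    p = Path(path)
    if p.is_file():
        with p.open(encoding="utf-8") as f:
            user = json.load(f)
        if not isinstance(user, dict):
            raise ValueError(f"{p} must contain a JSON object")
        config.update(user)  # user values win over defaults
    return config
```

Keeping defaults in code means a partial config.json (for example, one that only sets a faster rate) still produces a complete configuration.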
Speech-to-Text Node
Whisper STT: Utilizes OpenAI's Whisper model for accurate speech recognition, supporting multiple languages with automatic detection.
Model Variety: Choose a model size, from tiny to large, to trade speed against accuracy according to your hardware.
Confidence Reporting: Provides feedback on language detection confidence, helping you ensure the accuracy of transcriptions.
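The following is a minimal sketch of transcription with confidence reporting, assuming the openai-whisper package (imported as whisper). The language-detection steps follow the pattern shown in the Whisper README. The function names and the shape of the returned dictionary are hypothetical and may not match the node's actual outputs. If you pass an explicit language, the sketch skips auto-detection.

```python
from __future__ import annotations


def top_language(probs: dict[str, float]) -> tuple[str, float]:
    """Return the most probable language code and its probability."""
    lang = max(probs, key=probs.get)
    return lang, probs[lang]


def transcribe(path: str, model_size: str = "base", language: str | None = None) -> dict:
    """Transcribe `path` with Whisper (hypothetical helper; needs openai-whisper installed).

    When `language` is None, the language is auto-detected and its probability is
    reported as `confidence`; otherwise detection is skipped and confidence is None.
    """
    import whisper

    model = whisper.load_model(model_size)  # e.g. "tiny", "base", "small", "medium", "large"
    confidence = None
    if language is None:
        # Detect the language from the first 30 seconds of audio.
        audio = whisper.pad_or_trim(whisper.load_audio(path))
        mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
        _, probs = model.detect_language(mel)
        language, confidence = top_language(probs)
    result = model.transcribe(path, language=language)
    return {"text": result["text"], "language": language, "confidence": confidence}
```

A low confidence value is a hint to rerun with an explicit language, for example transcribe("clip.wav", "small", language="en").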
Audio File Node
Audio Export: Save your audio creations in various formats, including WAV, MP3, and FLAC.
Quality Presets: Select from high, medium, or low quality to match your project's requirements.
Custom File Management: Define file names and paths, with automatic numbering to keep your audio files organized.
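One way to implement automatic numbering is to scan the output folder for existing files with the same prefix and pick the next number. The <prefix>_00001.<ext> naming pattern and the function name next_numbered_path are assumptions for illustration, not the node's documented behavior. The format check uses the three formats listed for the Audio File Node.

```python
from __future__ import annotations

import re
from pathlib import Path

SUPPORTED_FORMATS = {"wav", "mp3", "flac"}


def next_numbered_path(directory: str | Path, prefix: str, fmt: str) -> Path:
    """Return the next free path like <prefix>_00001.<fmt> inside `directory`."""
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {sorted(SUPPORTED_FORMATS)}")
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # Match files such as speech_00042.wav and collect their numbers.
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)\.{fmt}$")
    used = [int(m.group(1)) for p in directory.iterdir() if (m := pattern.match(p.name))]
    return directory / f"{prefix}_{max(used, default=0) + 1:05d}.{fmt}"
```

Numbering from the highest existing index, rather than filling gaps, keeps new files sorted after older ones even if some were deleted.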
ComfyUI-EdgeTTS Models
ComfyUI-EdgeTTS supports a range of voices across different languages, each with unique characteristics. For instance, you can choose between friendly, authoritative, or lively tones, depending on the context of your project. The extension includes voices for languages such as English, Chinese, Japanese, and many more, each offering male and female options to provide flexibility in your audio design.
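Edge voices are identified by short names such as en-US-AriaNeural or zh-CN-XiaoxiaoNeural, which encode the locale. This sketch assumes the edge-tts package, whose list_voices() coroutine returns dictionaries with ShortName, Locale, and Gender fields. It filters those dictionaries by language and gender. The helper names filter_voices and available_voices are hypothetical.

```python
from __future__ import annotations


def filter_voices(voices: list[dict], locale_prefix: str, gender: str | None = None) -> list[str]:
    """Return ShortNames of voices whose Locale starts with `locale_prefix` (e.g. "zh-CN").

    If `gender` is given ("Male" or "Female"), only voices with that Gender are kept.
    """
    return [
        v["ShortName"]
        for v in voices
        if v["Locale"].startswith(locale_prefix)
        and (gender is None or v["Gender"] == gender)
    ]


async def available_voices(locale_prefix: str, gender: str | None = None) -> list[str]:
    """Fetch the live Edge voice list (requires edge-tts and network access) and filter it."""
    import edge_tts

    return filter_voices(await edge_tts.list_voices(), locale_prefix, gender)
```

Querying the live list, instead of hard-coding names, picks up newly added languages and voices without code changes.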
What's New with ComfyUI-EdgeTTS
Version 1.1.0 (2025-01-24)
Expanded Language Support: Added 19 new languages and 38 new voices, broadening the scope of your creative possibilities.
Enhanced Voice Characteristics: Improved details for existing Chinese voices, offering more nuanced vocal expressions. For a complete list of updates, please refer to the update log.
Troubleshooting ComfyUI-EdgeTTS
If you encounter issues while using ComfyUI-EdgeTTS, here are some common problems and solutions:
Audio Quality Issues: Ensure that your configuration settings in config.json are correctly set for high-quality output. Adjust the speech rate and pitch if the audio sounds unnatural.
Language Detection Errors: If the Whisper STT node is not detecting the correct language, try manually selecting the language instead of relying on auto-detection.
Installation Problems: Verify that all required Python packages are installed as per the requirements.txt file. Ensure your system meets the necessary hardware requirements, especially if using Whisper's larger models.
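To check for missing packages, you can use importlib.util.find_spec to see which modules are importable in the Python environment ComfyUI runs in. The helper name missing_modules is hypothetical, and requirements.txt remains the authoritative list of dependencies. Import names can differ from pip package names: openai-whisper is imported as whisper, and edge-tts is imported as edge_tts.

```python
from __future__ import annotations

import importlib.util


def missing_modules(module_names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]
```

For example, missing_modules(["edge_tts", "whisper"]) returns an empty list when both packages are installed. Run it with the same Python interpreter that launches ComfyUI.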
Learn More about ComfyUI-EdgeTTS
To further explore the capabilities of ComfyUI-EdgeTTS, consider the following resources:
OpenAI Whisper: Discover the speech recognition model used in this extension by visiting the OpenAI Whisper GitHub page.
Community Forums: Engage with other AI artists and developers in community forums to share experiences, ask questions, and find inspiration for your projects.
These resources can help you get the most out of ComfyUI-EdgeTTS in your creative work.