ComfyUI-QwenTTS Introduction
ComfyUI-QwenTTS is an extension designed to enhance your text-to-speech (TTS) capabilities within the ComfyUI environment. This extension provides custom nodes for Qwen3-TTS, allowing you to create, design, and clone voices with ease. Whether you're an AI artist looking to add a unique voice to your digital creations or someone interested in experimenting with voice synthesis, ComfyUI-QwenTTS offers a user-friendly solution. It supports multiple languages and provides practical defaults for stability and speed across various platforms, including CUDA, Apple Silicon (MPS), and CPU.
How ComfyUI-QwenTTS Works
At its core, ComfyUI-QwenTTS converts written text into speech using pre-trained Qwen3-TTS models. Think of it as a digital storyteller that reads your scripts aloud in a variety of voices. Each generation mode is exposed as a node: you can pick a preset voice, design a new one from a natural-language description, or clone a voice from an audio sample. The result is like having a virtual voice actor at your disposal, able to deliver lines in different styles and languages.
ComfyUI-QwenTTS Features
- Custom Voice: Choose from nine high-quality preset voices to quickly generate speech. This feature is perfect for those who want to get started without diving into complex settings.
- Voice Design: Create unique voices by describing them in natural language. This feature allows for creative freedom, enabling you to craft a voice that matches your artistic vision.
- Voice Clone: Clone a voice from an existing audio sample and transcript. This is useful for replicating a specific voice or creating a consistent character voice across projects.
- Multi-Device Support: Automatically selects the best processing unit available (CUDA, MPS, or CPU) to ensure optimal performance.
- Local-First Models: Prioritizes using locally stored models to reduce latency and improve reliability.
- Advanced Control Nodes: Offers detailed control over the speech generation process, including sampling methods, token limits, and attention mechanisms.
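The Multi-Device Support behavior can be pictured with a short sketch. This is not the extension's actual code: the function names choose_device and detect_device are hypothetical, and the CUDA, then MPS, then CPU priority order is an assumption based on the order in which the platforms are listed. Only the PyTorch availability checks are real APIs.

```python
def choose_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Pick a device name, preferring CUDA, then MPS, then CPU (assumed order)."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose the MPS backend at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return choose_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

Keeping the priority logic in a pure function separate from the hardware probe makes the selection rule easy to check on any machine.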
ComfyUI-QwenTTS Models
ComfyUI-QwenTTS utilizes several models, each tailored for specific tasks:
- CustomVoice (1.7B and 0.6B): Provides a range of premium timbres and supports style control. Ideal for high-quality voice synthesis.
- VoiceDesign (1.7B): Allows for voice creation from descriptions, offering flexibility in voice design.
- Base (1.7B and 0.6B): Focuses on rapid voice cloning, suitable for quick and efficient voice replication.
- Tokenizer (12Hz): Handles the encoding and decoding of speech, ensuring accurate text-to-speech conversion.
These models are automatically downloaded and stored in a consistent directory structure, ensuring easy access and management.
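A local-first lookup with a download fallback might look roughly like the sketch below. Everything here is illustrative: find_local_model, resolve_model, and the download callback are hypothetical names, the directory layout is assumed, and treating any non-empty folder as a usable model copy is a simplification rather than the extension's real check.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def find_local_model(name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first directory that already holds the named model, if any."""
    for base in search_dirs:
        candidate = Path(base) / name
        # Assumption: a non-empty directory counts as a usable local copy.
        if candidate.is_dir() and any(candidate.iterdir()):
            return candidate
    return None


def resolve_model(
    name: str,
    search_dirs: Iterable[Path],
    download: Callable[[str, Path], Path],
) -> Path:
    """Prefer a local copy; otherwise download into the first search directory."""
    dirs = [Path(d) for d in search_dirs]
    if not dirs:
        raise ValueError("at least one search directory is required")
    local = find_local_model(name, dirs)
    if local is not None:
        return local
    # No local copy found: delegate to the caller-supplied downloader.
    return download(name, dirs[0] / name)
```

Passing the downloader in as a callback keeps the lookup logic independent of any particular model hub.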
What's New with ComfyUI-QwenTTS
Update (v1.1.4)
- Integration with the newly released ComfyUI-QwenASR, enabling seamless workflows between speech recognition and synthesis.
What's New (v1.1.0)
- Enhanced Voice Clone functionality with reusable voice inputs.
- Introduction of new tools such as Create Voice, Load Voice, Whisper STT, and Voice Instruct presets.
- Advanced nodes now offer more control over attention mechanisms.
- Improved Audio Duration node for more precise timing and output.
Troubleshooting ComfyUI-QwenTTS
Here are some common issues and their solutions:
- 'Qwen3TTSTalkerConfig' object has no attribute 'pad_token_id': This error is often due to an incompatible version of the transformers library. To fix it, install the recommended version:
  pip install -U "transformers==4.57.3" "tokenizers<0.20" --no-cache-dir
  Then restart ComfyUI.
- Output is too long or contains humming: Adjust the max_new_tokens setting to a lower value (e.g., 512–1024) and set do_sample=False for more stable results.
- CUDA Out of Memory (OOM) Error: Break long scripts into smaller chunks, reduce max_new_tokens, and consider using precision=bf16 to manage memory usage better.
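For the out-of-memory case, breaking a long script into smaller chunks can be done before the text reaches any TTS node. The helper below is a minimal sketch, not part of the extension: split_script and its max_chars default of 300 are hypothetical, and the right chunk size depends on your hardware and model.

```python
import re
from typing import List


def split_script(text: str, max_chars: int = 300) -> List[str]:
    """Group sentences into chunks of up to max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?。！？])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the resulting audio clips joined, which keeps every individual generation short.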
Learn More about ComfyUI-QwenTTS
For further exploration and support, consider the following resources:
- Tutorials and Documentation: Explore detailed guides and documentation to deepen your understanding of ComfyUI-QwenTTS.
- Community Forums: Join discussions with other AI artists and developers to share insights and seek advice.
- Example Workflows: Review example workflows provided in the extension to see practical applications and get inspired.
By leveraging these resources, you can enhance your projects with sophisticated voice synthesis capabilities, bringing your AI art to life with sound.
