ComfyUI_ASR Introduction
ComfyUI_ASR is a collection of custom nodes for ComfyUI that adds speech recognition and subtitle processing to your AI art and video projects. It transcribes speech in English, Chinese, and other languages and renders the result as subtitles on your video, so you no longer have to add them by hand. Subtitles can be static or dynamic, with customizable font settings and color options.
How ComfyUI_ASR Works
At its core, ComfyUI_ASR leverages advanced speech recognition models to convert audio into text, which can then be used to generate subtitles. Imagine it as a smart assistant that listens to your video's audio and transcribes it into text, complete with timestamps. This text is then used to create subtitles that can be displayed in your video. The extension offers two types of subtitles: static, where entire sentences appear at once, and dynamic, where words appear one by one, mimicking a typewriter effect. This flexibility allows you to choose the style that best fits your artistic vision.
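The difference between the two subtitle styles can be sketched in a few lines of Python. This is an illustration only, not the extension's actual code: the Word class and both function names are hypothetical, and it assumes the typewriter effect means the visible text grows word by word until the sentence is complete. Words are joined with spaces here, which suits English but not Chinese.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical container for one recognized word and its timestamps."""
    text: str
    start: float  # seconds
    end: float    # seconds


def static_cues(sentences):
    """One cue per sentence: the full sentence is shown for its whole duration."""
    return [
        (s[0].start, s[-1].end, " ".join(w.text for w in s))
        for s in sentences
        if s
    ]


def dynamic_cues(sentences):
    """One cue per word: the visible text grows as each word is spoken."""
    cues = []
    for s in sentences:
        for i, w in enumerate(s):
            # Keep the partial text on screen until the next word begins.
            end = s[i + 1].start if i + 1 < len(s) else w.end
            cues.append((w.start, end, " ".join(x.text for x in s[: i + 1])))
    return cues
```

Each cue is a (start, end, text) tuple in seconds, so the same word list can feed either style.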
ComfyUI_ASR Features
Speech Recognition
- ASRMW Node: Converts audio into text and timestamps. You can choose from various models, such as Belle-whisper-large-v3-zh-punct-ct2, to suit your language needs. The node outputs plain text and timestamped words or sentences.
Subtitle Generation
- StaticSubtitlesToVideoMW Node: Adds static subtitles to your video, displaying complete sentences. Customize font size, color, background, and alignment to match your video's aesthetic.
- DynamicSubtitlesToVideoMW Node: Creates dynamic subtitles that appear word by word. This feature is perfect for creating engaging, typewriter-style effects.
Customization Options
- Font and Color Settings: Adjust font size, color, background color, and transparency. You can also add outlines to your text for better visibility.
- Alignment and Positioning: Subtitles can be aligned left, center, or right, and positioned anywhere on the screen to ensure they complement your video content.
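To make the alignment options concrete, here is a minimal sketch of how a horizontal position could be derived from an alignment choice. The function name, the margin parameter, and its default are assumptions made for illustration; the nodes themselves may compute positions differently.

```python
def subtitle_x(frame_width: int, text_width: int,
               align: str = "center", margin: int = 20) -> int:
    """Return the left edge (in pixels) of a subtitle line.

    Hypothetical helper: 'margin' is the gap kept from the frame edge
    for left- and right-aligned text.
    """
    if align == "left":
        return margin
    if align == "right":
        return frame_width - text_width - margin
    if align == "center":
        return (frame_width - text_width) // 2
    raise ValueError(f"unknown alignment: {align!r}")
```

For a 1920-pixel-wide frame and a 400-pixel-wide line, centering puts the left edge at x = 760.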
Color Picker
- ColorPickerMW Node: A simple tool to select colors for your subtitles, ensuring they stand out against your video background.
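If you prefer to think of colors as hex strings, the following sketch shows one way a picked color and a transparency setting could be combined into an RGBA value. It assumes a six-digit hex string and an opacity between 0 and 1; the format the ColorPickerMW node actually outputs is not documented here, so treat the function as hypothetical.

```python
def hex_to_rgba(color: str, opacity: float = 1.0) -> tuple:
    """Convert '#RRGGBB' plus an opacity in [0, 1] to an (R, G, B, A) tuple."""
    value = color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected a 6-digit hex color, got {color!r}")
    # Parse each pair of hex digits as one 0-255 channel.
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    # Clamp opacity to [0, 1] before scaling it to the 0-255 alpha range.
    alpha = round(max(0.0, min(1.0, opacity)) * 255)
    return (r, g, b, alpha)
```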
ComfyUI_ASR Models
ComfyUI_ASR supports several models for speech recognition, each tailored for different languages and performance needs:
- Belle-whisper-large-v3-zh-punct-ct2: Ideal for Chinese language recognition.
- Belle-whisper-large-v3-zh-punct-ct2-float32: Offers a balance between performance and precision.
- faster-whisper-large-v3-turbo-ct2: Provides faster processing times, suitable for large projects.
Choosing the right model depends on your specific requirements, such as language and processing speed.
What's New with ComfyUI_ASR
- [2026-02-09]: Introduced SRT subtitle output for faster loading in video players, addressing the slow subtitle addition in long videos.
- [2025-11-02]: Version 1.0.2 fixed non-integer stroke width issues and added automatic model download for first-time users.
- [2025-11-01]: Initial release of version 1.0.0, bringing comprehensive subtitle and speech recognition features.
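The SRT output mentioned in the 2026-02-09 entry refers to the SubRip subtitle format: numbered blocks, each with a time range written as HH:MM:SS,mmm and one or more lines of text. The sketch below shows how timestamped cues could be written in that format. It is not the extension's own exporter, and the function names and the (start, end, text) cue shape are assumptions.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues) -> str:
    """Render (start, end, text) cues as SRT blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

Because a video player reads an SRT file alongside the video, no frames need to be re-rendered, which is why this route suits long videos.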
Troubleshooting ComfyUI_ASR
Common Issues and Solutions
- Model Download Issues: If models do not download automatically, manually download them and place them in the ComfyUI/models/TTS directory.
- Subtitle Alignment Problems: Ensure that font files are placed in the fonts directory within the node folder.
- Performance Lag: For large videos, consider using the faster-whisper model for improved processing speed.
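When checking a manual download, a short script can confirm which model folders are present. The sketch below assumes each model lives in a subfolder named exactly after the model inside ComfyUI/models/TTS; that layout is a guess, so adjust the names to match what you actually downloaded.

```python
from pathlib import Path

# Model names taken from the list above; the one-folder-per-model layout
# is an assumption, not something the extension documents.
MODEL_NAMES = [
    "Belle-whisper-large-v3-zh-punct-ct2",
    "Belle-whisper-large-v3-zh-punct-ct2-float32",
    "faster-whisper-large-v3-turbo-ct2",
]


def missing_models(models_dir, names=MODEL_NAMES):
    """Return the model names that have no matching folder in models_dir."""
    root = Path(models_dir)
    return [name for name in names if not (root / name).is_dir()]


if __name__ == "__main__":
    for name in missing_models("ComfyUI/models/TTS"):
        print(f"missing: {name}")
```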
FAQs
- Can I use my own fonts? Yes, simply place your font files in the fonts directory.
- How do I adjust subtitle timing? Use the timestamped outputs from the ASRMW node to fine-tune subtitle timing.
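As an example of fine-tuning timing, subtitles that run consistently early or late can be corrected by shifting every timestamp by the same offset. The sketch below assumes cues are (start, end, text) tuples in seconds, which may differ from the exact structure the ASRMW node emits; the function name is hypothetical.

```python
def shift_cues(cues, offset: float):
    """Shift every cue by 'offset' seconds (negative values move cues earlier).

    Start times are clamped at zero, and an end time is never allowed
    to fall before its start time.
    """
    shifted = []
    for start, end, text in cues:
        new_start = max(0.0, start + offset)
        new_end = max(new_start, end + offset)
        shifted.append((new_start, new_end, text))
    return shifted
```

For instance, an offset of 0.5 delays every subtitle by half a second.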
Learn More about ComfyUI_ASR
To further explore the capabilities of ComfyUI_ASR, consider visiting community forums and tutorials that provide insights and tips on maximizing the extension's potential. Engaging with other AI artists can also offer new perspectives and creative ideas for your projects.
