RunComfy

ComfyUI Extension: ComfyUI-Qwen3-ASR

Repo Name: ComfyUI-Qwen3-ASR
Author: kaushiknishchay (Account age: 3782 days)
Nodes: 2
Latest Updated: 2026-03-05
GitHub Stars: 0.01K

Table of Contents

Description
ComfyUI-Qwen3-ASR Introduction
How ComfyUI-Qwen3-ASR Works
ComfyUI-Qwen3-ASR Features
ComfyUI-Qwen3-ASR Models
Troubleshooting ComfyUI-Qwen3-ASR
Learn More about ComfyUI-Qwen3-ASR
Related Nodes

How to Install ComfyUI-Qwen3-ASR

Install this extension via the ComfyUI Manager by searching for ComfyUI-Qwen3-ASR:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-Qwen3-ASR in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

ComfyUI-Qwen3-ASR Description

ComfyUI-Qwen3-ASR provides ComfyUI nodes for Qwen3-ASR (0.6B/1.7B) and ForcedAligner, enabling high-accuracy ASR and language identification across 52 languages, including 22 Chinese dialects. It offers word-level timestamps, long audio transcription, and VRAM-optimized inference.

ComfyUI-Qwen3-ASR Introduction

ComfyUI-Qwen3-ASR is an extension that integrates the Qwen3-ASR model family into ComfyUI. It converts speech to text, identifies the spoken language, and can return word-level timestamps by way of the Qwen3 Forced Aligner. For AI artists, this means you can transcribe audio, detect its language, and obtain timing information for each word, which is particularly useful for synchronized multimedia projects or for analyzing spoken content.

How ComfyUI-Qwen3-ASR Works

At its core, ComfyUI-Qwen3-ASR uses machine learning models to convert audio input into text. Think of it as a skilled transcriptionist who listens to a conversation and writes down exactly what is said, in the language it was spoken, optionally with timing for each word. The extension supports multiple languages and dialects and detects the spoken language automatically. It processes audio in chunks, so long recordings can be transcribed as well. FlashAttention 2 reduces memory usage and speeds up transcription, which helps on hardware with limited VRAM.
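
The chunked processing described above can be pictured with a small sketch. The page does not say how the extension actually splits audio, so the 30-second chunk length, the 1-second overlap, and the function name `chunk_spans` are all assumptions chosen for illustration.

```python
def chunk_spans(num_samples, sample_rate=16000, chunk_seconds=30.0, overlap_seconds=1.0):
    """Return (start, end) sample indices covering a recording in overlapping chunks.

    Hypothetical helper: chunk length and overlap are illustrative defaults,
    not values taken from the extension.
    """
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    chunk = int(chunk_seconds * sample_rate)
    overlap = int(overlap_seconds * sample_rate)
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("need chunk > 0 and 0 <= overlap < chunk")

    step = chunk - overlap  # how far each new chunk advances
    spans = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)  # clip the final chunk
        spans.append((start, end))
        if end == num_samples:
            break
        start += step
    return spans


# A 70-second recording at 16 kHz, split into 30-second chunks.
for start, end in chunk_spans(16000 * 70):
    print(f"{start / 16000:.1f}s -> {end / 16000:.1f}s")
```

The overlap is one common way to avoid cutting a word in half at a chunk boundary; a real pipeline would also need to merge the duplicated text where chunks overlap, which this sketch leaves out.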

ComfyUI-Qwen3-ASR Features

High Accuracy: The extension supports two models, Qwen3-ASR 0.6B and 1.7B, which are trained to deliver high transcription accuracy.
Multilingual Support: It can handle 52 languages and dialects, automatically detecting the language of the audio input.
Word-Level Timestamps: When the optional Qwen3 Forced Aligner is enabled, the extension returns a timestamp for each word, which is useful for precise synchronization.
Flexible Precision: Users can choose between bf16, fp16, and fp32. The 16-bit formats (bf16, fp16) use half the memory per weight of fp32 and are typically faster on GPUs, while fp32 keeps full precision.
Automatic Resampling: The extension automatically resamples audio to 16kHz, optimizing it for the models' performance.
FlashAttention 2: This feature significantly reduces VRAM usage and accelerates the inference process, making it faster and more efficient.
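
To illustrate what resampling to 16kHz involves, here is a minimal sketch using plain linear interpolation. This is a deliberate simplification: the page does not say which resampler the extension uses, and a production resampler would normally apply an anti-aliasing filter before reducing the rate. The function name `resample_linear` is hypothetical.

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a mono sequence of floats by linear interpolation.

    Illustrative only: no anti-aliasing filter is applied, so this is not
    what a real audio pipeline should use for downsampling.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)  # nothing to interpolate

    n_out = int(round(len(samples) * dst_rate / src_rate))
    ratio = src_rate / dst_rate  # source samples per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * ratio              # fractional position in the source
        lo = min(int(pos), last)     # sample at or before pos
        hi = min(lo + 1, last)       # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# One second of 48 kHz audio becomes 16000 samples.
one_second = [0.0] * 48000
print(len(resample_linear(one_second, 48000)))
```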

ComfyUI-Qwen3-ASR Models

The extension supports different models, each suited for specific needs:

Qwen3-ASR-1.7B: This model is ideal for tasks requiring the highest accuracy and can handle complex audio environments. It is suitable for professional-grade transcription tasks.
Qwen3-ASR-0.6B: This model offers a balance between accuracy and efficiency, making it suitable for less demanding tasks or when resources are limited.
Qwen3-ForcedAligner-0.6B: This model is used for generating word-level timestamps, enhancing the transcription with precise timing information.
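
One typical use for word-level timestamps is subtitle generation. The sketch below groups words into SRT cues. The input format (a list of dicts with `text`, `start`, and `end` in seconds) is an assumption made for this example; the page does not document the structure the nodes actually output, and the function names are hypothetical.

```python
def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_words=5):
    """Group timestamped words into SRT cues of at most max_words each.

    Assumed input: [{"text": str, "start": float, "end": float}, ...].
    Words are joined with spaces, which suits space-delimited languages only.
    """
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        text = " ".join(w["text"] for w in group)
        start = srt_time(group[0]["start"])
        end = srt_time(group[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Made-up example data, not real model output.
words = [
    {"text": "hello", "start": 0.0, "end": 0.4},
    {"text": "world", "start": 0.5, "end": 0.9},
]
print(words_to_srt(words))
```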

Troubleshooting ComfyUI-Qwen3-ASR

Here are some common issues you might encounter and how to resolve them:

Python 3.13 Issues: If you experience an UnboundLocalError related to lazy_loader, update the package with: python.exe -m pip install -U lazy-loader
VRAM Usage: The 1.7B model requires 4-6GB of VRAM in bf16 mode. If you encounter memory issues, consider using the 0.6B model or switching to cpu mode.
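
A rough back-of-the-envelope check can help when choosing a model and precision. The sketch below estimates memory for the weights alone, using the nominal parameter counts implied by the model names (1.7B and 0.6B); actual counts may differ, and activations, caches, and framework overhead are not included. The helper name `weight_memory_gib` is hypothetical.

```python
# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}


def weight_memory_gib(num_params, precision):
    """Approximate memory for model weights only, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


# Nominal parameter counts taken from the model names (an assumption).
for name, params in [("Qwen3-ASR-1.7B", 1.7e9), ("Qwen3-ASR-0.6B", 0.6e9)]:
    for precision in ("bf16", "fp32"):
        print(f"{name} {precision}: ~{weight_memory_gib(params, precision):.1f} GiB")
```

Under these assumptions the 1.7B weights in bf16 come to a little over 3 GiB, which sits below the 4-6GB figure quoted above; the remainder is presumably activations and runtime overhead. The same arithmetic shows why fp32 roughly doubles the weight footprint.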

Learn More about ComfyUI-Qwen3-ASR

To explore ComfyUI-Qwen3-ASR further, look for tutorials, documentation, and community discussion around the extension. Visit the Qwen3-ASR GitHub repository for more information and to connect with other users and developers.

ComfyUI-Qwen3-ASR Related Nodes

Qwen3 ASR Transcriber

Qwen3 Forced Aligner Config
