
ComfyUI Extension: ComfyUI-Qwen3-ASR

Repo Name: ComfyUI-Qwen3-ASR
Author: kaushiknishchay (Account age: 3782 days)
Nodes: 2
Latest Updated: 2026-03-05
GitHub Stars: 0.01K

How to Install ComfyUI-Qwen3-ASR

Install this extension via the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-Qwen3-ASR in the search bar and install the extension from the results.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

ComfyUI-Qwen3-ASR Description

ComfyUI-Qwen3-ASR provides ComfyUI nodes for Qwen3-ASR (0.6B/1.7B) and ForcedAligner, enabling high-accuracy ASR and language identification across 52 languages, including 22 Chinese dialects. It offers word-level timestamps, long audio transcription, and VRAM-optimized inference.

ComfyUI-Qwen3-ASR Introduction

ComfyUI-Qwen3-ASR integrates the Qwen3-ASR model family into ComfyUI. It converts speech to text, identifies the spoken language, and, together with the Qwen3 Forced Aligner, produces word-level timestamps. For AI artists, this means you can transcribe audio, detect its language, and get timing information for each word inside a workflow, which is useful for synchronized multimedia projects such as subtitles or for analyzing spoken content.

How ComfyUI-Qwen3-ASR Works

At its core, ComfyUI-Qwen3-ASR runs speech recognition models on audio input and converts it into text. Think of it as a skilled transcriptionist who listens to a recording and writes down what is said, in the language it is spoken in, with timing for each word when the forced aligner is enabled. The extension supports multiple languages and dialects and detects the spoken language automatically. It processes audio in chunks, so long recordings can be transcribed as well as short ones. FlashAttention 2 reduces memory usage and speeds up inference, which helps on GPUs with limited VRAM.
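
The chunked processing described above can be illustrated with a small sketch. The source does not say how the extension splits audio, so the 30-second window, the 1-second overlap, and the function name chunk_spans are all assumptions chosen for illustration, not the extension's actual settings.

```python
def chunk_spans(total_samples, chunk_seconds=30.0, overlap_seconds=1.0,
                sample_rate=16000):
    """Return (start, end) sample indices that cover the audio.

    Hypothetical helper: window length and overlap are illustrative
    defaults, not values taken from ComfyUI-Qwen3-ASR.
    """
    chunk = int(chunk_seconds * sample_rate)
    overlap = int(overlap_seconds * sample_rate)
    if chunk <= 0:
        raise ValueError("chunk_seconds must be positive")
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be non-negative and shorter than a chunk")

    step = chunk - overlap  # how far each new window advances
    spans = []
    start = 0
    while start < total_samples:
        end = min(start + chunk, total_samples)
        spans.append((start, end))
        if end == total_samples:
            break  # the last window reached the end of the audio
        start += step
    return spans


# A 70-second clip at 16 kHz is covered by three overlapping windows.
for start, end in chunk_spans(70 * 16000):
    print(f"{start / 16000:.1f}s to {end / 16000:.1f}s")
```

Overlapping the windows slightly is one common way to avoid cutting a word in half at a boundary; the transcripts of neighboring windows then have to be merged, which this sketch leaves out.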

ComfyUI-Qwen3-ASR Features

  • High Accuracy: The extension supports two models, Qwen3-ASR 0.6B and 1.7B, so you can trade transcription accuracy against resource use.
  • Multilingual Support: It handles 52 languages and dialects, including 22 Chinese dialects, and detects the language of the audio input automatically.
  • Word-Level Timestamps: When the optional Qwen3 Forced Aligner is enabled, it provides a timestamp for each word, which is useful for precise synchronization.
  • Flexible Precision: Users can choose between bf16, fp16, and fp32 to balance memory usage, processing speed, and numerical precision.
  • Automatic Resampling: The extension automatically resamples audio to 16 kHz, the sample rate the models expect.
  • FlashAttention 2: This feature reduces VRAM usage and accelerates inference.
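
To make the automatic resampling step concrete, here is a deliberately naive sketch that converts a mono signal to 16 kHz by linear interpolation. It is not the extension's implementation, and the name resample_linear is hypothetical. Linear interpolation applies no anti-aliasing filter, so a real pipeline would normally use a band-limited resampler such as torchaudio.functional.resample; which one the extension uses is not stated in the source.

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a mono signal by linear interpolation (illustrative only)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)

    n_out = int(len(samples) * dst_rate / src_rate)
    ratio = src_rate / dst_rate  # source samples per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * ratio                 # fractional position in the source
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two neighboring source samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# One second of 48 kHz audio becomes 16000 samples at 16 kHz.
one_second = [0.0] * 48000
print(len(resample_linear(one_second, 48000)))
```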

ComfyUI-Qwen3-ASR Models

The extension supports different models, each suited for specific needs:

  • Qwen3-ASR-1.7B: This model is ideal for tasks requiring the highest accuracy and can handle complex audio environments. It is suitable for professional-grade transcription tasks.
  • Qwen3-ASR-0.6B: This model offers a balance between accuracy and efficiency, making it suitable for less demanding tasks or when resources are limited.
  • Qwen3-ForcedAligner-0.6B: This model is used for generating word-level timestamps, enhancing the transcription with precise timing information.
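
Word-level timestamps are most often used to build subtitles. The sketch below groups timed words into SRT cues. The input shape (a list of dicts with "text", "start", and "end" keys, times in seconds) is an assumption made for illustration; the source does not document the aligner node's output format, so adapt the keys to whatever the node actually returns.

```python
def format_srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words=7):
    """Group timed words into numbered SRT cues.

    `words` is a hypothetical structure: [{"text", "start", "end"}, ...].
    """
    cues = []
    for index in range(0, len(words), max_words):
        group = words[index:index + max_words]
        start = format_srt_time(group[0]["start"])
        end = format_srt_time(group[-1]["end"])
        # Joining with spaces suits space-delimited languages only.
        text = " ".join(word["text"] for word in group)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


example = [
    {"text": "hello", "start": 0.0, "end": 0.4},
    {"text": "world", "start": 0.5, "end": 1.0},
]
print(words_to_srt(example))
```

For languages written without spaces between words, such as Chinese, the join step would need to concatenate the tokens directly instead.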

Troubleshooting ComfyUI-Qwen3-ASR

Here are some common issues you might encounter and how to resolve them:

  • Python 3.13 Issues: If you experience an UnboundLocalError related to lazy_loader, update the package by running this command (shown for Windows; use python instead of python.exe on other platforms): python.exe -m pip install -U lazy-loader
  • VRAM Usage: The 1.7B model requires 4-6 GB of VRAM in bf16 mode. If you encounter memory issues, consider using the 0.6B model or switching to CPU mode.
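
A back-of-the-envelope calculation helps when choosing a model and precision. The sketch below estimates the memory taken by the model weights alone, using the nominal parameter counts from the model names (1.7B and 0.6B) as approximations. Activations, caches, and framework overhead come on top, so real usage is higher than these numbers.

```python
# Bytes needed to store one parameter at each precision setting.
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}


def weight_memory_gib(num_params, precision):
    """Estimate weight storage in GiB (weights only, no runtime overhead)."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


# Nominal parameter counts taken from the model names (approximate).
for name, params in [("Qwen3-ASR-1.7B", 1.7e9), ("Qwen3-ASR-0.6B", 0.6e9)]:
    for precision in ("bf16", "fp32"):
        gib = weight_memory_gib(params, precision)
        print(f"{name} {precision}: {gib:.2f} GiB")
```

Under these assumptions the 1.7B weights alone take a little over 3 GiB in bf16, which is in line with the 4-6 GB figure above once runtime overhead is added, and roughly twice that in fp32.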

Learn More about ComfyUI-Qwen3-ASR

For documentation and usage details, and to connect with other users and developers, visit the Qwen3-ASR GitHub repository.


RunComfy
Copyright 2025 RunComfy. All Rights Reserved.
