ComfyUI Node: Automatic Speech Recognition (自动语音识别)

Class Name: ASRMW
Category: 🎤MW/MW-ASR
Author: billwuhao (Account age: 2576 days)
Extension: ComfyUI_ASR
Last Updated: 2026-03-11
GitHub Stars: 0.03K

How to Install ComfyUI_ASR

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI_ASR in the search bar and install the extension.
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated list of nodes.

Automatic Speech Recognition Description

ASRMW converts spoken language to text, supporting multiple models and word-level timestamps.

Automatic Speech Recognition:

ASRMW is an automatic speech recognition (ASR) node: it takes an audio file and produces a text transcription. It supports multiple models, so you can choose the one that best fits the language or audio quality of your material. Transcriptions can include word-level timestamps, which are useful for creating subtitles or analyzing speech patterns.

Automatic Speech Recognition Input Parameters:

audio_file

The audio_file parameter is the input audio file to transcribe. It should be in a supported audio format, such as WAV or MP3. Audio quality strongly affects transcription accuracy, so use a recording with clear speech and minimal background noise.
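
Before sending a file to the node, it can help to check its basic properties. The following is a minimal sketch that uses only Python's standard `wave` module to inspect a PCM WAV file; the helper name `describe_wav` is hypothetical and is not part of ComfyUI_ASR.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (standard library only)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            # Duration is the frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav("speech.wav")` on a real file returns a small dictionary with the channel count, sample rate, sample width, and duration. The `wave` module reads uncompressed WAV only, so MP3 files need a different tool.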

model_path

The model_path parameter specifies the directory where the ASR model files are stored. The node loads the model's configuration and data from this directory, so an incorrect path leads to model-loading errors.

device

The device parameter selects the hardware on which the ASR model runs, such as a CPU or GPU. A GPU can significantly shorten processing time compared to a CPU, especially for large audio files or complex models.
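
As an illustration of the CPU/GPU choice, here is a minimal sketch of a device resolver that falls back to the CPU when CUDA is not usable. The function name `pick_device` and the option strings `"auto"`, `"cpu"`, and `"cuda"` are assumptions made for this example; the values the node actually accepts are not listed in this page. It relies on `torch.cuda.is_available()` when PyTorch is installed.

```python
def pick_device(requested="auto"):
    """Resolve a device string, falling back to CPU when CUDA is unavailable."""
    if requested == "cpu":
        return "cpu"
    if requested not in ("auto", "cuda"):
        raise ValueError(f"Unknown device option: {requested!r}")
    try:
        import torch  # optional dependency; treated as "no GPU" if missing
        cuda_ok = torch.cuda.is_available()
    except ImportError:
        cuda_ok = False
    # Hypothetical policy: silently use the CPU if no CUDA device is found.
    return "cuda" if cuda_ok else "cpu"
```

A stricter variant could raise an error instead of falling back when `"cuda"` is requested explicitly, which makes a misconfigured GPU easier to notice.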

每句最大长度 (max_sentence_length)

The 每句最大长度 parameter (max_sentence_length, literally "maximum length per sentence") sets the maximum length of each sentence in the transcription. It controls how the transcribed text is segmented, so that sentences stay short enough to read comfortably. Adjust it to suit the output you need, such as subtitles or detailed text analysis.
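
To show what length-based segmentation can look like, here is a minimal sketch that greedily groups words into lines no longer than a given limit. It assumes the limit is counted in characters and that words are joined with spaces; this page does not state how the node measures length, and `split_by_length` is a hypothetical helper, not the node's implementation.

```python
def split_by_length(words, max_len):
    """Greedily pack words into lines of at most max_len characters.

    A single word longer than max_len is kept whole on its own line.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if current and len(candidate) > max_len:
            # Adding this word would exceed the limit: close the line.
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


print(split_by_length(["the", "quick", "brown", "fox"], 9))
# → ['the quick', 'brown fox']
```

For languages written without spaces, such as Chinese, the joining and counting rules would need to change accordingly.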

Automatic Speech Recognition Output Parameters:

transcribed_text

The transcribed_text output is the main result of the ASRMW node: the complete transcription of the input audio file. It can be used to create subtitles, run text analysis, or document spoken information.

word_timestamps

The word_timestamps output provides the start and end time of each word in the transcribed text, which allows the text to be aligned precisely with the audio. It is particularly useful for applications that synchronize audio and text, such as video subtitling or detailed speech analysis.
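
As an example of the subtitling use case, the sketch below turns word-level timestamps into SubRip (SRT) text. It assumes each entry is a dictionary with `word`, `start`, and `end` keys, with times in seconds; this page does not specify the actual structure of word_timestamps, so adapt the field names to what the node returns. Both helper functions are hypothetical.

```python
def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group timestamped words into numbered SRT cues of up to max_words each."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"] for w in chunk)
        # Each cue spans from its first word's start to its last word's end.
        start, end = srt_time(chunk[0]["start"]), srt_time(chunk[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Assumed input shape, for illustration only.
sample = [
    {"word": "hello", "start": 0.0, "end": 0.4},
    {"word": "world", "start": 0.5, "end": 1.0},
]
print(words_to_srt(sample))
```

Grouping by a fixed word count is the simplest policy; splitting on pauses between words or on punctuation usually produces more natural subtitle cues.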

Automatic Speech Recognition Usage Tips:

  • Ensure that your audio files are of high quality with minimal background noise to improve transcription accuracy.
  • Choose the appropriate ASR model based on the language and characteristics of your audio content to achieve the best results.
  • Utilize the device parameter to leverage GPU acceleration if available, as this can significantly speed up the transcription process.
  • Adjust the max_sentence_length parameter to control the segmentation of the transcribed text, making it more suitable for your specific use case.

Automatic Speech Recognition Common Errors and Solutions:

Model file not found: <model_asr>. Please check paths.

  • Explanation: This error occurs when the specified model files cannot be found at the given model_path.
  • Solution: Verify that the model_path is correct and that all necessary model files are present in the specified directory. Ensure that the path is accessible and that there are no typos or missing files.
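
The path check described above can be sketched as a small pre-flight helper. It only confirms that the directory exists and contains at least one file; which files a given model requires is not listed here, so the sketch does not check for specific names. `check_model_dir` is a hypothetical helper, not part of ComfyUI_ASR.

```python
from pathlib import Path


def check_model_dir(model_path):
    """Return the sorted file names in model_path, or raise if it looks wrong."""
    path = Path(model_path)
    if not path.is_dir():
        raise FileNotFoundError(f"Model directory not found: {path}")
    files = sorted(p.name for p in path.iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"Model directory is empty: {path}")
    return files
```

Printing the returned list makes it easy to compare the directory contents against the file list published with the model.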

Audio file format not supported

  • Explanation: This error indicates that the provided audio file is in a format that is not supported by the ASRMW node.
  • Solution: Convert your audio file to a supported format, such as WAV or MP3, and try again. Ensure that the audio file is not corrupted and is properly formatted.
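
A quick way to catch this error before running the node is to check the file extension and, for WAV files, the file header. The sketch below assumes the supported formats are the two named above (WAV and MP3); the full list is not given on this page. The helper names are hypothetical.

```python
from pathlib import Path

# Assumption: only the formats named in the text above.
SUPPORTED_SUFFIXES = {".wav", ".mp3"}


def is_supported_audio(path):
    """Check the file extension against the assumed supported formats."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def looks_like_wav(path):
    """Check for the RIFF/WAVE signature in the first 12 bytes of the file."""
    with open(path, "rb") as f:
        header = f.read(12)
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"
```

A file that passes the extension check but fails the header check has probably been renamed or truncated, and should be re-exported from the original recording.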

Device not available

  • Explanation: This error occurs when the specified device is not available for running the ASR model.
  • Solution: Check your system's hardware configuration to ensure that the specified device (CPU or GPU) is available and properly configured. If using a GPU, ensure that the necessary drivers and libraries are installed.

Copyright 2025 RunComfy. All Rights Reserved.
