ComfyUI Node: Shrug Speech-to-Text (ASR)

Class Name

ShrugASRNode

Category
Shrug Nodes/Audio
Author
fblissjr (Account age: 4014days)
Extension
Shrug-Prompter: Unified VLM Integration for ComfyUI
Latest Updated
2025-09-30
Github Stars
0.02K

How to Install Shrug-Prompter: Unified VLM Integration for ComfyUI

Install this extension via the ComfyUI Manager by searching for Shrug-Prompter: Unified VLM Integration for ComfyUI:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter Shrug-Prompter: Unified VLM Integration for ComfyUI in the search bar.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

Shrug Speech-to-Text (ASR) Description

ShrugASRNode transcribes an audio file to text using an Automatic Speech Recognition (ASR) model.

Shrug Speech-to-Text (ASR):

The ShrugASRNode converts audio files into text using Automatic Speech Recognition (ASR). It processes the audio input with the ASR model specified in its configuration and returns the transcribed text. Typical uses include creating subtitles, transcribing interviews, and preparing audio data for further analysis. The node belongs to the Shrug Nodes/Audio category in ComfyUI.

Shrug Speech-to-Text (ASR) Input Parameters:

context

The context parameter is a dictionary that holds the configuration the node needs to run. It must include a provider_config containing the base_url and the llm_model (the ASR model ID), which define the endpoint and the model used for transcription. The parameter has no minimum, maximum, or default value, but it must contain valid configuration details.
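
As a rough illustration of the shape described above, the context dictionary might look like the following Python sketch. Only the key names provider_config, base_url, and llm_model come from this description; the URL, the model ID, and the helper has_valid_provider_config are hypothetical placeholders, not part of the extension.

```python
# Hypothetical example values. Only the key names (provider_config,
# base_url, llm_model) are taken from the node description.
context = {
    "provider_config": {
        "base_url": "http://localhost:8080",   # assumed endpoint, replace with yours
        "llm_model": "example-asr-model",      # assumed ASR model ID
    }
}


def has_valid_provider_config(ctx):
    """Return True if ctx carries a provider_config with both required keys set."""
    if not isinstance(ctx, dict):
        return False
    provider = ctx.get("provider_config")
    if not isinstance(provider, dict):
        return False
    # Both values must be present and non-empty.
    return bool(provider.get("base_url")) and bool(provider.get("llm_model"))
```

A check like this can be run on the dictionary before it is handed to the node, so that a missing key is noticed early.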

audio_path

The audio_path parameter is a string giving the file path of the audio file to transcribe. It is mandatory and must point to a valid audio file that the system can access. The parameter has no minimum, maximum, or default value.
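
One way to confirm that a path is usable before passing it to the node is sketched below with the Python standard library. The helper name resolve_audio_path and the choice of exceptions are assumptions made for illustration; the node's own checks may differ.

```python
import os
from pathlib import Path


def resolve_audio_path(audio_path: str) -> Path:
    """Expand and resolve audio_path, raising if it is not a readable file.

    Hypothetical helper, not part of the extension.
    """
    path = Path(audio_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No audio file at {path}")
    if not os.access(path, os.R_OK):
        raise PermissionError(f"Audio file is not readable: {path}")
    return path
```

Resolving to an absolute path also avoids surprises when ComfyUI runs from a different working directory than the one the path was written for.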

Shrug Speech-to-Text (ASR) Output Parameters:

transcribed_text

The transcribed_text output is a string containing the text transcribed from the provided audio file. It is the node's primary result and can be read, edited, or passed on for further processing.

Shrug Speech-to-Text (ASR) Usage Tips:

  • Ensure that the context parameter includes a valid provider_config with both base_url and llm_model specified to avoid errors during transcription.
  • Verify that the audio_path points to a valid and accessible audio file to ensure successful transcription and avoid file-related errors.

Shrug Speech-to-Text (ASR) Common Errors and Solutions:

Provider config with base_url and model is required.

  • Explanation: This error occurs when the context parameter does not include a valid provider_config with the necessary base_url and llm_model.
  • Solution: Check the context parameter to ensure that it contains a valid provider_config with both base_url and llm_model specified.

File not found or inaccessible

  • Explanation: This error arises when the audio_path does not point to a valid or accessible audio file.
  • Solution: Verify that the audio_path is correct, that the file exists, and that the system can access it.
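
The two errors above can be anticipated with a small pre-flight check. The sketch below is a hypothetical helper, not code from the extension: it reuses the two error messages listed in this section and assumes the context layout described under Input Parameters.

```python
from pathlib import Path


def preflight(context, audio_path):
    """Return the documented error messages these inputs would likely trigger.

    An empty list means neither documented problem was detected.
    """
    problems = []

    provider = context.get("provider_config") if isinstance(context, dict) else None
    if (
        not isinstance(provider, dict)
        or not provider.get("base_url")
        or not provider.get("llm_model")
    ):
        problems.append("Provider config with base_url and model is required.")

    if not isinstance(audio_path, str) or not Path(audio_path).is_file():
        problems.append("File not found or inaccessible")

    return problems
```

Returning a list instead of raising on the first failure reports both problems in one pass, which is convenient when a workflow is being wired up for the first time.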
