RunComfy

ComfyUI Node: Shrug Speech-to-Text (ASR)

Class Name: ShrugASRNode
Category: Shrug Nodes/Audio
Author: fblissjr (Account age: 4014 days)
Extension: Shrug-Prompter: Unified VLM Integration for ComfyUI
Latest Updated: 2025-09-30
Github Stars: 0.02K

Table of Contents

Description
ShrugASRNode:
ShrugASRNode Input Parameters:
ShrugASRNode Output Parameters:
ShrugASRNode Usage Tips:
ShrugASRNode Common Errors and Solutions:
Related Nodes

How to Install Shrug-Prompter: Unified VLM Integration for ComfyUI

Install this extension via the ComfyUI Manager by searching for Shrug-Prompter: Unified VLM Integration for ComfyUI

1. Click the Manager button in the main menu
2. Select the Custom Nodes Manager button
3. Enter Shrug-Prompter: Unified VLM Integration for ComfyUI in the search bar

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Shrug Speech-to-Text (ASR) Description

ShrugASRNode transcribes an audio file to text using an Automatic Speech Recognition (ASR) model.

Shrug Speech-to-Text (ASR):

The ShrugASRNode converts an audio file into text using Automatic Speech Recognition (ASR). It processes the audio input with the ASR model specified in the context and returns the transcribed text. Typical uses include creating subtitles, transcribing interviews, and preparing audio data for further analysis. The node is listed under the Shrug Nodes/Audio category in ComfyUI.

Shrug Speech-to-Text (ASR) Input Parameters:

context

The context parameter is a dictionary that carries the configuration the node needs to run. It includes provider_config, which contains base_url (the endpoint of the ASR service) and llm_model (the ASR model ID). Together these tell the node which service to contact and which model to use for transcription. This parameter has no minimum, maximum, or default value, but it must contain valid configuration details.
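As an illustration of the shape described above, here is a minimal sketch in Python. The key names provider_config, base_url, and llm_model come from the description; the exact nesting, the helper names make_context and check_context, and the placeholder URL and model ID are assumptions for illustration, not the extension's actual code.

```python
from typing import Any, Dict


def make_context(base_url: str, asr_model_id: str) -> Dict[str, Any]:
    """Build a context dict in the assumed layout: provider_config nested inside context."""
    return {
        "provider_config": {
            "base_url": base_url,       # endpoint of the ASR service
            "llm_model": asr_model_id,  # ASR model ID
        }
    }


def check_context(context: Dict[str, Any]) -> None:
    """Raise ValueError if base_url or llm_model is missing or empty."""
    provider_config = context.get("provider_config") or {}
    if not provider_config.get("base_url") or not provider_config.get("llm_model"):
        raise ValueError("Provider config with base_url and model is required.")


# Hypothetical placeholder values; substitute your own endpoint and model ID.
context = make_context("http://localhost:8080", "example-asr-model")
check_context(context)
```

Running a check like this before queuing a workflow can surface a missing base_url or llm_model early, rather than at transcription time.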

audio_path

The audio_path parameter is a string giving the file path of the audio file to transcribe. It is required, and it must point to a valid audio file that the system can access; otherwise the transcription cannot run. This parameter has no minimum, maximum, or default value.
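A path can be checked before it is handed to the node. The sketch below uses only the Python standard library; the helper name resolve_audio_path is hypothetical, and the check covers existence and readability only, not whether the file is in a format the ASR service accepts.

```python
import os
from pathlib import Path


def resolve_audio_path(audio_path: str) -> Path:
    """Return the absolute path if it is an existing, readable file; raise otherwise."""
    path = Path(audio_path).expanduser()
    # is_file() is False for missing paths and for directories.
    if not path.is_file() or not os.access(path, os.R_OK):
        raise FileNotFoundError(f"File not found or inaccessible: {audio_path}")
    return path.resolve()
```

For example, resolve_audio_path("~/recordings/interview.wav") expands the home directory and raises FileNotFoundError if that (hypothetical) file does not exist.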

Shrug Speech-to-Text (ASR) Output Parameters:

transcribed_text

The transcribed_text output is a string containing the text transcribed from the provided audio file. It is the node's primary result and can be read, edited, or passed on for further processing.
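As one example of further processing, the transcript string can be tidied and written to disk. This is a sketch that assumes transcribed_text is a plain string; the helper name save_transcript and the output file name are hypothetical and not part of the extension.

```python
from pathlib import Path


def save_transcript(transcribed_text: str, out_path: str) -> Path:
    """Collapse runs of whitespace and write the transcript as UTF-8 text."""
    cleaned = " ".join(transcribed_text.split())
    path = Path(out_path)
    path.write_text(cleaned + "\n", encoding="utf-8")
    return path
```

Calling save_transcript(text, "transcript.txt") would write a single-line text file that can be opened in any editor.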

Shrug Speech-to-Text (ASR) Usage Tips:

Ensure that the context parameter includes a valid provider_config with both base_url and llm_model specified to avoid errors during transcription.
Verify that the audio_path points to a valid and accessible audio file to ensure successful transcription and avoid file-related errors.

Shrug Speech-to-Text (ASR) Common Errors and Solutions:

Provider config with base_url and model is required.

Explanation: This error occurs when the context parameter does not include a valid provider_config with the necessary base_url and llm_model.
Solution: Check the context parameter to ensure that it contains a valid provider_config with both base_url and llm_model specified.

File not found or inaccessible

Explanation: This error arises when the audio_path does not point to a valid or accessible audio file.
Solution: Verify that the audio_path is correct, that the file exists, and that the system can access it.

Shrug Speech-to-Text (ASR) Related Nodes

Go back to the extension to check out more related nodes.

Shrug-Prompter: Unified VLM Integration for ComfyUI
