A node for converting audio to text, leveraging the Whisper model for efficient and accurate transcription.
WhisperTurboRun is a node designed to transcribe audio into text using the Whisper model, which is known for its efficiency and accuracy in processing audio input. Its primary purpose is to convert spoken language into written text, making it useful for applications that require speech-to-text conversion, such as generating subtitles or transcribing spoken content for accessibility. The node handles various audio formats and resamples the input audio when needed to match the model's expected sample rate. Options for an initial prompt and for including timestamps give you control over the transcription process, allowing the output to be tailored to your specific needs.
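The node wraps this process internally, but a minimal sketch using the openai-whisper package shows roughly what the transcription step involves; the model name "turbo" and the file path are illustrative assumptions, not the node's actual implementation.

```python
# Minimal sketch of Whisper-based transcription outside ComfyUI
# (illustrative only; the node wraps equivalent logic).
import whisper

# "turbo" is an assumed model choice; any Whisper checkpoint works.
model = whisper.load_model("turbo")

result = model.transcribe(
    "speech.wav",                # path, numpy array, or 16 kHz tensor
    initial_prompt="",           # optional context for the decoder
    logprob_threshold=-1.0,      # confidence cutoff for segments
    no_speech_threshold=0.1,     # silence-detection threshold
)
print(result["text"])
```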
The audio parameter is a dictionary containing the waveform and sample rate of the audio input. It is crucial for the transcription process as it provides the raw audio data that the model will convert into text. The waveform should be a tensor, and the sample rate should ideally be 16000 Hz for optimal performance. If the sample rate differs, the node will automatically resample the audio to meet this requirement.
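As a sketch of what this input looks like, the snippet below builds the audio dictionary and resamples it to 16000 Hz with torchaudio; the exact key names ("waveform", "sample_rate") are assumed based on common ComfyUI audio conventions.

```python
# Sketch of preparing the audio input dict; key names are assumptions.
import torchaudio
import torchaudio.functional as F

waveform, sample_rate = torchaudio.load("speech.wav")   # waveform: (channels, samples)

# The node resamples automatically, but this is what that step looks like.
if sample_rate != 16000:
    waveform = F.resample(waveform, orig_freq=sample_rate, new_freq=16000)
    sample_rate = 16000

audio = {"waveform": waveform, "sample_rate": sample_rate}
```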
The logprob_threshold parameter sets the threshold for the log probability of the transcribed text. It helps filter out low-confidence transcriptions, so that only text with a higher likelihood of accuracy is retained. The default value is -1.0, a permissive setting that discards only segments with very low confidence.
The no_speech_threshold parameter determines how readily a segment of audio is classified as silence or non-speech. A lower value causes more segments to be treated as non-speech, while a higher value allows more ambiguous audio to be transcribed as speech. The default value is 0.1.
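To make the two thresholds concrete, the sketch below mirrors how standard Whisper combines them per segment: a segment is dropped as silence when its no-speech probability exceeds no_speech_threshold and its average log probability falls below logprob_threshold. The node's exact filtering logic may differ.

```python
# Illustrative per-segment filtering (mirrors standard Whisper behavior;
# the node's implementation may differ).
def keep_segment(segment, logprob_threshold=-1.0, no_speech_threshold=0.1):
    # Drop segments that look like silence: high no-speech probability
    # combined with low decoding confidence.
    if (segment["no_speech_prob"] > no_speech_threshold
            and segment["avg_logprob"] < logprob_threshold):
        return False
    return True

segments = result["segments"]  # from a Whisper transcribe() call
kept = [s for s in segments if keep_segment(s)]
```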
The initial_prompt parameter allows you to provide a starting text or context for the transcription. This can be useful in guiding the model to understand the context better, especially in cases where the audio might be ambiguous or unclear. The default is an empty string, meaning no initial prompt is provided.
The unload_model parameter is a boolean that determines whether the model should be unloaded from memory after the transcription is complete. This can be useful for freeing up resources, especially when working with limited memory. The default value is False, meaning the model remains loaded.
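Unloading roughly corresponds to releasing the model object and returning GPU memory to the pool, as in this hedged sketch.

```python
# Sketch of what unload_model=True accomplishes: freeing the model
# and reclaiming GPU memory (illustrative, not the node's code).
import gc
import torch

del model
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```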
The timestamp parameter is a boolean that indicates whether timestamps should be included in the transcription output. When enabled, the transcription will include time markers for each segment of text, which is useful for applications like subtitle generation. The default value is False.
The result output is a dictionary containing the transcribed text and, optionally, the timestamps for each segment. This is the primary deliverable of the node, providing the converted text from the input audio. Whether timestamps are included depends on the timestamp input parameter, offering flexibility in how the transcription is used.
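When timestamps are enabled, the segments can be turned directly into subtitle entries. The sketch below formats Whisper-style segments (with "start", "end", and "text" keys) as SRT; it illustrates one way to consume the output and is not part of the node.

```python
# Sketch: convert timestamped segments into SRT subtitle entries.
def to_srt(segments):
    def fmt(t):
        h, rem = divmod(int(t), 3600)
        m, s = divmod(rem, 60)
        ms = int((t - int(t)) * 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    lines = []
    for i, seg in enumerate(segments, start=1):
        lines.append(f"{i}\n{fmt(seg['start'])} --> {fmt(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(lines)

print(to_srt(result["segments"]))
```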
- Use the initial_prompt parameter to provide context if the audio content is complex or contains specialized terminology.
- Adjust the no_speech_threshold to fine-tune the sensitivity of speech detection, especially in environments with varying levels of background noise.
- Enable the unload_model option if you are processing multiple files sequentially and need to manage memory usage efficiently.
- Adjust the logprob_threshold to filter out low-confidence segments, or improve the audio quality if too much of the transcription is being discarded.