ComfyUI Node: Whisper Turbo Run

Class Name: WhisperTurboRun
Category: 🎤MW/MW-EraXWoW
Author: mw (Account age: 2475 days)
Extension: MW-ComfyUI_EraX-WoW-Turbo
Last Updated: 2025-05-23
GitHub Stars: 0.01K

How to Install MW-ComfyUI_EraX-WoW-Turbo

Install this extension via the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter MW-ComfyUI_EraX-WoW-Turbo in the search bar, then install the extension.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

Whisper Turbo Run Description

Node for converting audio to text using a Whisper model.

Whisper Turbo Run:

WhisperTurboRun transcribes audio into text using a Whisper-based speech recognition model. It is suited to tasks that need speech-to-text conversion, such as creating subtitles or transcribing spoken content for accessibility. The node handles various audio inputs and adjusts the sample rate of the input audio when it does not match the model's requirements. Options for an initial prompt and for timestamps give you control over the transcription output.

Whisper Turbo Run Input Parameters:

audio

The audio parameter is a dictionary containing the waveform and sample rate of the audio input. It is crucial for the transcription process as it provides the raw audio data that the model will convert into text. The waveform should be a tensor, and the sample rate should ideally be 16000 Hz for optimal performance. If the sample rate differs, the node will automatically resample the audio to meet this requirement.
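To make the resampling step concrete, here is a minimal pure-Python sketch that converts one channel of samples to 16000 Hz by linear interpolation. The name `resample_linear` and the use of plain lists are assumptions for illustration only; the node works on tensors, and its actual resampling method is not described in this documentation.

```python
TARGET_SAMPLE_RATE = 16000  # rate this documentation says the model expects


def resample_linear(samples, source_rate, target_rate=TARGET_SAMPLE_RATE):
    """Resample a mono list of floats by linear interpolation (illustrative only)."""
    if source_rate == target_rate or len(samples) < 2:
        return list(samples)
    # Keep the clip duration unchanged: scale the sample count by the rate ratio.
    out_len = max(1, round(len(samples) * target_rate / source_rate))
    step = source_rate / target_rate  # source samples advanced per output sample
    resampled = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), len(samples) - 1)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        resampled.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return resampled


# One second of silence recorded at 48000 Hz.
one_second = [0.0] * 48000
print(len(resample_linear(one_second, 48000)))
```

Linear interpolation is used here only because it is easy to read. A production resampler would also apply a low-pass filter before downsampling to avoid aliasing.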

logprob_threshold

The logprob_threshold parameter sets the minimum average log probability a transcribed segment must reach to be treated as reliable. Segments that score below it are considered low-confidence, which helps filter out text that is likely inaccurate. The default value is -1.0. Raising it toward 0 makes the filter stricter, and lowering it makes the filter more permissive.
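As a rough way to read a log-probability threshold, the exponential of an average log probability gives the geometric-mean probability per token. The helper below is a hypothetical illustration and is not part of the node.

```python
import math


def mean_token_probability(avg_logprob):
    """Convert an average log probability into a geometric-mean token probability."""
    return math.exp(avg_logprob)


for threshold in (-0.5, -1.0, -2.0):
    # Prints roughly 0.607, 0.368 and 0.135 for the three thresholds.
    print(threshold, round(mean_token_probability(threshold), 3))
```

Read this way, a threshold of -1.0 corresponds to an average per-token probability of about 0.37.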

no_speech_threshold

The no_speech_threshold parameter sets the no-speech probability above which a segment is treated as silence or non-speech. A lower value causes more segments to be classified as silence and skipped, while a higher value lets more quiet or ambiguous audio through to be transcribed. The default value is 0.1.
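In the reference openai-whisper transcription loop, the two thresholds work together: a segment is skipped as silence when its no-speech probability exceeds no_speech_threshold, unless its average log probability is above logprob_threshold. The sketch below restates that rule in plain Python. Whether this node applies exactly the same rule is an assumption, and `should_skip_segment` is a hypothetical name.

```python
def should_skip_segment(no_speech_prob, avg_logprob,
                        no_speech_threshold=0.1, logprob_threshold=-1.0):
    """Return True when a segment would be dropped as non-speech.

    Mirrors the decision rule of the reference Whisper implementation;
    the defaults are the values listed in this documentation.
    """
    if no_speech_threshold is None:
        return False  # silence detection disabled
    skip = no_speech_prob > no_speech_threshold
    # A confident transcription overrides the silence decision.
    if logprob_threshold is not None and avg_logprob > logprob_threshold:
        skip = False
    return skip


print(should_skip_segment(no_speech_prob=0.5, avg_logprob=-1.5))  # likely silence
print(should_skip_segment(no_speech_prob=0.5, avg_logprob=-0.3))  # confident text
```

Under this rule the first call returns True and the second returns False, because the second segment's average log probability is above the threshold.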

initial_prompt

The initial_prompt parameter allows you to provide a starting text or context for the transcription. This can be useful in guiding the model to understand the context better, especially in cases where the audio might be ambiguous or unclear. The default is an empty string, meaning no initial prompt is provided.

unload_model

The unload_model parameter is a boolean that determines whether the model should be unloaded from memory after the transcription is complete. This can be useful for freeing up resources, especially when working with limited memory. The default value is False, meaning the model remains loaded.

timestamp

The timestamp parameter is a boolean that indicates whether timestamps should be included in the transcription output. When enabled, the transcription will include time markers for each segment of text, which is useful for applications like subtitle generation. The default value is False.

Whisper Turbo Run Output Parameters:

result

The result parameter is a dictionary containing the transcribed text and, optionally, the timestamps for each segment. This output is the primary deliverable of the node, providing the converted text from the input audio. The inclusion of timestamps depends on the timestamp input parameter, offering flexibility in how the transcription is utilized.
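For subtitle generation, timestamped segments can be turned into SRT text. The sketch below assumes a hypothetical result shape, a dictionary with a `segments` list whose items carry `start`, `end` (in seconds) and `text`. The actual keys returned by the node are not specified in this documentation, so adapt the field names to what the node produces.

```python
def format_srt_time(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from a list of {'start', 'end', 'text'} dicts (assumed shape)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(seg['start'])} --> {format_srt_time(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


# Hypothetical result dictionary, for illustration only.
result = {
    "text": "Hello world.",
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello"},
        {"start": 1.5, "end": 3.0, "text": " world."},
    ],
}
print(segments_to_srt(result["segments"]))
```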

Whisper Turbo Run Usage Tips:

  • Ensure your audio input is clear and free from excessive background noise to improve transcription accuracy.
  • Use the initial_prompt parameter to provide context if the audio content is complex or contains specialized terminology.
  • Adjust the no_speech_threshold to fine-tune the sensitivity of speech detection, especially in environments with varying levels of background noise.
  • Consider enabling the unload_model option when you need to free memory for other nodes after transcription. If you are transcribing several files back to back, leaving it disabled avoids reloading the model for each file.

Whisper Turbo Run Common Errors and Solutions:

"Sample rate mismatch"

  • Explanation: The audio sample rate does not match the required 16000 Hz.
  • Solution: Ensure your audio input is at 16000 Hz or let the node automatically resample it.

"Model not loaded"

  • Explanation: The model was not loaded into memory, possibly due to an initialization error.
  • Solution: Check the model path and ensure the model files are correctly placed and accessible.

"Low confidence transcription"

  • Explanation: The transcribed text has a low log probability, indicating potential inaccuracies.
  • Solution: Adjust the logprob_threshold to filter out low-confidence segments or improve the audio quality.

Copyright 2025 RunComfy. All Rights Reserved.