A node for converting audio to text, leveraging the Whisper model for efficient and accurate transcription.
WhisperTurboRun is a node designed to transcribe audio into text using the Whisper model, which is known for its efficiency and accuracy in processing audio input. Its primary purpose is to convert spoken language into written text, making it useful for applications that require speech-to-text conversion, such as generating subtitles or transcribing spoken content for accessibility. The node handles various audio formats and resamples the input audio when needed to match the model's expected sample rate. Options for an initial prompt and for including timestamps give you control over the transcription process, allowing the output to be tailored to your specific needs.
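The node wraps this process internally, but a minimal sketch using the openai-whisper package shows roughly what the transcription step involves; the model name "turbo" and the file path are illustrative assumptions, not the node's actual implementation.

```python
# Minimal sketch of Whisper-based transcription outside ComfyUI
# (illustrative only; the node wraps equivalent logic).
import whisper

# "turbo" is an assumed model choice; any Whisper checkpoint works.
model = whisper.load_model("turbo")

result = model.transcribe(
    "speech.wav",                # path, numpy array, or 16 kHz tensor
    initial_prompt="",           # optional context for the decoder
    logprob_threshold=-1.0,      # confidence cutoff for segments
    no_speech_threshold=0.1,     # silence-detection threshold
)
print(result["text"])
```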
The audio parameter is a dictionary containing the waveform and sample rate of the audio input. It is crucial for the transcription process as it provides the raw audio data that the model will convert into text. The waveform should be a tensor, and the sample rate should ideally be 16000 Hz for optimal performance. If the sample rate differs, the node will automatically resample the audio to meet this requirement.
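As a sketch of what this input looks like, the snippet below builds the audio dictionary and resamples it to 16000 Hz with torchaudio; the exact key names ("waveform", "sample_rate") are assumed based on common ComfyUI audio conventions.

```python
# Sketch of preparing the audio input dict; key names are assumptions.
import torchaudio
import torchaudio.functional as F

waveform, sample_rate = torchaudio.load("speech.wav")   # waveform: (channels, samples)

# The node resamples automatically, but this is what that step looks like.
if sample_rate != 16000:
    waveform = F.resample(waveform, orig_freq=sample_rate, new_freq=16000)
    sample_rate = 16000

audio = {"waveform": waveform, "sample_rate": sample_rate}
```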
The logprob_threshold parameter sets the threshold for the log probability of the transcribed text. It helps filter out low-confidence transcriptions, so that only text with a higher likelihood of accuracy is retained. The default value is -1.0, a permissive setting that discards only segments with very low confidence.
The no_speech_threshold parameter determines how readily a segment of audio is classified as silence or non-speech. A lower value causes more segments to be treated as non-speech, while a higher value allows more ambiguous audio to be transcribed as speech. The default value is 0.1.
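To make the two thresholds concrete, the sketch below mirrors how standard Whisper combines them per segment: a segment is dropped as silence when its no-speech probability exceeds no_speech_threshold and its average log probability falls below logprob_threshold. The node's exact filtering logic may differ.

```python
# Illustrative per-segment filtering (mirrors standard Whisper behavior;
# the node's implementation may differ).
def keep_segment(segment, logprob_threshold=-1.0, no_speech_threshold=0.1):
    # Drop segments that look like silence: high no-speech probability
    # combined with low decoding confidence.
    if (segment["no_speech_prob"] > no_speech_threshold
            and segment["avg_logprob"] < logprob_threshold):
        return False
    return True

segments = result["segments"]  # from a Whisper transcribe() call
kept = [s for s in segments if keep_segment(s)]
```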
The initial_prompt parameter allows you to provide a starting text or context for the transcription. This can be useful in guiding the model to understand the context better, especially in cases where the audio might be ambiguous or unclear. The default is an empty string, meaning no initial prompt is provided.
The unload_model parameter is a boolean that determines whether the model should be unloaded from memory after the transcription is complete. This can be useful for freeing up resources, especially when working with limited memory. The default value is False, meaning the model remains loaded.
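Unloading roughly corresponds to releasing the model object and returning GPU memory to the pool, as in this hedged sketch.

```python
# Sketch of what unload_model=True accomplishes: freeing the model
# and reclaiming GPU memory (illustrative, not the node's code).
import gc
import torch

del model
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```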
The timestamp parameter is a boolean that indicates whether timestamps should be included in the transcription output. When enabled, the transcription will include time markers for each segment of text, which is useful for applications like subtitle generation. The default value is False.
The result output is a dictionary containing the transcribed text and, optionally, the timestamps for each segment. This is the primary deliverable of the node, providing the converted text from the input audio. Whether timestamps are included depends on the timestamp input parameter, offering flexibility in how the transcription is used.
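When timestamps are enabled, the segments can be turned directly into subtitle entries. The sketch below formats Whisper-style segments (with "start", "end", and "text" keys) as SRT; it illustrates one way to consume the output and is not part of the node.

```python
# Sketch: convert timestamped segments into SRT subtitle entries.
def to_srt(segments):
    def fmt(t):
        h, rem = divmod(int(t), 3600)
        m, s = divmod(rem, 60)
        ms = int((t - int(t)) * 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    lines = []
    for i, seg in enumerate(segments, start=1):
        lines.append(f"{i}\n{fmt(seg['start'])} --> {fmt(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(lines)

print(to_srt(result["segments"]))
```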
- Use the initial_prompt parameter to provide context if the audio content is complex or contains specialized terminology.
- Adjust the no_speech_threshold to fine-tune the sensitivity of speech detection, especially in environments with varying levels of background noise.
- Enable the unload_model option if you are processing multiple files sequentially and need to manage memory usage efficiently.
- Adjust the logprob_threshold to filter out low-confidence segments, or improve the audio quality if too much of the transcription is being discarded.