
ComfyUI Node: Whisper Transcribe

Class Name

TT-WhisperTranscription

Category
transcription
Author
royceschultz (Account age: 2853 days)
Extension
ComfyUI-TranscriptionTools
Last Updated
2025-04-23
GitHub Stars
0.02K

How to Install ComfyUI-TranscriptionTools

Install this extension through the ComfyUI Manager by searching for ComfyUI-TranscriptionTools:
  • 1. Click the Manager button in the main menu.
  • 2. Select the Custom Nodes Manager button.
  • 3. Enter ComfyUI-TranscriptionTools in the search bar.
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


Whisper Transcribe Description

Transcribes audio to text with a Whisper speech-recognition pipeline, with options to format the text and save the results to files.

Whisper Transcribe:

The TT-WhisperTranscription node converts speech in an audio input into written text using a Whisper transcription pipeline. It is useful for AI artists and developers who need transcripts of spoken content inside a ComfyUI workflow. The node takes audio as WAV bytes, runs it through the supplied pipeline, and returns two outputs: the full transcription and a chunked version in which each segment carries timestamps. Optional settings insert line breaks after punctuation and save the transcription and its chunks to files, with control over the file name and whether existing files are overwritten.

Whisper Transcribe Input Parameters:

pipeline

The pipeline parameter specifies the transcription pipeline that processes the audio. It determines which model converts the audio into text, and therefore the accuracy and quality of the transcription. It is required and must be connected to a valid transcription pipeline.

wav_bytes

The wav_bytes parameter carries the raw audio to transcribe, as WAV-encoded bytes. This is the content the node actually processes, so its quality directly affects the result: background noise or a damaged file reduces accuracy. Provide clear recordings in valid WAV format.
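If you build wav_bytes yourself (for example in a script or your own custom node), the Python standard library is enough to package raw samples as WAV. The sketch below assumes mono 16-bit PCM at 16 kHz, the sampling rate Whisper models work at internally; whether this node resamples other rates is not stated here. `samples_to_wav_bytes` is a hypothetical helper, not part of ComfyUI-TranscriptionTools.

```python
import io
import math
import struct
import wave


def samples_to_wav_bytes(samples, sample_rate=16000):
    """Pack mono 16-bit PCM samples (ints in [-32768, 32767]) into WAV bytes.

    Hypothetical helper for illustration; not part of the node's code.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        # "<...h" = little-endian signed 16-bit integers, as WAV expects
        wav_file.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()


# One second of a 440 Hz tone at 16 kHz, standing in for real speech.
tone = [int(0.3 * 32767 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(16000)]
wav_bytes = samples_to_wav_bytes(tone)
```

The wave module leaves the BytesIO buffer open when the writer closes, so `getvalue()` returns the complete file, header included.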

format_newlines_on_punctuation

This boolean parameter, format_newlines_on_punctuation, controls whether a line break is inserted after punctuation marks in the transcribed text. It defaults to True, which breaks the text into sentences and makes long transcriptions easier to read.
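The node's exact formatting rule is not documented on this page. As a rough sketch of the idea, assuming a break after ".", "!" or "?" whenever whitespace follows, a single regular expression is enough; `newlines_on_punctuation` is a hypothetical name, not the node's function.

```python
import re


def newlines_on_punctuation(text):
    """Put each sentence on its own line (illustrative rule only).

    Replaces the whitespace that follows ., ! or ? with a newline.
    """
    return re.sub(r"([.!?])\s+", r"\1\n", text.strip())


print(newlines_on_punctuation("Hello there. How are you? I'm fine!"))
```

A rule this simple also breaks after abbreviations such as "Dr.", while decimals such as "3.14" stay intact because no whitespace follows the period.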

save_transcription

The save_transcription parameter is a boolean that determines whether the full transcription is written to a text file. It defaults to False. When enabled, the text is saved so it can be reviewed or analyzed later.

save_chunks

The save_chunks parameter, also a boolean, specifies whether the chunked transcription, in which each chunk covers one segment of the audio, is saved to a file. This is useful when you need to analyze specific parts of the audio in detail. It defaults to False.

save_filename

The save_filename parameter is a string that defines the base name for the saved transcription files. The default value is 'transcription'. This parameter allows you to customize the naming of the output files, which can be helpful for organizing and identifying transcriptions.

overwrite_existing

The overwrite_existing boolean parameter determines whether existing transcription files should be overwritten. By default, it is set to True, allowing new transcriptions to replace old ones. If set to False, the node will create new files with incremented names to avoid overwriting.
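This page does not state the output directory or the exact pattern used for incremented names. The sketch below shows one plausible way save_transcription, save_filename and overwrite_existing could interact, assuming a `.txt` extension and a `_1`, `_2`, ... suffix; `resolve_output_path` and `save_text` are hypothetical helpers, not the node's code.

```python
import tempfile
from pathlib import Path


def resolve_output_path(output_dir, save_filename="transcription",
                        overwrite_existing=True, suffix=".txt"):
    """Choose the file to write (assumed naming scheme).

    overwrite_existing=True  -> reuse <save_filename><suffix>, replacing any old file
    overwrite_existing=False -> try _1, _2, ... until an unused name is found
    """
    output_dir = Path(output_dir)
    candidate = output_dir / f"{save_filename}{suffix}"
    if overwrite_existing:
        return candidate
    counter = 1
    while candidate.exists():
        candidate = output_dir / f"{save_filename}_{counter}{suffix}"
        counter += 1
    return candidate


def save_text(text, output_dir, save_filename="transcription", overwrite_existing=True):
    """Write text to the resolved path and return that path."""
    path = resolve_output_path(output_dir, save_filename, overwrite_existing)
    path.write_text(text, encoding="utf-8")
    return path


# Demo in a throwaway directory: two saves with overwrite_existing=False.
with tempfile.TemporaryDirectory() as demo_dir:
    first = save_text("first take", demo_dir, overwrite_existing=False)
    second = save_text("second take", demo_dir, overwrite_existing=False)
    print(first.name, second.name)  # transcription.txt transcription_1.txt
```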

Whisper Transcribe Output Parameters:

transcription

The transcription output is a string that contains the full text transcribed from the audio input. This output is the primary result of the node's processing and provides a readable version of the spoken content, which can be used for various applications such as documentation, analysis, or further processing.

chunks

The chunks output is a string that represents the transcribed text divided into segments or chunks, each with associated timestamps. This output is particularly useful for applications that require time-aligned text, such as video subtitling or detailed audio analysis, as it allows you to pinpoint specific parts of the audio.
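Time-aligned chunks map naturally onto subtitle formats such as SRT. The sketch below assumes the chunks are available as a list of dicts in the shape the Hugging Face transformers automatic-speech-recognition pipeline returns with `return_timestamps=True` (`{"timestamp": (start, end), "text": ...}`). Whether this node's chunks string uses that layout is not documented here, so parsing it is left out; `format_srt_time` and `chunks_to_srt` are hypothetical helpers.

```python
def format_srt_time(seconds):
    """Convert seconds (float) to an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks):
    """Build SRT text from [{"timestamp": (start, end), "text": str}, ...]."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # The final chunk can lack an end time; reuse the start here.
            # A real tool might use the audio's total duration instead.
            end = start
        entries.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # A blank line separates consecutive SRT cues.
    return "\n".join(entries)


example_chunks = [
    {"timestamp": (0.0, 2.5), "text": " Hello there."},
    {"timestamp": (2.5, 4.24), "text": " How are you?"},
]
print(chunks_to_srt(example_chunks))
```

Saving the returned string with a `.srt` extension makes it loadable in most video players and editors.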

Whisper Transcribe Usage Tips:

  • Provide clear audio in valid WAV format through wav_bytes to get the best transcription results.
  • Use the format_newlines_on_punctuation option to enhance the readability of the transcribed text, especially if the output will be used for documentation or presentation purposes.
  • Consider enabling save_chunks if you need detailed, time-aligned transcriptions for applications like video subtitling or audio analysis.
  • Customize the save_filename to organize your transcriptions effectively, especially when dealing with multiple files or projects.

Whisper Transcribe Common Errors and Solutions:

Output directory does not exist

  • Explanation: This error occurs when the specified output directory for saving transcriptions does not exist.
  • Solution: Ensure that the output directory path is correct and that the directory exists. You can create the directory manually or modify the node to create it automatically.
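If you modify the node, or write files from your own script, creating the directory before writing avoids this error. A minimal sketch using pathlib; `ensure_output_dir` is a hypothetical helper and the nested path in the demo is only an example.

```python
import tempfile
from pathlib import Path


def ensure_output_dir(path):
    """Create the output directory, including missing parents, if absent.

    Raises FileExistsError if the path already exists as a regular file.
    """
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


# Demo: a nested directory that does not exist yet.
with tempfile.TemporaryDirectory() as root:
    out = ensure_output_dir(Path(root) / "transcriptions" / "2025")
    print(out.is_dir())  # True
    ensure_output_dir(out)  # calling it again is harmless
```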

Some files failed to transcribe

  • Explanation: This error indicates that one or more audio files in a batch failed to be transcribed, possibly due to incompatible formats or corrupted data.
  • Solution: Check the format and integrity of the audio files. Ensure they are in a supported format and not corrupted. Retry the transcription process with corrected files.
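To find corrupted inputs before retrying, you can try parsing each file's bytes with Python's wave module. This sketch (`check_wav_bytes` is a hypothetical helper) flags unreadable, empty, or truncated data. The wave module reads integer PCM WAV only, so it may reject valid float or compressed files that the pipeline itself could still decode.

```python
import io
import wave


def check_wav_bytes(data):
    """Return (ok, message) after a basic integrity check of WAV bytes.

    Uses the standard-library wave module, which reads integer PCM WAV only.
    """
    try:
        with wave.open(io.BytesIO(data), "rb") as wav_file:
            frames = wav_file.getnframes()
            rate = wav_file.getframerate()
            frame_size = wav_file.getnchannels() * wav_file.getsampwidth()
            audio = wav_file.readframes(frames)
    except (wave.Error, EOFError) as exc:
        return False, f"not a readable PCM WAV file: {exc}"
    if frames == 0 or rate <= 0:
        return False, "WAV header reports no audio"
    # A header that promises more data than the file holds means truncation.
    if len(audio) < frames * frame_size:
        return False, "audio data is shorter than the header claims (truncated file?)"
    return True, f"{frames / rate:.2f} s at {rate} Hz"
```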

Copyright 2025 RunComfy. All Rights Reserved.
