
ComfyUI Node: Dots TTS Whisper Transcribe

Class Name: DotsTTSWhisperTranscribe
Category: Dots TTS
Author: Saganaki22 (Account age: 1867 days)
Extension: Dots-TTS-ComfyUI
Last Updated: 2026-06-23
GitHub Stars: 0.03K

How to Install Dots-TTS-ComfyUI

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter Dots-TTS-ComfyUI in the search bar and install the extension.
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated list of nodes.

Dots TTS Whisper Transcribe Description

Transcribes reference audio to text with a Whisper model for Dots TTS voice cloning, with optional automatic language detection.

Dots TTS Whisper Transcribe:

The DotsTTSWhisperTranscribe node transcribes audio into text with a Whisper automatic speech recognition (ASR) model. It is intended for Dots TTS voice cloning, where the resulting text serves as the reference transcript for the voice being cloned. The node supports multiple languages and can detect the language of the audio automatically.

Dots TTS Whisper Transcribe Input Parameters:

audio

This parameter is the reference audio to transcribe. It is passed as a dictionary and is the primary input to the node. Clear, clean audio transcribes more accurately than noisy or muffled audio.
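By ComfyUI convention, an AUDIO value is a dictionary holding a waveform tensor shaped [batch, channels, samples] and an integer sample_rate. The sketch below inspects such a dictionary. The helper describe_audio and the stand-in class FakeWaveform are hypothetical names introduced here so that the example runs without PyTorch; the node's own handling of the dictionary may differ.

```python
from dataclasses import dataclass


@dataclass
class FakeWaveform:
    """Stand-in for a torch.Tensor; only the .shape attribute is used."""
    shape: tuple


def describe_audio(audio: dict) -> dict:
    """Return batch size, channel count, and duration for an AUDIO-style dict."""
    missing = {"waveform", "sample_rate"} - audio.keys()
    if missing:
        raise KeyError(f"AUDIO dict is missing keys: {sorted(missing)}")
    batch, channels, samples = audio["waveform"].shape
    return {
        "batch": batch,
        "channels": channels,
        "duration_s": samples / audio["sample_rate"],
    }


# Ten seconds of stereo audio at 24 kHz (shape only, no real samples).
audio = {"waveform": FakeWaveform((1, 2, 240_000)), "sample_rate": 24_000}
info = describe_audio(audio)
print(info)
```

A check like this can help confirm that an upstream node delivers audio in the expected layout before transcription starts.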

model

This parameter selects the Whisper ASR model used for transcription. Options include whisper-large-v3-turbo, whisper-medium, and whisper-small, among others. The default is whisper-large-v3-turbo. The choice of model trades transcription speed against accuracy.

dtype

This parameter sets the numeric precision of the Whisper model during transcription. Options are auto, bf16, and fp32. The default is auto, which selects bf16 on supported CUDA/XPU devices and fp32 otherwise. Lower precision (bf16) generally reduces memory use and can speed up inference on hardware that supports it.
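The auto rule described above can be sketched as a small function. This is an illustration only: resolve_dtype and its arguments are hypothetical names, and the extension's actual device detection is not shown in this document.

```python
def resolve_dtype(dtype: str, device_type: str, bf16_supported: bool) -> str:
    """Resolve the 'auto' option to a concrete precision name.

    device_type and bf16_supported are assumed to come from the caller's
    own device probing; they are not part of the node's interface.
    """
    if dtype not in ("auto", "bf16", "fp32"):
        raise ValueError(f"unknown dtype option: {dtype!r}")
    if dtype != "auto":
        return dtype  # an explicit choice is passed through unchanged
    if device_type in ("cuda", "xpu") and bf16_supported:
        return "bf16"
    return "fp32"


print(resolve_dtype("auto", "cuda", True))   # bf16
print(resolve_dtype("auto", "cpu", False))   # fp32
print(resolve_dtype("fp32", "cuda", True))   # fp32
```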

language

This parameter indicates the language of the reference audio. Options include auto, english, chinese, japanese, and several others. The default is auto, which allows the model to automatically detect the language. Specifying the language can improve transcription accuracy, especially for non-English audio.

task

This parameter defines the task to be performed by the Whisper model. Options are transcribe and translate. The default is transcribe, which retains the original language of the audio. Choosing translate will output the transcription in English, regardless of the original language.

chunk_length_s

This parameter sets the length of audio chunks to be processed at a time, measured in seconds. The default value is 30 seconds, with a minimum of 0 and a maximum of 120 seconds. Setting this to 0 allows the model to automatically determine the chunk length. Adjusting this parameter can help manage memory usage and processing time for longer audio files.
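To see how chunk length relates to the amount of work per pass, the sketch below splits a clip into consecutive windows. It is a simplification under stated assumptions: plan_chunks is a hypothetical helper, the windows do not overlap (real Whisper pipelines often overlap neighboring chunks), and a value of 0 is represented as None to mean that chunking is left to the model.

```python
def plan_chunks(duration_s: float, chunk_length_s: float = 30.0):
    """Split [0, duration_s] into consecutive windows of chunk_length_s seconds."""
    if not 0 <= chunk_length_s <= 120:
        raise ValueError("chunk_length_s must be between 0 and 120")
    if chunk_length_s == 0:
        return None  # assumed meaning: chunk length is chosen automatically
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_length_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


chunks = plan_chunks(95.0, 30.0)
print(chunks)  # [(0.0, 30.0), (30.0, 60.0), (60.0, 90.0), (90.0, 95.0)]
```

Shorter chunks mean more passes with less audio held in memory per pass; longer chunks mean fewer passes with more memory used per pass.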

download_if_missing

This boolean parameter controls whether the Whisper model is downloaded automatically when it is not available locally. When set to True, the node fetches the required model files so that no manual download is needed.
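Taken together, the inputs above might appear as follows in a ComfyUI API-format workflow, written here as a Python dictionary. This is a sketch, not an exported workflow: the node ids, the LoadAudio upstream node, and the file name reference.wav are assumptions, and the value shown for download_if_missing is an assumed choice because its default is not stated here.

```python
import json

# API-format workflow fragment. Node ids "1" and "2" are arbitrary.
workflow = {
    "1": {
        "class_type": "LoadAudio",             # assumed upstream audio source
        "inputs": {"audio": "reference.wav"},  # hypothetical file name
    },
    "2": {
        "class_type": "DotsTTSWhisperTranscribe",
        "inputs": {
            "audio": ["1", 0],  # link to output 0 of node "1"
            "model": "whisper-large-v3-turbo",
            "dtype": "auto",
            "language": "auto",
            "task": "transcribe",
            "chunk_length_s": 30,
            "download_if_missing": True,  # assumed value; default not documented
        },
    },
}

# ComfyUI's HTTP API expects the workflow under the "prompt" key.
payload = json.dumps({"prompt": workflow}, indent=2)
print(payload)
```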

Dots TTS Whisper Transcribe Output Parameters:

transcript

The node outputs a string containing the text transcribed from the reference audio. In voice cloning, this transcript is the textual reference for the original voice, so transcription errors can reduce the quality of the cloned result.

Dots TTS Whisper Transcribe Usage Tips:

  • Ensure that the reference audio is clear and free from background noise to improve transcription accuracy.
  • Select the appropriate Whisper model based on your needs for speed and accuracy; larger models may offer better accuracy but require more computational resources.
  • If you are working with non-English audio, specify the language to enhance transcription precision.
  • Adjust the chunk_length_s parameter to optimize performance for longer audio files, balancing between memory usage and processing time.

Dots TTS Whisper Transcribe Common Errors and Solutions:

Model not found

  • Explanation: This error occurs when the specified Whisper model is not available locally.
  • Solution: Ensure that download_if_missing is set to True to automatically download the required model.

Unsupported language

  • Explanation: The language specified is not supported by the Whisper model.
  • Solution: Verify that the language is listed in WHISPER_LANGUAGE_OPTIONS and adjust the parameter accordingly.
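A language value can be checked before it reaches the node. The sketch below is hypothetical: LANGUAGE_OPTIONS contains only the options named in this document, not the extension's full WHISPER_LANGUAGE_OPTIONS list, and normalize_language is an illustrative helper rather than part of the extension.

```python
# Hypothetical subset; the real WHISPER_LANGUAGE_OPTIONS is defined by the extension.
LANGUAGE_OPTIONS = ("auto", "english", "chinese", "japanese")


def normalize_language(language: str, options=LANGUAGE_OPTIONS) -> str:
    """Lower-case and trim a language name, rejecting unsupported values."""
    cleaned = language.strip().lower()
    if cleaned not in options:
        raise ValueError(
            f"unsupported language {language!r}; expected one of {list(options)}"
        )
    return cleaned


print(normalize_language(" English "))  # english
```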

Audio format error

  • Explanation: The provided audio file is in an unsupported format or is corrupted.
  • Solution: Ensure the audio file is in a compatible format and is not corrupted before attempting transcription.

Copyright 2025 RunComfy. All Rights Reserved.