RunComfy

ComfyUI Node: Dots TTS Whisper Transcribe

Class Name: DotsTTSWhisperTranscribe
Category: Dots TTS
Author: Saganaki22 (Account age: 1867 days)
Extension: Dots-TTS-ComfyUI
Last Updated: 2026-06-23
GitHub Stars: 0.03K

Table of Contents

Description
DotsTTSWhisperTranscribe:
DotsTTSWhisperTranscribe Input Parameters:
DotsTTSWhisperTranscribe Output Parameters:
DotsTTSWhisperTranscribe Usage Tips:
DotsTTSWhisperTranscribe Common Errors and Solutions:
Related Nodes

How to Install Dots-TTS-ComfyUI

Install this extension via the ComfyUI Manager by searching for Dots-TTS-ComfyUI:

1. Click the Manager button in the main menu.
2. Click the Custom Nodes Manager button.
3. Enter Dots-TTS-ComfyUI in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Dots TTS Whisper Transcribe Description

Transcribes audio to text with a Whisper model, including automatic language detection, to produce reference transcripts for Dots TTS voice cloning.

Dots TTS Whisper Transcribe:

The DotsTTSWhisperTranscribe node transcribes audio into text using a Whisper automatic speech recognition (ASR) model. It is intended for Dots TTS voice cloning: the spoken content of a reference audio clip is converted into written text, which can then be used as reference material for the cloning step. The node handles multiple languages and can detect the language of the audio automatically.

Dots TTS Whisper Transcribe Input Parameters:

audio

This parameter is the reference audio to transcribe and is the primary input to the node. The audio is provided as a dictionary. Audio quality and clarity significantly affect transcription accuracy.
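As an illustration of the dictionary form mentioned above, the sketch below assumes the common ComfyUI AUDIO layout: a "waveform" entry indexed as [batch][channel][sample] and a "sample_rate" entry in Hz. In ComfyUI the waveform is normally a tensor; nested lists stand in for it here so the example runs without extra packages. The helper names describe_audio and to_mono are hypothetical and are not part of the node.

```python
def describe_audio(audio: dict) -> dict:
    """Summarize an AUDIO-style dictionary (assumed layout, see lead-in)."""
    first_item = audio["waveform"][0]      # first clip in the batch
    channels = len(first_item)
    samples = len(first_item[0])
    return {
        "channels": channels,
        "samples": samples,
        "duration_s": samples / audio["sample_rate"],
    }


def to_mono(audio: dict) -> list:
    """Average all channels of the first clip into one mono sample list."""
    first_item = audio["waveform"][0]
    channels = len(first_item)
    samples = len(first_item[0])
    return [sum(ch[i] for ch in first_item) / channels for i in range(samples)]


if __name__ == "__main__":
    # Two channels, four samples, 4 Hz sample rate: a one-second toy clip.
    clip = {
        "waveform": [[[0.0, 1.0, 0.5, 0.5], [1.0, 0.0, 0.5, 0.5]]],
        "sample_rate": 4,
    }
    print(describe_audio(clip))
    print(to_mono(clip))
```

Speech recognition models generally expect a single channel, which is why a mono downmix is shown; whether the node performs this exact step is not stated on this page.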

model

This parameter specifies the Whisper ASR model used for transcription. Options include whisper-large-v3-turbo, whisper-medium, and whisper-small, among others. The default is whisper-large-v3-turbo. Model choice trades transcription speed against accuracy.

dtype

This parameter sets the numeric precision used by the Whisper model during transcription. Options are auto, bf16, and fp32. The default is auto, which selects bf16 on supported CUDA/XPU devices and fp32 otherwise. Precision affects both performance and memory usage.
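The auto rule described above can be sketched as a small pure function. This is only an illustration of the stated rule, not the node's actual code: resolve_dtype, device_type, and bf16_supported are hypothetical names, and how the node detects bf16 support is not documented here.

```python
VALID_DTYPES = ("auto", "bf16", "fp32")


def resolve_dtype(dtype: str, device_type: str, bf16_supported: bool) -> str:
    """Return the precision to use, following the documented 'auto' rule.

    device_type is a string such as "cuda", "xpu", or "cpu" (assumed naming).
    bf16_supported is supplied by the caller; detection is out of scope.
    """
    if dtype not in VALID_DTYPES:
        raise ValueError(f"dtype must be one of {VALID_DTYPES}, got {dtype!r}")
    if dtype != "auto":
        return dtype  # an explicit choice is passed through unchanged
    if device_type in ("cuda", "xpu") and bf16_supported:
        return "bf16"
    return "fp32"


if __name__ == "__main__":
    print(resolve_dtype("auto", "cuda", True))
    print(resolve_dtype("auto", "cpu", False))
```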

language

This parameter indicates the language of the reference audio. Options include auto, english, chinese, japanese, and several others. The default is auto, which allows the model to automatically detect the language. Specifying the language can improve transcription accuracy, especially for non-English audio.

task

This parameter defines the task performed by the Whisper model. Options are transcribe and translate. The default is transcribe, which outputs text in the original language of the audio. Choosing translate outputs an English translation, regardless of the original language.
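The language and task settings can be thought of as a small set of options handed to the model. The sketch below assumes the node runs Whisper through a library that accepts language and task as generation arguments, as Hugging Face Transformers does for Whisper; this page does not confirm which backend the node uses. build_generate_kwargs is a hypothetical helper.

```python
def build_generate_kwargs(language: str = "auto", task: str = "transcribe") -> dict:
    """Build generation options from the node's language and task settings.

    With language="auto" no language key is set, leaving detection to the
    model (assumed behavior for this sketch).
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"task must be 'transcribe' or 'translate', got {task!r}")
    kwargs = {"task": task}
    if language != "auto":
        kwargs["language"] = language
    return kwargs


if __name__ == "__main__":
    print(build_generate_kwargs())                         # defaults
    print(build_generate_kwargs("japanese", "translate"))  # Japanese audio, English text
```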

chunk_length_s

This parameter sets the length of audio chunks to be processed at a time, measured in seconds. The default value is 30 seconds, with a minimum of 0 and a maximum of 120 seconds. Setting this to 0 allows the model to automatically determine the chunk length. Adjusting this parameter can help manage memory usage and processing time for longer audio files.
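To make the effect of chunk_length_s concrete, the sketch below splits a clip duration into consecutive spans. It is a simplification under stated assumptions: real chunked inference typically overlaps neighboring chunks, which is ignored here, and a value of 0 is modeled as "pass the whole clip and let the model decide", which is only one reading of the automatic mode. chunk_spans is a hypothetical helper.

```python
def chunk_spans(duration_s: float, chunk_length_s: float) -> list:
    """Return (start, end) spans in seconds covering the whole clip."""
    if not 0 <= chunk_length_s <= 120:
        raise ValueError("chunk_length_s must be between 0 and 120 seconds")
    duration_s = float(duration_s)
    if duration_s <= 0:
        return []
    if chunk_length_s == 0:
        # Assumed meaning of 0: no manual chunking in this sketch.
        return [(0.0, duration_s)]
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_length_s, duration_s)
        spans.append((start, end))
        start = end
    return spans


if __name__ == "__main__":
    # A 75-second clip with the default 30-second chunks.
    print(chunk_spans(75, 30))
```

Shorter chunks mean more passes over smaller inputs, which is the memory versus processing-time trade-off described above.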

download_if_missing

This boolean parameter determines whether the Whisper model should be automatically downloaded if it is not already available. Setting this to True ensures that the necessary model files are retrieved, facilitating seamless transcription without manual intervention.
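The behavior of this flag can be sketched as a check-then-fetch step. Everything below is illustrative: ensure_model, the directory layout, and the download callable are hypothetical, and the node's real storage location and download mechanism are not described on this page.

```python
from pathlib import Path
from typing import Callable


def ensure_model(
    model_dir: Path,
    download_if_missing: bool,
    download: Callable[[Path], None],
) -> Path:
    """Return model_dir if it holds files, otherwise download or fail."""
    if model_dir.is_dir() and any(model_dir.iterdir()):
        return model_dir  # already present locally
    if not download_if_missing:
        raise FileNotFoundError(
            f"Model not found at {model_dir}; enable download_if_missing"
        )
    download(model_dir)  # caller supplies the actual download routine
    return model_dir


if __name__ == "__main__":
    import tempfile

    def fake_download(dest: Path) -> None:
        # Stand-in for a real download: create a placeholder file.
        dest.mkdir(parents=True, exist_ok=True)
        (dest / "config.json").write_text("{}")

    with tempfile.TemporaryDirectory() as tmp:
        print(ensure_model(Path(tmp) / "whisper-small", True, fake_download))
```

With the flag off and no local files, this sketch raises an error, which corresponds to the "Model not found" case under common errors.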

Dots TTS Whisper Transcribe Output Parameters:

transcript

The output of this node is a string containing the transcribed text from the reference audio. This transcript serves as a crucial component for voice cloning tasks, providing the textual reference needed to replicate the original voice accurately. The quality of the transcript can significantly impact the effectiveness of the voice cloning process.

Dots TTS Whisper Transcribe Usage Tips:

Ensure that the reference audio is clear and free from background noise to improve transcription accuracy.
Select the appropriate Whisper model based on your needs for speed and accuracy; larger models may offer better accuracy but require more computational resources.
If you are working with non-English audio, specify the language to enhance transcription precision.
Adjust the chunk_length_s parameter for longer audio files to balance memory usage against processing time.

Dots TTS Whisper Transcribe Common Errors and Solutions:

Model not found

Explanation: This error occurs when the specified Whisper model is not available locally.
Solution: Ensure that download_if_missing is set to True to automatically download the required model.

Unsupported language

Explanation: The language specified is not supported by the Whisper model.
Solution: Verify that the language is included in the WHISPER_LANGUAGE_OPTIONS and adjust the parameter accordingly.
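A pre-check along these lines can catch an unsupported language before transcription starts. The option list below is a placeholder containing only the values named on this page; the real WHISPER_LANGUAGE_OPTIONS in the extension is longer, and normalize_language is a hypothetical helper.

```python
# Placeholder subset; the extension's actual list contains more languages.
WHISPER_LANGUAGE_OPTIONS = ("auto", "english", "chinese", "japanese")


def normalize_language(value: str) -> str:
    """Lowercase and trim the value, then check it against the option list."""
    cleaned = value.strip().lower()
    if cleaned not in WHISPER_LANGUAGE_OPTIONS:
        raise ValueError(
            f"Unsupported language {value!r}; choose one of {WHISPER_LANGUAGE_OPTIONS}"
        )
    return cleaned


if __name__ == "__main__":
    print(normalize_language(" English "))
```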

Audio format error

Explanation: The provided audio file is in an unsupported format or is corrupted.
Solution: Ensure the audio file is in a compatible format and is not corrupted before attempting transcription.

Dots TTS Whisper Transcribe Related Nodes

Go back to the extension to check out more related nodes.

Dots-TTS-ComfyUI
