ComfyUI Node: Dots TTS Voice Clone

Class Name

DotsTTSVoiceClone

Category
Dots TTS
Author
Saganaki22 (Account age: 1867 days)
Extension
Dots-TTS-ComfyUI
Latest Updated
2026-06-23
Github Stars
0.03K

How to Install Dots-TTS-ComfyUI

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Click the Custom Nodes Manager button.
  3. Enter Dots-TTS-ComfyUI in the search bar and install the extension from the results.
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated list of nodes.

Dots TTS Voice Clone Description

Generate realistic synthetic speech by cloning a reference speaker's voice with the Dots TTS model.

Dots TTS Voice Clone:

The DotsTTSVoiceClone node generates speech in a cloned voice using the Dots TTS model. Given a short reference audio clip and the text to speak, it produces speech that mimics the reference speaker's voice while keeping the natural flow and intonation of human speech. This is useful wherever a consistent, recognizable voice is needed, such as virtual assistants, audiobooks, and other multimedia content.

Dots TTS Voice Clone Input Parameters:

dotstts_model

The Dots TTS model used for voice cloning. It determines the architecture and capabilities of the synthesis process. The model must already be loaded and be compatible with the Dots TTS framework.

reference_audio

A sample of the voice you want to clone. Use a clean clip, typically 3 to 15 seconds long, so the model can capture the speaker's unique vocal characteristics accurately.
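The length guideline can be checked before a clip is wired into the node. The following is a minimal sketch, not part of the extension: the helper names are hypothetical, and it assumes you already know the clip's sample count and sample rate.

```python
def clip_duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Return the clip length in seconds from a sample count and sample rate."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


def is_reference_length_ok(num_samples: int, sample_rate: int,
                           min_seconds: float = 3.0,
                           max_seconds: float = 15.0) -> bool:
    """True if the clip falls inside the 3 to 15 second range suggested above."""
    duration = clip_duration_seconds(num_samples, sample_rate)
    return min_seconds <= duration <= max_seconds


# Example clips at 24 kHz (the sample rate is an arbitrary choice for illustration).
print(is_reference_length_ok(8 * 24_000, 24_000))  # True
print(is_reference_length_ok(1 * 24_000, 24_000))  # False, the clip is too short
```

The bounds are parameters so they can be adjusted if a particular model tolerates shorter or longer references.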

text

The text to convert into speech. It defines the content of the generated audio, which is spoken in a voice that mimics the reference speaker.

reference_text

The transcript of the reference audio. Supplying it lets the model better align the audio's characteristics with the words spoken in the clip, which improves the accuracy of the voice cloning.

steps

The number of steps the model takes during the voice cloning process. More steps can produce more refined and accurate speech, at the cost of longer processing time.

CFG

The CFG (Classifier-Free Guidance) parameter influences the balance between adhering to the reference audio's characteristics and the model's generalization capabilities. Adjusting this parameter can help fine-tune the voice output to be more or less similar to the reference speaker.
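For intuition, classifier-free guidance is commonly formulated as a blend of a conditional and an unconditional prediction. The sketch below shows that general formula on plain lists of numbers. It is an illustration of the technique only: how the Dots TTS model applies guidance internally is not documented here, and the function name is hypothetical.

```python
def apply_cfg(conditional: list[float], unconditional: list[float],
              cfg_scale: float) -> list[float]:
    """Standard CFG blend: unconditional + scale * (conditional - unconditional)."""
    if len(conditional) != len(unconditional):
        raise ValueError("predictions must have the same length")
    return [u + cfg_scale * (c - u) for c, u in zip(conditional, unconditional)]


cond = [1.0, 2.0]    # prediction that sees the conditioning signal
uncond = [0.0, 1.0]  # prediction made without it

print(apply_cfg(cond, uncond, 1.0))  # [1.0, 2.0], scale 1 returns the conditional prediction
print(apply_cfg(cond, uncond, 2.0))  # [2.0, 3.0], larger scales push further toward the condition
```

Under this formulation, a scale of 0 returns the unconditional prediction, which is why very low values tend to weaken similarity to the conditioning input.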

seed

The seed parameter is used to initialize the random number generator, ensuring reproducibility of the voice cloning results. By setting a specific seed value, you can achieve consistent outputs across different runs with the same input parameters.
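The effect of a fixed seed can be illustrated with Python's standard random module. This is a generic sketch of seeded reproducibility, not the node's actual random number generator, and sample_noise is a hypothetical helper.

```python
import random


def sample_noise(seed: int, count: int = 3) -> list[float]:
    """Draw `count` Gaussian samples from a generator initialized with `seed`."""
    rng = random.Random(seed)  # independent generator, so global state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


print(sample_noise(42) == sample_noise(42))  # True, same seed gives the same draws
print(sample_noise(42) == sample_noise(43))  # False, a different seed gives different draws
```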

language

This parameter specifies the language of the text input. It is important for ensuring that the generated speech adheres to the phonetic and linguistic rules of the specified language, thereby enhancing the naturalness of the voice output.

normalize_text

The normalize_text parameter determines whether the input text should be normalized before processing. Normalization can involve converting numbers to words, expanding abbreviations, and other text preprocessing steps to improve the clarity and accuracy of the generated speech.
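To make the idea concrete, here is a toy normalizer that expands a few abbreviations and spells out one- and two-digit numbers. It is a simplified sketch under stated assumptions: the abbreviation table and function names are hypothetical, and the node's real normalization rules are not documented here.

```python
import re

# Hypothetical abbreviation table; a real normalizer would cover far more cases.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99."""
    if not 0 <= n <= 99:
        raise ValueError("only 0 to 99 is supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text: str) -> str:
    """Expand known abbreviations, then spell out standalone 1-2 digit numbers."""
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    # Numbers with three or more digits are left untouched.
    return re.sub(r"(?<!\d)\d{1,2}(?!\d)",
                  lambda m: number_to_words(int(m.group())), text)


print(normalize("Dr. Lee lives at 42 Elm St."))
# Doctor Lee lives at forty-two Elm Street
```

This sketch ignores decimals, dates, currency, and language-specific rules, all of which a production normalizer would need to handle.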

max_audio_patches

This parameter caps the length of the generated audio, measured in patches. Each audio patch corresponds to approximately 0.32 seconds of audio, so raising the value allows longer output, which is particularly useful for longer texts.
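The patch-to-duration arithmetic is easy to work out by hand or in code. The sketch below uses the approximate 0.32 seconds per patch quoted above. The helper names are hypothetical, and the results are estimates rather than exact limits.

```python
import math

SECONDS_PER_PATCH = 0.32  # approximate figure quoted above


def patches_to_seconds(max_audio_patches: int) -> float:
    """Estimate the maximum audio duration allowed by a patch budget."""
    return max_audio_patches * SECONDS_PER_PATCH


def patches_for_duration(seconds: float) -> int:
    """Estimate the smallest patch budget that covers `seconds` of audio."""
    return math.ceil(seconds / SECONDS_PER_PATCH)


print(patches_to_seconds(100))   # roughly 32 seconds
print(patches_for_duration(60))  # 188
```

For example, a one-minute narration needs a budget of about 188 patches, so a smaller setting would cut the audio short.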

Dots TTS Voice Clone Output Parameters:

audio

The generated speech audio: the input text spoken in a voice that mimics the reference speaker. Its quality and accuracy depend on the input parameters and on the capabilities of the Dots TTS model used.

Dots TTS Voice Clone Usage Tips:

  • Ensure that the reference audio is clean and free from background noise to achieve the best voice cloning results.
  • Experiment with different CFG values to find the optimal balance between voice similarity and naturalness for your specific application.
  • Use a consistent seed value if you need to reproduce the same voice output across multiple runs.
  • Adjust the max_audio_patches parameter if you are working with longer texts to prevent the model from prematurely stopping the audio generation.

Dots TTS Voice Clone Common Errors and Solutions:

"decoder window size must be larger than chunk_size"

  • Explanation: This error occurs when the size of the decoder window is smaller than the chunk size of the audio being processed.
  • Solution: Ensure that the decoder window size is appropriately configured to be larger than the chunk size. Adjust the model settings or input parameters to resolve this issue.

"Invalid reference audio length"

  • Explanation: This error indicates that the reference audio provided is either too short or too long for effective voice cloning.
  • Solution: Provide a reference audio clip that is between 3 and 15 seconds long to ensure optimal voice cloning performance.

"Model not loaded"

  • Explanation: This error suggests that the Dots TTS model has not been properly loaded or initialized before attempting voice cloning.
  • Solution: Verify that the model is correctly loaded and compatible with the Dots TTS framework before executing the voice cloning process.

RunComfy
Copyright 2025 RunComfy. All Rights Reserved.
