ComfyUI Node: Spark TTS Clone

Class Name: SparkTTSClone
Category: 🎤MW/MW-Spark-TTS
Author: mw (Account age: 2601 days)
Extension: ComfyUI_SparkTTS
Last Updated: 2025-05-23
GitHub Stars: 0.05K

How to Install ComfyUI_SparkTTS

Install this extension via the ComfyUI Manager by searching for ComfyUI_SparkTTS:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI_SparkTTS in the search bar.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

Spark TTS Clone Description

Facilitates customizable text-to-speech cloning for AI artists.

Spark TTS Clone:

The SparkTTSClone node generates speech audio from text input with a high degree of customization. It is part of the SparkTTS suite of TTS nodes. Its goal is to create synthetic voices that mimic specific characteristics such as gender, pitch, and speed, so that AI artists can add realistic voice synthesis to their projects and tailor the audio output to their needs.

Spark TTS Clone Input Parameters:

text

The text parameter is the primary input for the SparkTTSClone node: the text that you want converted into speech. It determines the words and phrases that are synthesized. No minimum or maximum length is specified, but longer or more complex text can increase processing time and affect the quality of the generated audio.

gender

The gender parameter specifies the gender characteristics of the synthesized voice, so that the output matches the voice profile you need for a given context. The accepted values are not listed here; typical options might include "male," "female," or "neutral."

top_k

The top_k parameter is a numerical setting that limits generation to the k highest-probability tokens at each step. A higher value increases diversity, while a lower value makes the output more deterministic. The default and allowed range are not stated here; values from 1 to 100 are typical for this kind of setting.
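To make the effect of top_k concrete, here is a minimal sketch of top-k filtering in plain Python. It illustrates the general technique only and is not taken from the node's source; the function names `top_k_filter` and `softmax` and the example logits are hypothetical.

```python
import math

def top_k_filter(logits, k):
    """Keep the k largest logits; set the rest to -inf so they get zero probability.

    Illustrative only. If several logits tie at the threshold, all of them are kept.
    """
    if k <= 0 or k >= len(logits):
        return list(logits)
    # The threshold is the k-th largest logit.
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]

def softmax(logits):
    """Convert logits to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(top_k_filter(logits, k=2))
```

With k=2, only the two highest-scoring tokens keep a nonzero probability, which is why smaller values of top_k make the output more predictable.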

top_p

The top_p parameter sets the cumulative probability threshold for nucleus sampling: at each step, only the smallest set of most probable tokens whose probabilities sum to at least top_p is considered, so the number of candidate tokens varies. The default value is 0.95, with a range from 0 to 1. A value of 1 considers all tokens, and lower values restrict selection to the most probable tokens.
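The following sketch shows how a nucleus (top-p) filter can be written in plain Python. It is an illustration of the general technique under assumed inputs, not the node's implementation; `top_p_filter` and the example probabilities are hypothetical.

```python
def top_p_filter(probs, p):
    """Restrict probs to the smallest set of tokens whose cumulative
    probability reaches p, then renormalize. Illustrative only."""
    # Visit tokens from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

# Hypothetical distribution over a four-token vocabulary.
probs = [0.5, 0.3, 0.15, 0.05]
filtered = top_p_filter(probs, p=0.7)
```

Here the two most probable tokens already cover 0.8 of the probability mass, so with p=0.7 the remaining two tokens are dropped. With p=1.0 every token stays in the candidate set.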

temperature

The temperature parameter controls the randomness of speech generation. A higher temperature produces more varied output, while a lower temperature makes the output more focused and deterministic. The default and allowed range are not stated here; typical settings range from 0.1 to 1.0.
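As an illustration of why temperature changes randomness, the sketch below divides the logits by the temperature before applying softmax, which is the usual formulation. The function name and example logits are hypothetical, and the node's internal handling may differ.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for a three-token vocabulary.
logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.1)  # low temperature
flat = softmax_with_temperature(logits, 1.0)   # higher temperature
```

At the lower temperature almost all probability concentrates on the top-scoring token, while at 1.0 the other tokens keep a meaningful share.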

max_new_tokens

The max_new_tokens parameter sets the maximum number of tokens that can be generated in the output speech. This parameter helps control the length of the generated audio, with a default value of 3000 and a minimum of 500 tokens.

do_sample

The do_sample parameter is a boolean setting that determines whether sampling is used during the generation process. When set to True, the node will use sampling, which can introduce variability and creativity into the output. The default value is True.
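The difference between sampling and not sampling can be sketched as follows. This page does not say what the node does when do_sample is False; the sketch assumes the common convention of greedy decoding, where the most probable token is always chosen. `pick_token` and the example probabilities are hypothetical.

```python
import random

def pick_token(probs, do_sample, rng):
    """Choose a token index from a probability distribution.

    Assumption: with do_sample=False the choice is greedy (argmax).
    """
    if do_sample:
        # Draw one index at random, weighted by its probability.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=lambda i: probs[i])

# Hypothetical distribution over a three-token vocabulary.
probs = [0.1, 0.6, 0.3]
rng = random.Random(0)
greedy = [pick_token(probs, False, rng) for _ in range(5)]
sampled = [pick_token(probs, True, rng) for _ in range(5)]
```

The greedy picks are always index 1, the most probable token, whereas the sampled picks can vary from draw to draw.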

unload_model

The unload_model parameter is a boolean setting that specifies whether the TTS model should be unloaded from memory after processing. This can help manage system resources, especially in environments with limited memory. The default value is True.

seed

The seed parameter is an integer that sets the random seed for the generation process, so that the same inputs and seed reproduce the same result. The default value is 0, and the range runs from 0 to a large maximum value, so you can switch seeds to explore different outputs while keeping each one repeatable.
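The reproducibility that a seed provides can be demonstrated with Python's standard `random` module. This is a stand-in for the model's sampler, not the node's code; `sample_tokens` and its arguments are hypothetical.

```python
import random

def sample_tokens(seed, n=8, vocab_size=100):
    """Draw n pseudo-random token ids from a generator initialized with seed."""
    rng = random.Random(seed)
    return [rng.randrange(vocab_size) for _ in range(n)]

run_a = sample_tokens(seed=0)
run_b = sample_tokens(seed=0)  # same seed: identical sequence
run_c = sample_tokens(seed=1)  # different seed: generally a different sequence
```

Two runs with the same seed produce identical draws, which is the property that makes a seeded generation repeatable.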

Spark TTS Clone Output Parameters:

waveform

The waveform output parameter represents the audio waveform generated from the input text. This parameter is crucial as it contains the actual audio data that can be played back or further processed. The waveform is typically represented as a tensor, which can be used in various audio applications.

sample_rate

The sample_rate output parameter indicates the sample rate of the generated audio waveform, which is set at 16000 Hz. This parameter is important for ensuring compatibility with audio playback systems and for maintaining the quality of the audio output.
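As a rough sketch of how the two outputs relate, the code below computes a clip's duration from its sample count and encodes samples as a 16-bit mono WAV using only the Python standard library. It assumes the waveform has been converted to a flat list of floats in the range -1.0 to 1.0, which this page does not confirm; the helper names are hypothetical, and a sine tone stands in for real node output.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16000  # Hz, the sample_rate value stated above

def duration_seconds(num_samples, sample_rate=SAMPLE_RATE):
    """Duration of a mono clip is its sample count divided by the sample rate."""
    return num_samples / sample_rate

def to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode float samples (assumed in [-1.0, 1.0]) as 16-bit mono PCM WAV bytes."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)         # mono
        wav.setsampwidth(2)         # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()

# Stand-in for the node's waveform: half a second of a 440 Hz tone.
samples = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(8000)]
wav_bytes = to_wav_bytes(samples)
```

At 16000 Hz, 8000 samples correspond to half a second of audio. Writing the file with a different frame rate than the one the audio was generated at would change its playback speed and pitch.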

Spark TTS Clone Usage Tips:

  • Experiment with the temperature and top_p parameters to find the right balance between creativity and determinism in your audio outputs.
  • Use the gender parameter to match the voice characteristics to your project's needs, enhancing the realism and appropriateness of the synthesized speech.
  • Keep the unload_model parameter set to True (the default) if you are working in a resource-constrained environment, so that memory is freed after processing.

Spark TTS Clone Common Errors and Solutions:

Model not loaded

  • Explanation: This error occurs when the TTS model is not properly loaded into memory before processing.
  • Solution: Ensure that the model is correctly initialized and loaded by checking the model loading logic and verifying that all necessary resources are available.

Out of memory

  • Explanation: This error indicates that the system has run out of memory while processing the TTS request.
  • Solution: Try reducing the complexity of the input text or adjusting parameters like max_new_tokens to lower values. Additionally, ensure that the unload_model parameter is set to True to free up memory after processing.
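One way to reduce the amount of text handled per run is to split a long passage into shorter chunks and synthesize them one at a time. The sketch below is a hypothetical pre-processing helper, not part of the node; `split_text` and the `max_chars` limit are assumptions chosen for illustration.

```python
import re

def split_text(text, max_chars=200):
    """Split text into sentence-aligned chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

chunks = split_text("First sentence. Second sentence! Third one?", max_chars=20)
```

Each chunk can then be passed to the node as a separate text input, and the resulting audio clips joined afterwards.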

Invalid input text

  • Explanation: This error occurs when the input text is not in a valid format or contains unsupported characters.
  • Solution: Verify that the input text is correctly formatted and free of any unsupported characters or symbols.

Copyright 2025 RunComfy. All Rights Reserved.