
ComfyUI Node: Kani TTS

Class Name

KaniTTS

Category
audio/tts
Author
wildminder (Account age: 4772 days)
Extension
ComfyUI-KaniTTS
Last Updated
2025-10-17
Github Stars
0.03K

How to Install ComfyUI-KaniTTS

Install this extension via the ComfyUI Manager by searching for ComfyUI-KaniTTS:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-KaniTTS in the search bar and install it.
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


Kani TTS Description

A powerful node for realistic and expressive speech synthesis from text, customizable for different projects and devices.

Kani TTS:

KaniTTS is a powerful node designed to generate speech from text using the KaniTTS model. This node is particularly beneficial for AI artists and developers who wish to incorporate realistic and expressive speech synthesis into their projects. By leveraging advanced text-to-speech technology, KaniTTS can transform written text into natural-sounding audio, making it an essential tool for creating voiceovers, virtual assistants, and interactive media. The node supports various configurations, allowing users to customize the speech output to suit their specific needs, such as adjusting the randomness of the speech or selecting different speakers. Its ability to handle different devices and manage resources efficiently ensures smooth performance, even on systems with limited computational power.

Kani TTS Input Parameters:

model_name

This parameter allows you to select the specific KaniTTS model to use for speech generation. The available models may vary in their capabilities, such as support for different speakers. The default model is typically the 370m model, which supports speaker selection. Choosing the right model can impact the quality and characteristics of the generated speech.

speaker

The speaker parameter lets you choose a specific speaker's voice for the speech synthesis. This option is only applicable when using models that support multiple speakers, such as the 370m model. The default value is "None," which means no specific speaker is selected. Selecting a speaker can add a personalized touch to the generated audio.

text

This is the text input that you want to convert into speech. It supports multiline input, allowing you to synthesize longer passages of text. The default text is "Hello world! My name is Kani, I'm a speech generation model!" The text input must not be empty, as it is the primary content for speech generation.

temperature

The temperature parameter controls the randomness of the speech generation process. A higher temperature value results in more creative and varied speech, while a lower value produces more deterministic and consistent output. The default value is 1.4, with a range from 0.1 to 2.0, adjustable in steps of 0.05.

top_p

This parameter sets the nucleus sampling probability, which influences the diversity of the generated speech. A higher top_p value allows for more diverse outputs by considering a larger set of possible tokens. The default value is 0.95, with a range from 0.1 to 1.0, adjustable in steps of 0.05.

repetition_penalty

The repetition_penalty parameter applies a penalty to repeated tokens in the generated speech, helping to reduce redundancy and improve the naturalness of the output. The default value is 1.1, with a range from 1.0 to 2.0, adjustable in steps of 0.05.
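The three sampling controls above all act on the model's token logits before each audio token is drawn. The sketch below is an illustrative re-implementation of that logic, not the node's actual code (which is internal to KaniTTS), and it simplifies the repetition penalty to a plain division:

```python
import math
import random

def sample_token(logits, temperature=1.4, top_p=0.95,
                 repetition_penalty=1.1, history=(), rng=None):
    """Illustrative temperature / nucleus / repetition-penalty sampling."""
    rng = rng or random
    # Repetition penalty: dampen logits of tokens already generated
    # (simplified; real implementations treat negative logits differently).
    logits = [l / repetition_penalty if i in history else l
              for i, l in enumerate(logits)]
    # Temperature: values > 1 flatten the distribution, < 1 sharpen it.
    logits = [l / temperature for l in logits]
    # Softmax over the adjusted logits.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus (top_p): keep the smallest set of tokens whose mass >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalise over the kept set and draw one token.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

With top_p at 1.0 every token stays in the candidate set; lowering it trims the unlikely tail, which is why smaller values sound more conservative.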

max_new_tokens

This parameter defines the maximum number of audio tokens to generate, effectively setting the length of the synthesized speech. The default value is 1200, with a range from 100 to 2000, adjustable in steps of 50. Adjusting this parameter can help control the duration of the output audio.

seed

The seed parameter is used for reproducibility, ensuring that the same input parameters produce the same output. A value of -1 indicates a random seed, while other values can be used to generate consistent results. The default value is -1, with a range from -1 to 0xFFFFFFFFFFFFFFFF.
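The way a -1 seed is typically resolved can be sketched as follows; this is an assumed convention matching the widget's range, not the node's actual source:

```python
import random

MAX_SEED = 0xFFFFFFFFFFFFFFFF  # upper bound shown in the seed widget

def resolve_seed(seed: int) -> int:
    """Map the widget value to an actual seed: -1 means 'pick one at random'."""
    if seed == -1:
        return random.randint(0, MAX_SEED)
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed out of range: {seed}")
    return seed
```

Re-running with the same non-negative seed (and otherwise identical inputs) should reproduce the same audio; -1 gives a fresh take on every run.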

force_offload

This parameter determines whether the KaniTTS model should be forcefully offloaded from VRAM after generation. The default setting is "Auto-Manage," which lets the system manage resources automatically. Selecting "Force Offload" can be useful for freeing up VRAM in resource-constrained environments.

device

The device parameter specifies the hardware device to use for running the inference, such as "cuda" for GPU or "cpu" for CPU. The default device is determined by the system's available hardware. Selecting the appropriate device can significantly impact the performance and speed of the speech generation process.

dtype

This parameter sets the data type for the model's computations, affecting precision and performance. Supported types include "float16" and "float32," with "float16" being the default for MPS devices. Choosing the right dtype can optimize the balance between speed and accuracy.
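The interaction between device and dtype, including the MPS fallback described under Common Errors below, can be sketched as pure selection logic. The boolean flags stand in for the torch.cuda.is_available()-style checks the node presumably performs; the CUDA-then-MPS-then-CPU preference order is an assumption:

```python
def resolve_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Prefer CUDA, then Apple MPS, then CPU (assumed preference order)."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"

def resolve_dtype(device: str, requested: str) -> str:
    """MPS only handles float16/float32 here; anything else falls back."""
    if device == "mps" and requested not in ("float16", "float32"):
        return "float16"  # matches the fallback warning in Common Errors
    return requested
```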

Kani TTS Output Parameters:

waveform

The waveform output parameter contains the generated audio data in the form of a tensor. This tensor represents the synthesized speech waveform, which can be used for playback or further processing. The waveform is crucial for converting the text input into audible speech.

sample_rate

The sample_rate parameter indicates the audio sample rate of the generated waveform. This value is essential for ensuring that the audio is played back at the correct speed and quality. The sample rate is determined by the KaniTTS model's configuration and is typically set to a standard value for speech synthesis.
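Downstream of the node, this waveform/sample_rate pair is exactly what an audio writer needs. A stdlib-only sketch is below; the sine wave stands in for the tensor the node actually emits, and in a real ComfyUI workflow you would wire the outputs into a SaveAudio-style node instead:

```python
import math
import struct
import wave

def save_wav(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] as 16-bit PCM."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 16-bit samples
        wf.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wf.writeframes(frames)

# Stand-in for the node's output: one second of a 440 Hz tone at 22.05 kHz.
sr = 22050
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
save_wav("kani_out.wav", tone, sr)
```

Writing with the wrong sample rate would not fail, but the audio would play back pitch-shifted and at the wrong speed, which is why the node exposes it as an explicit output.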

Kani TTS Usage Tips:

  • Experiment with different temperature and top_p values to achieve the desired balance between creativity and consistency in the generated speech.
  • Use the speaker parameter to add variety and personalization to your projects by selecting different voices.
  • Adjust the max_new_tokens parameter to control the length of the output audio, especially when working with longer text inputs.
  • Consider using the force_offload option in environments with limited VRAM to manage resources more effectively.

Kani TTS Common Errors and Solutions:

Text input cannot be empty.

  • Explanation: This error occurs when the text input is left blank, as the node requires text to generate speech.
  • Solution: Ensure that you provide a valid text input for the node to process.

Failed to load Kani TTS model '<model_name>'.

  • Explanation: This error indicates that the specified KaniTTS model could not be loaded, possibly due to incorrect model name or configuration issues.
  • Solution: Verify that the model name is correct and check the logs for more details on the loading process.

Unsupported dtype '<dtype>' for MPS, falling back to float16.

  • Explanation: This warning occurs when an unsupported data type is selected for MPS devices, prompting a fallback to float16.
  • Solution: Use either "float16" or "float32" as the dtype when working with MPS devices to avoid compatibility issues.

Kani TTS Related Nodes

Go back to the extension to check out more related nodes.