ComfyUI Node: Kokoro ZH Run

Class Name

Kokoro ZH Run

Category
🎤MW/MW-KokoroTTS
Author
mw (Account age: 2267 days)
Extension
ComfyUI_KokoroTTS_MW
Latest Updated
2025-04-27
Github Stars
0.02K

How to Install ComfyUI_KokoroTTS_MW

Install this extension via the ComfyUI Manager by searching for ComfyUI_KokoroTTS_MW:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI_KokoroTTS_MW in the search bar, then install the extension from the results.
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Kokoro ZH Run Description

Specialized Chinese text-to-speech node for high-quality audio synthesis in applications like voiceovers and virtual assistants.

Kokoro ZH Run:

Kokoro ZH Run converts text into speech using a pre-trained Chinese text-to-speech (TTS) model. It takes the text to speak, a voice model, and a speaking speed, and returns the synthesized audio as a waveform together with its sample rate. The node suits applications that need natural-sounding Chinese speech, such as voiceovers, virtual assistants, and interactive media, and it gives AI artists and developers a way to add realistic, expressive voice elements to their projects.

Kokoro ZH Run Input Parameters:

text

The text parameter is the primary input of the Kokoro ZH Run node: the text you want to convert into speech. It determines the content of the generated audio. There are no explicit minimum or maximum lengths, but longer or more complex text takes longer to process and can affect the quality of the resulting audio. Clear, concise text gives the best results.
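Because long passages can affect processing time and quality, one option is to break the text into shorter chunks at sentence boundaries and synthesize them one at a time. The following is a minimal sketch of that idea, not part of the node itself; the function name `split_sentences` and the `max_chars` limit are hypothetical and chosen only for illustration.

```python
import re

# Sentence-ending punctuation common in Chinese text, plus ASCII equivalents.
_SENTENCE_END = re.escape("。！？；!?;")

def split_sentences(text: str, max_chars: int = 100) -> list:
    """Split text into chunks that break at sentence punctuation.

    Hypothetical helper: sentences are packed into a chunk until adding
    the next one would exceed max_chars. A single sentence longer than
    max_chars is kept whole rather than cut mid-sentence.
    """
    # Each match is a run of non-delimiters followed by its closing delimiters.
    pieces = re.findall(rf"[^{_SENTENCE_END}]+[{_SENTENCE_END}]*", text)
    chunks, current = [], ""
    for piece in pieces:
        piece = piece.strip()
        if not piece:
            continue
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be passed to the text input in turn, with the resulting audio segments joined afterwards.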

voice

The voice parameter selects the voice model used for speech synthesis. It determines the tone, pitch, and overall character of the generated speech, so you can match the audio to your project's needs. The available options are the predefined voice models stored in the voices directory, such as zf_xiaobei.pt or zm_yunjian.pt. Choosing a suitable voice can noticeably improve the expressiveness and authenticity of the result.
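To see which voices are installed, you can list the .pt files in the voices directory. The sketch below is a hypothetical helper, not code from the extension, and the directory path you pass in depends on your installation. It groups files by the prefix before the underscore; in the example names above, that prefix appears to follow the upstream Kokoro naming convention (language letter, then f or m for female or male), but treat that reading as an assumption.

```python
from pathlib import Path

def list_voices(voices_dir: str) -> dict:
    """Group .pt voice files by filename prefix (for example 'zf' or 'zm').

    Hypothetical helper: voices_dir is whatever directory holds the
    voice files in your installation.
    """
    groups = {}
    for path in sorted(Path(voices_dir).glob("*.pt")):
        prefix = path.name.split("_", 1)[0]
        groups.setdefault(prefix, []).append(path.name)
    return groups
```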

speed

The speed parameter controls the rate at which the text is spoken. The default is 1, which represents a normal speaking rate; values above 1 make the speech faster and values below 1 make it slower. Adjust it to meet timing constraints or stylistic preferences so that the speech matches the intended pacing of your application.
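When the speech has to fit a fixed time slot, a rough starting value for speed can be worked out from the duration at speed 1. The sketch below assumes that duration scales inversely with speed, which is only an approximation; the function names are hypothetical, and the actual result should be checked by listening.

```python
def estimated_duration(base_seconds: float, speed: float) -> float:
    """Estimate clip length at a given speed.

    Assumes duration is inversely proportional to speed, where
    base_seconds is the length measured at speed 1.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return base_seconds / speed

def speed_for_target(base_seconds: float, target_seconds: float) -> float:
    """Estimate the speed needed to fit a clip into target_seconds."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    return base_seconds / target_seconds
```

For example, under this assumption a clip that runs 10 seconds at speed 1 would need a speed of about 1.25 to fit an 8-second slot.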

Kokoro ZH Run Output Parameters:

waveform

The waveform output is a tensor containing the audio waveform of the synthesized speech. It holds the actual audio data, which can be played back or processed further. The waveform is generated from the input text and the selected voice model, and it carries the intonation and rhythm of the synthesized speech.
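If you need the audio outside ComfyUI, the samples can be written to a WAV file. The sketch below uses only the Python standard library and assumes you have already converted the waveform to a flat sequence of mono float samples in the range -1.0 to 1.0 (for a tensor, something like flattening it and calling `.tolist()`); `save_wav` is a hypothetical helper, not part of the extension.

```python
import struct
import wave

def save_wav(samples, sample_rate: int, path: str) -> None:
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper: samples is any iterable of floats.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        frames += struct.pack("<h", int(s * 32767))  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
```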

sample_rate

The sample_rate output is the number of samples per second in the generated audio, with a default value of 24000 Hz. Pass it along with the waveform to whatever plays, saves, or resamples the audio: interpreting the waveform at a different rate changes its pitch and duration. Keeping the two outputs together ensures compatibility with audio playback systems and preserves the quality of the synthesized speech.
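The relationship between sample count, sample rate, and duration is simple arithmetic. The short sketch below uses a hypothetical function name and shows why the rate has to travel with the waveform: the same samples read at the wrong rate come out at a different length.

```python
def duration_seconds(num_samples: int, sample_rate: int = 24000) -> float:
    """Return the playback length of num_samples at the given rate."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate

# 48000 samples at the node's default 24000 Hz last 2 seconds.
correct = duration_seconds(48000, 24000)
# Read as 48000 Hz instead, the same samples play in half the time
# (and sound correspondingly higher in pitch).
misread = duration_seconds(48000, 48000)
```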

Kokoro ZH Run Usage Tips:

  • Ensure that the input text is clear and well-structured to achieve the best speech synthesis results.
  • Experiment with different voice models to find the one that best fits the tone and style of your project.
  • Adjust the speed parameter to match the desired pacing of your application, ensuring that the speech aligns with other multimedia elements.

Kokoro ZH Run Common Errors and Solutions:

Generation failed: <error_message>

  • Explanation: This error occurs when the node encounters an issue during the speech generation process, which could be due to an invalid input or a problem with the model.
  • Solution: Verify that the input text is correctly formatted and that the selected voice model is available. Ensure that your system has sufficient resources to run the model, and try reloading the node if the problem persists. If necessary, clear the model cache and restart the application to resolve any underlying issues.
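The first two checks in the solution above can be run before synthesis. The following is a minimal sketch with a hypothetical `check_inputs` helper; it is not the node's own validation, and the voices directory path depends on your installation.

```python
from pathlib import Path

def check_inputs(text: str, voice: str, voices_dir: str) -> list:
    """Return a list of problems found; an empty list means none.

    Hypothetical pre-flight check: confirms the text is non-empty and
    that the named voice file exists in voices_dir.
    """
    problems = []
    if not text or not text.strip():
        problems.append("text is empty")
    if not (Path(voices_dir) / voice).is_file():
        problems.append(f"voice file not found: {voice}")
    return problems
```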
