ComfyUI IndexTTSNode: Converts text to speech with customization options for language, speed, and voice style.
The IndexTTSNode is a ComfyUI node for text-to-speech (TTS) synthesis. It uses the IndexTTS model to convert written text into spoken audio. It suits applications that need voice synthesis, such as virtual assistants, audiobooks, or interactive systems that give auditory feedback. The speech output can be customized through the language and speed parameters, and a reference audio clip lets the node mimic the style or tone of a specific voice.
The text input accepts a string: the text you want to convert into speech. It supports multiline text, so longer passages can be synthesized. The default text is "你好,我是IndexTTS语音合成系统。", which translates to "Hello, I am the IndexTTS speech synthesis system." The content of the audio output comes from this input.
The reference audio input takes an audio file whose voice style or tone the synthesized speech should emulate. Supplying a reference audio gives a more personalized and consistent voice that matches the desired characteristics.
The language input specifies the language of the text. The options are "auto", "zh" (Chinese), "en" (English), "ja" (Japanese), and "ko" (Korean), with "auto" as the default. Selecting the correct language ensures that the text is pronounced according to the rules of that language.
The speed input controls the speed of the synthesized speech. It is a float with a default of 1.0, which represents normal speed, and it can be set from a minimum of 0.5 to a maximum of 2.0 in increments of 0.1. Adjusting the speed tailors the pacing of the speech to different contexts or preferences.
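When a speed value is produced by a script rather than by the node's slider, it can help to bring it into the documented range first. The following is a minimal sketch, assuming only the limits stated above (0.5 to 2.0 in steps of 0.1); `normalize_speed` and the constant names are hypothetical helpers, not part of the IndexTTS node.

```python
# Limits taken from the parameter description; the names are illustrative.
SPEED_MIN = 0.5
SPEED_MAX = 2.0
SPEED_STEP = 0.1


def normalize_speed(speed: float) -> float:
    """Clamp a speed to [0.5, 2.0] and snap it to the nearest 0.1 step."""
    clamped = min(max(speed, SPEED_MIN), SPEED_MAX)
    # Count whole steps, then convert back; the final round() removes
    # floating-point noise such as 1.3000000000000003.
    steps = round(clamped / SPEED_STEP)
    return round(steps * SPEED_STEP, 1)


if __name__ == "__main__":
    for requested in (0.2, 1.0, 1.26, 3.0):
        print(requested, "->", normalize_speed(requested))
```

Values below 0.5 or above 2.0 are pulled to the nearest limit rather than rejected, which is one possible design; raising an error instead would be equally reasonable.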
The audio output provides the generated audio, the result of the text-to-speech conversion. The synthesized audio reflects the input text, the reference audio style, and the language and speed settings, and it can be passed on for use in other applications.
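The inputs and output described above map naturally onto ComfyUI's usual node-declaration pattern. The sketch below is an illustration of that pattern, not the node's actual source: the class name `IndexTTSNodeSketch`, the input names `text`, `reference_audio`, `language`, and `speed`, the method name `generate`, and the category string are all assumptions, and the synthesis step is left unimplemented.

```python
class IndexTTSNodeSketch:
    """Hypothetical declaration mirroring the documented parameters."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # Multiline string with the documented default text.
                "text": ("STRING", {
                    "multiline": True,
                    "default": "你好,我是IndexTTS语音合成系统。",
                }),
                # Reference clip whose voice style should be emulated.
                "reference_audio": ("AUDIO",),
                # Dropdown of the documented language codes.
                "language": (["auto", "zh", "en", "ja", "ko"],
                             {"default": "auto"}),
                # Float slider using the documented range and step.
                "speed": ("FLOAT", {
                    "default": 1.0, "min": 0.5, "max": 2.0, "step": 0.1,
                }),
            }
        }

    RETURN_TYPES = ("AUDIO",)
    FUNCTION = "generate"   # assumed method name
    CATEGORY = "audio"      # assumed category

    def generate(self, text, reference_audio, language, speed):
        # The real node would call the IndexTTS model here; this sketch
        # only documents the interface.
        raise NotImplementedError("synthesis is outside this sketch")


if __name__ == "__main__":
    for name, spec in IndexTTSNodeSketch.INPUT_TYPES()["required"].items():
        print(name, spec)
```

Declaring the range and step in the input specification lets the ComfyUI front end build the matching widgets, so the limits do not have to be re-checked by hand in the graph editor.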