Generate speech from text with Edge TTS in ComfyUI for AI artists, simplifying text-to-speech conversion with customizable settings.
EdgeTTS is a ComfyUI node that generates speech from text using Microsoft's Edge online text-to-speech service. It produces high-quality audio and handles the details of audio generation itself, so AI artists who want to add voice elements to a project can stay focused on the creative work. Voice, speed, and pitch are all customizable, so the output can be matched to your artistic vision.
The text parameter is the core input for the EdgeTTS node: the written content you want to convert into speech. It must not be empty or contain only whitespace, because the node requires valid input to function correctly. The quality and clarity of the audio depend on the content provided. There are no specific minimum or maximum values for this parameter, but it should be a meaningful string of text.
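As a minimal sketch of the empty-input check described above: the function name `validate_text`, the stripping of surrounding whitespace, and the error message are all assumptions made for illustration, not the node's actual code.

```python
def validate_text(text: str) -> str:
    """Return the text without surrounding whitespace, or raise if nothing is left.

    Hypothetical helper: the real node may check its input differently.
    """
    cleaned = text.strip()
    if not cleaned:
        # Covers both the empty string and whitespace-only input.
        raise ValueError("text is empty or contains only whitespace")
    return cleaned
```

Running a check like this before calling the service avoids a network round trip for input that cannot produce audio.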
The voice parameter selects the voice used to generate the speech, which lets you match the audio to the tone and style you want. The available options are not listed here, but the Edge TTS service likely supports a range of voices. If the selected voice fails to produce audio, the node attempts to use a default voice as a fallback.
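The fallback behaviour can be sketched roughly as follows. Everything here is illustrative: `synthesize_with_fallback` and the `synthesize` callable are hypothetical stand-ins for the real service call, and `en-US-AriaNeural` is only a placeholder for whatever default voice the node actually uses.

```python
from typing import Callable, Tuple

# Placeholder only: the node's real default voice is not stated in this document.
DEFAULT_VOICE = "en-US-AriaNeural"


def synthesize_with_fallback(
    text: str,
    voice: str,
    synthesize: Callable[[str, str], bytes],
    default_voice: str = DEFAULT_VOICE,
) -> Tuple[bytes, str]:
    """Try the requested voice first; fall back to the default voice on failure.

    Returns the audio bytes together with the voice that produced them.
    """
    try:
        audio = synthesize(text, voice)
        if audio:  # an empty result counts as a failure
            return audio, voice
    except Exception:
        # A real node would log the failure here before retrying.
        pass
    return synthesize(text, default_voice), default_voice
```

Returning the voice alongside the audio makes it easy to tell afterwards whether the fallback was used.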
The speed parameter controls the rate at which the text is spoken. Its default value of 1.0 represents normal speed. The node converts this value into a percentage change from the default rate: values above 1.0 give a positive change and speed the speech up, while values below 1.0 give a negative change and slow it down.
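A minimal sketch of that conversion, assuming the value is a multiplier and the service expects a signed percentage string such as `+50%` (the format the edge-tts Python package uses for its rate argument). The helper name `speed_to_rate` is hypothetical, and the node's own rounding may differ.

```python
def speed_to_rate(speed: float) -> str:
    """Convert a speed multiplier (1.0 = normal) into a signed percentage string."""
    # Percentage change relative to normal speed, rounded to a whole percent.
    percent = round((speed - 1.0) * 100)
    # The "+d" format always emits a sign, so normal speed becomes "+0%".
    return f"{percent:+d}%"
```

For example, `speed_to_rate(1.5)` gives `"+50%"` and `speed_to_rate(0.8)` gives `"-20%"`.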
The pitch parameter adjusts the pitch of the generated speech, allowing you to modify the tone of the voice. It is expressed in Hertz (Hz) and can be set to positive or negative values to raise or lower the pitch, respectively. This parameter is useful for fine-tuning the audio output to better fit the mood or character you are trying to convey.
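As with speed, the pitch offset is typically sent to the service as a signed string. The sketch below assumes the `+10Hz` style accepted by the edge-tts Python package; `pitch_to_string` is a hypothetical helper, not the node's own function.

```python
def pitch_to_string(pitch_hz: float) -> str:
    """Format a pitch offset in Hertz as a signed string such as '+10Hz'."""
    # Round to a whole number of Hertz and always include the sign.
    return f"{round(pitch_hz):+d}Hz"
```

A value of `0` yields `"+0Hz"`, which leaves the voice at its natural pitch.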
The waveform output parameter represents the audio data generated by the EdgeTTS node. It is a tensor containing the waveform of the spoken text, which can be used for further processing or playback. The waveform is normalized to ensure consistent audio levels, making it suitable for integration into various projects.
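The document does not say which normalization the node applies. One common choice is peak normalization, sketched here on a plain Python list in place of a tensor; the function name and the target peak of 1.0 are assumptions.

```python
from typing import List


def peak_normalize(samples: List[float], peak: float = 1.0) -> List[float]:
    """Scale samples so the largest absolute value equals `peak`.

    Illustrative only: the node may normalize its waveform differently.
    """
    largest = max((abs(s) for s in samples), default=0.0)
    if largest == 0.0:
        # Silence (or an empty list) cannot be scaled; return a copy unchanged.
        return list(samples)
    scale = peak / largest
    return [s * scale for s in samples]
```

Scaling by a single factor keeps the relative loudness of every sample while bringing the overall level to a predictable maximum.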
The sample_rate output parameter indicates the sample rate of the generated audio. It is a crucial aspect of the audio data, as it defines the number of samples per second in the waveform. A higher sample rate generally results in better audio quality, providing a more accurate representation of the original speech.
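Because the sample rate is the number of samples per second, the two outputs together give the clip length. A small sketch, with a hypothetical helper name and illustrative numbers:

```python
def duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Return the clip length in seconds for a given sample count and rate."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate
```

For instance, 48,000 samples at a rate of 24,000 samples per second correspond to 2 seconds of audio.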
Ensure that the text parameter is not empty to avoid errors and ensure successful audio generation.
Experiment with the voice, speed, and pitch settings to achieve the audio output that best fits your project.
Use the waveform and sample_rate outputs to integrate the generated audio into your projects, ensuring compatibility with other audio processing tools.

An error occurs when the text parameter is empty or contains only whitespace. Provide valid content in the text parameter to ensure successful audio generation.
A message of the form "… <voice>, trying default voice <default_voice>" indicates that the selected voice failed to produce audio and the node is falling back to the default voice.