ComfyUI Node: ElevenLabs Speech to Speech

Class Name

ElevenLabsSpeechToSpeech

Category
api node/audio/ElevenLabs
Author
ComfyAnonymous (Account age: 763 days)
Extension
ComfyUI
Last Updated
2026-05-13
GitHub Stars
112.77K

How to Install ComfyUI

Install this extension via the ComfyUI Manager by searching for ComfyUI:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI in the search bar.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

ElevenLabs Speech to Speech Description

Converts speech from one voice to another while preserving the original content and emotional delivery.

ElevenLabs Speech to Speech:

The ElevenLabsSpeechToSpeech node converts speech from one voice to another while preserving the original content and emotion. It suits voice-conversion work such as dubbing, voiceovers, and personalized voice experiences. Because the conversion keeps the emotional nuance and intent of the source performance, the result sounds natural in the target voice, which makes the node useful for creators and developers who want more voice options in their audio projects.

ElevenLabs Speech to Speech Input Parameters:

voice

The voice parameter specifies the target voice for the transformation and determines the voice characteristics of the output. Connect it from the Voice Selector or Instant Voice Clone to choose either a predefined voice or a custom one.

audio

The audio parameter is the source audio to transform. It supplies the original speech content for voice conversion. The quality and clarity of the source recording significantly affect the result, so use clean, high-quality recordings where possible.

stability

The stability parameter controls the voice stability during transformation, with a default value of 0.5. It ranges from 0.0 to 1.0, where lower values allow for a broader emotional range, and higher values produce more consistent but potentially monotonous speech. Adjusting this parameter helps in fine-tuning the emotional expressiveness of the transformed voice, making it either more dynamic or stable based on the desired outcome.
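The range and default described above can be expressed as a small helper. This is only a sketch: `normalize_stability` is a hypothetical name, not part of the node, and clamping (rather than rejecting) out-of-range values is an assumption made for illustration.

```python
DEFAULT_STABILITY = 0.5  # default value stated on this page


def normalize_stability(value=None):
    """Return a stability value inside the documented 0.0 to 1.0 range.

    Hypothetical helper: the node itself may reject out-of-range values
    instead of clamping them.
    """
    if value is None:
        return DEFAULT_STABILITY
    value = float(value)
    # Clamp to the documented bounds.
    return max(0.0, min(1.0, value))


print(normalize_stability())      # 0.5
print(normalize_stability(1.3))   # 1.0
print(normalize_stability(-0.2))  # 0.0
```

Values near 0.0 correspond to the broader emotional range described above, and values near 1.0 to the more consistent delivery.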

model

The model parameter selects the model used for speech-to-speech transformation. The options are eleven_multilingual_sts_v2 and eleven_english_sts_v2. The choice affects language support and synthesis quality, so pick the model that matches the language of your source audio, especially for multilingual content.
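To show how the four inputs fit together, here is a sketch of one node entry in ComfyUI's API-format workflow JSON, where linked inputs are written as `[source_node_id, output_index]`. It assumes the input keys match the parameter names on this page. `build_sts_node` and the node ids "1", "2", and "3" are hypothetical, and a workflow exported from your own ComfyUI install may contain additional inputs not listed here.

```python
import json

# Model names listed on this page.
STS_MODELS = ("eleven_multilingual_sts_v2", "eleven_english_sts_v2")


def build_sts_node(voice_link, audio_link, stability=0.5, model=STS_MODELS[0]):
    """Build one API-format node entry (hypothetical helper, for illustration)."""
    if model not in STS_MODELS:
        raise ValueError(f"Unsupported model: {model}")
    if not 0.0 <= stability <= 1.0:
        raise ValueError("stability must be between 0.0 and 1.0")
    return {
        "class_type": "ElevenLabsSpeechToSpeech",
        "inputs": {
            "voice": voice_link,  # link to a voice-producing node
            "audio": audio_link,  # link to an audio-producing node
            "stability": stability,
            "model": model,
        },
    }


# Node ids "1" and "2" are placeholders for the upstream voice and audio nodes.
workflow = {"3": build_sts_node(["1", 0], ["2", 0], stability=0.35)}
print(json.dumps(workflow, indent=2))
```

Exporting a working graph in API format from ComfyUI is the reliable way to confirm the exact keys before scripting against them.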

ElevenLabs Speech to Speech Output Parameters:

audio_output

The audio_output parameter is the transformed audio: the original speech content rendered in the selected target voice. Check it for clarity, emotional expression, and how natural the target voice sounds before using it downstream.

ElevenLabs Speech to Speech Usage Tips:

  • Experiment with different stability settings to achieve the desired emotional expressiveness in the transformed voice. Lower stability values can add more emotional depth, while higher values ensure consistency.
  • Choose the appropriate model based on the language and voice characteristics required for your project. This can significantly impact the quality and naturalness of the voice transformation.
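Both tips can be turned into a small planning script: generate a handful of stability values to audition, and choose a model from the source language. This is a sketch under stated assumptions. `stability_sweep` and `pick_model` are hypothetical helpers, and mapping English to eleven_english_sts_v2 and every other language to eleven_multilingual_sts_v2 is inferred from the model names only.

```python
def stability_sweep(steps=5):
    """Return evenly spaced stability values from 0.0 to 1.0 inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [round(i / (steps - 1), 2) for i in range(steps)]


def pick_model(language_code):
    """Choose a model name from a language code (assumption based on model names)."""
    if language_code.strip().lower().startswith("en"):
        return "eleven_english_sts_v2"
    return "eleven_multilingual_sts_v2"


model = pick_model("de")
for stability in stability_sweep(5):
    # Each line describes one render to run and compare by ear.
    print(f"model={model} stability={stability}")
```

Listening to the same clip at each setting makes it easier to hear where expressiveness starts to cost consistency.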

ElevenLabs Speech to Speech Common Errors and Solutions:

Unknown voice: <voice_name>

  • Explanation: This error occurs when the specified voice is not recognized by the system, possibly due to a typo or an unsupported voice selection.
  • Solution: Verify that the voice name is correct and matches one of the available options in the Voice Selector or Instant Voice Clone. Ensure that the voice is supported by the selected model.
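A typo check like the one suggested above can be sketched with the standard library's `difflib`. The `resolve_voice` helper and the voice names in the example are hypothetical placeholders; the real list comes from whatever the Voice Selector offers in your install.

```python
import difflib


def resolve_voice(name, available):
    """Return the matching voice name, or raise with close-match suggestions."""
    lookup = {voice.lower(): voice for voice in available}
    key = name.strip().lower()
    if key in lookup:
        return lookup[key]
    # Suggest up to three similar names to help spot typos.
    hints = difflib.get_close_matches(key, list(lookup), n=3, cutoff=0.6)
    suggestion = ""
    if hints:
        suggestion = " Did you mean: " + ", ".join(lookup[h] for h in hints) + "?"
    raise ValueError(f"Unknown voice: {name}.{suggestion}")


# Placeholder names, not real ElevenLabs voices.
voices = ["Narrator A", "Narrator B", "Studio Clone"]
print(resolve_voice(" narrator a ", voices))
try:
    resolve_voice("Narator A", voices)
except ValueError as err:
    print(err)
```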

Invalid audio input

  • Explanation: This error indicates that the provided audio input is not in a compatible format or is corrupted.
  • Solution: Ensure that the audio file is in a supported format and is not corrupted. Re-upload the audio file if necessary and check for any issues with the file integrity.
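A quick integrity check before loading the file into the workflow can catch corrupted or empty recordings early. The sketch below uses Python's standard `wave` module and therefore assumes the source is an uncompressed WAV file; this page does not list the formats the node accepts, so other formats would need a different check. `check_wav` and `make_test_wav` are hypothetical helpers.

```python
import io
import wave


def check_wav(data):
    """Return (ok, detail) for WAV bytes; ok is False if unreadable or empty."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        return False, f"not a readable WAV file: {exc}"
    if frames == 0:
        return False, "WAV file contains no audio frames"
    return True, f"{frames / rate:.2f} s at {rate} Hz"


def make_test_wav(seconds=0.5, rate=16000):
    """Create a short silent mono 16-bit WAV in memory for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(check_wav(make_test_wav()))
print(check_wav(b"not audio"))
```

To check a file on disk, read it with `open(path, "rb").read()` and pass the bytes to `check_wav`.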

Copyright 2025 RunComfy. All Rights Reserved.
