Versatile voice conversion tool supporting multiple engines for transforming audio to mimic target voices with ease.
The UnifiedVoiceChangerNode is an engine-agnostic voice conversion tool within the TTS Audio Suite. Refactored from the original ChatterBox VC node, it supports multiple voice conversion engines, including the current ChatterBox engine and future RVC engines. The node transforms a source audio file to mimic the characteristics of a target voice, making it valuable for applications that require voice customization and transformation. Multi-engine support provides flexibility and future-proofing as conversion technologies evolve, while the node itself keeps the process simple enough for users without a deep technical background.
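The engine-agnostic design described above can be sketched as a ComfyUI-style node that declares its inputs and dispatches to whichever backend the engine configuration selects. All class, method, and key names below are illustrative assumptions, not the real node's internals:

```python
# Illustrative sketch of an engine-agnostic voice-conversion node.
# Class name, dict keys, and backend names are assumptions for
# demonstration; the real UnifiedVoiceChangerNode may differ.

class UnifiedVoiceChangerSketch:
    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI-style input declaration: one engine socket plus the
        # two audio inputs and the refinement-pass count described here.
        return {
            "required": {
                "engine": ("TTS_ENGINE",),
                "source_audio": ("AUDIO",),
                "narrator_target": ("AUDIO",),
                "refinement_passes": ("INT", {"default": 1, "min": 1, "max": 30}),
            }
        }

    RETURN_TYPES = ("AUDIO",)
    FUNCTION = "convert"

    def convert(self, engine, source_audio, narrator_target, refinement_passes):
        # Dispatch to whichever backend the engine config selects.
        backends = {"chatterbox": self._chatterbox_vc, "rvc": self._rvc_vc}
        engine_type = engine.get("type", "chatterbox")
        if engine_type not in backends:
            raise ValueError(f"Unsupported engine: {engine_type}")
        return (backends[engine_type](source_audio, narrator_target, refinement_passes),)

    def _chatterbox_vc(self, src, tgt, passes):
        ...  # placeholder for the ChatterBox conversion backend

    def _rvc_vc(self, src, tgt, passes):
        ...  # placeholder for a future RVC backend
```

The dispatch-table pattern is what lets new engines be added later without changing the node's interface.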
This parameter specifies the Text-to-Speech (TTS) or Voice Conversion (VC) engine configuration to be used for the conversion process. It supports the ChatterBox TTS Engine and is prepared for future integration with the RVC Engine. The choice of engine can significantly impact the quality and characteristics of the converted voice, as different engines may have unique processing capabilities and language support.
The source_audio parameter is the original voice audio that you wish to convert. It accepts either a direct audio input or the output of a Character Voices node. This audio serves as the base for the conversion process, and its quality and clarity directly affect the final output; the node processes it to extract the features needed for conversion.
This parameter represents the reference voice audio whose characteristics will be applied to the source audio. Like source_audio, it accepts audio input or output from a Character Voices node. The narrator_target is crucial as it defines the desired voice characteristics, such as tone, pitch, and style, that will be applied to the source audio.
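Both audio inputs arrive as ComfyUI AUDIO payloads, which conventionally carry a waveform together with its sample rate. The exact schema can vary between ComfyUI versions and node packs, so treat the field names in this validation sketch as assumptions:

```python
# Hedged sketch: ComfyUI AUDIO sockets conventionally carry a dict with
# a waveform and its sample rate. The exact schema may differ per
# installation; the keys below are assumptions.

def validate_audio_input(audio):
    """Check that an input looks like a usable AUDIO payload."""
    if not isinstance(audio, dict):
        raise TypeError(f"Expected an AUDIO dict, got {type(audio).__name__}")
    missing = {"waveform", "sample_rate"} - audio.keys()
    if missing:
        raise ValueError(f"AUDIO payload missing fields: {sorted(missing)}")
    if audio["sample_rate"] <= 0:
        raise ValueError("sample_rate must be positive")
    return audio

# Example payload (samples shown as a plain list for brevity; real
# payloads typically hold a tensor):
payload = {"waveform": [0.0, 0.1, -0.1], "sample_rate": 24000}
validate_audio_input(payload)
```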
The refinement_passes parameter determines the number of conversion iterations to be performed. Each pass refines the output to sound more like the target voice. The default value is 1, with a minimum of 1 and a maximum of 30. It is recommended to use at most 5 passes, as additional passes can introduce distortions. Each iteration is deterministic, which helps limit degradation in output quality across passes.
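The refinement loop can be sketched as below. The conversion step here is a deterministic toy transform standing in for a real VC model; the point is the clamping to the node's 1..30 range and the repeated, deterministic passes:

```python
# Sketch of the refinement-pass loop. The per-pass "conversion" is a
# toy deterministic transform, NOT a real voice-conversion model.

def apply_refinement_passes(samples, target_profile, refinement_passes=1):
    passes = max(1, min(30, refinement_passes))  # enforce the node's 1..30 range
    out = list(samples)
    for _ in range(passes):
        # Toy deterministic step: nudge each sample toward the target.
        out = [0.8 * s + 0.2 * t for s, t in zip(out, target_profile)]
    return out

src = [1.0, 0.0, -1.0]
tgt = [0.0, 0.5, 0.0]
one_pass = apply_refinement_passes(src, tgt, 1)
five_passes = apply_refinement_passes(src, tgt, 5)
```

Because every pass is deterministic, running the node twice with the same inputs and pass count yields identical output, which is why repeated passes refine rather than randomly drift.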
The converted_audio is the primary output of the UnifiedVoiceChangerNode. It is the transformed version of the source_audio, modified to mimic the characteristics of the narrator_target. This output is crucial for applications that require voice transformation, as it provides the final audio result after processing through the selected engine and refinement passes.
Ensure that source_audio and narrator_target are of high quality and free from background noise, as noise can degrade the conversion process. Experiment with refinement_passes to find the optimal balance between quality and processing time; while more passes can improve the likeness to the target voice, they may also introduce distortions if overused. If the node reports that narrator_target is not recognized as an RVC Character Model, which is required for RVC conversions, verify that narrator_target is correctly set as an RVC Character Model and adjust the input to match the expected format.