FL VoxCPM V2 TTS:
FL VoxCPM V2 TTS is a text-to-speech node that generates speech with the VoxCPM V2 model. It offers Voice Design, Voice Cloning, Controllable Cloning, and Ultimate Cloning modes for producing expressive, personalized speech. The node is aimed at AI artists and developers who want realistic, customizable voice synthesis in their projects, whether for creating unique character voices or replicating existing ones.
FL VoxCPM V2 TTS Input Parameters:
model_name
This parameter allows you to select the specific VoxCPM model to use for speech generation. It is crucial for determining the characteristics and capabilities of the generated speech. The available options are defined by the models supported by the node, and selecting the appropriate model can significantly impact the quality and style of the output.
text
The text parameter is where you input the script or content you wish to convert into speech. It supports multiline input, meaning each line is processed as a separate chunk, allowing for complex and varied speech synthesis. The default text is "VoxCPM is an innovative TTS model designed to generate highly expressive speech."
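As a rough illustration of the line-by-line behavior described above, the sketch below splits a multiline script into chunks. The helper name `split_into_chunks` and the decision to skip blank lines are assumptions made for illustration; the node's actual chunking logic may differ.

```python
def split_into_chunks(text: str) -> list[str]:
    """Split a multiline script into one chunk per non-empty line."""
    # Hypothetical helper: skipping blank lines is an assumption,
    # not confirmed node behavior.
    return [line.strip() for line in text.splitlines() if line.strip()]


script = """Hello there.
This is the second line.

And a third one after a blank line."""

chunks = split_into_chunks(script)
for index, chunk in enumerate(chunks, start=1):
    print(f"chunk {index}: {chunk}")
```

If each line really is synthesized separately, keeping one sentence or phrase per line gives you direct control over where the chunk boundaries fall.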
prompt_audio
This optional parameter allows you to provide reference audio for voice cloning. By supplying a sample of the desired voice, the node can more accurately replicate the voice characteristics in the generated speech.
prompt_text
The transcript of the reference audio supplied in prompt_audio. The parameter is optional at the node level, but the transcript is needed when cloning a voice from reference audio: it tells the node what is being said in the sample, which makes the cloning more accurate.
cfg_value
The guidance scale parameter, with a default value of 2.0, influences how closely the generated speech adheres to the provided prompt. Higher values result in speech that is more faithful to the prompt but may sound less natural. The range is from 1.0 to 10.0.
inference_timesteps
This parameter determines the number of diffusion steps used during speech generation. Higher values can improve the quality of the output but will increase processing time. The default is 10, with a range from 1 to 100.
min_tokens
Sets the minimum number of audio tokens to generate, which ensures the output meets a certain duration. The default is 2, with a range from 1 to 100.
max_tokens
Sets the maximum number of audio tokens to generate, which caps the duration of the speech. The default is 2048, with a range from 64 to 8192.
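The numeric defaults and ranges listed above can be collected in one place and checked before a workflow is queued. The following is a minimal sketch, not the node's own validation code: `PARAM_SPECS` and `resolve_params` are hypothetical names, and the check that min_tokens does not exceed max_tokens is an added assumption.

```python
# Defaults and ranges as listed in the parameter descriptions above.
PARAM_SPECS = {
    "cfg_value": {"default": 2.0, "min": 1.0, "max": 10.0},
    "inference_timesteps": {"default": 10, "min": 1, "max": 100},
    "min_tokens": {"default": 2, "min": 1, "max": 100},
    "max_tokens": {"default": 2048, "min": 64, "max": 8192},
}


def resolve_params(**overrides):
    """Return a full settings dict, rejecting out-of-range or inconsistent values."""
    unknown = set(overrides) - set(PARAM_SPECS)
    if unknown:
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
    settings = {}
    for name, spec in PARAM_SPECS.items():
        value = overrides.get(name, spec["default"])
        if not spec["min"] <= value <= spec["max"]:
            raise ValueError(
                f"{name}={value} is outside [{spec['min']}, {spec['max']}]"
            )
        settings[name] = value
    # Assumption: a minimum above the maximum is treated as a user error.
    if settings["min_tokens"] > settings["max_tokens"]:
        raise ValueError("min_tokens must not exceed max_tokens")
    return settings


settings = resolve_params(cfg_value=3.0, inference_timesteps=20)
print(settings)
```

Because the documented ranges overlap (min_tokens can go up to 100 while max_tokens can go down to 64), the extra consistency check catches combinations that each pass their individual range test.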
FL VoxCPM V2 TTS Output Parameters:
waveform
The waveform output parameter provides the generated audio in a tensor format, representing the synthesized speech. This output is crucial for further processing or playback, as it contains the actual audio data created by the node.
sample_rate
This parameter indicates the sample rate of the generated audio, which is essential for ensuring compatibility with various audio playback systems and maintaining the quality of the output.
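To show how the two outputs fit together, here is a minimal sketch that writes audio samples to a WAV file using only the Python standard library. It assumes the waveform has already been converted from a tensor to a flat, mono sequence of floats in the range [-1.0, 1.0]; the tensor layout is not specified here, and the helper `write_wav`, the file name, and the 16000 Hz placeholder rate are illustrative assumptions.

```python
import math
import struct
import wave


def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in data: a 0.5 s, 440 Hz tone instead of real node output.
sample_rate = 16000  # placeholder; use the node's sample_rate output in practice
samples = [
    0.3 * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate // 2)
]
write_wav("voxcpm_demo.wav", samples, sample_rate)
```

Writing the file with a sample rate other than the one the node reports would change the pitch and speed of the speech on playback, which is why the two outputs should travel together.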
FL VoxCPM V2 TTS Usage Tips:
- Experiment with different model_name options to find the best fit for your project's voice characteristics.
- Use prompt_audio and prompt_text for accurate voice cloning, especially when replicating specific voices.
- Adjust cfg_value to balance between naturalness and adherence to the prompt, depending on your needs.
- Increase inference_timesteps for higher quality output, but be mindful of the increased processing time.
FL VoxCPM V2 TTS Common Errors and Solutions:
Model 'model_name' not found.
- Explanation: This error occurs when the specified model name is not available in the node's supported models.
- Solution: Ensure that you select a model name from the available options provided by the node.
'model_name' is a V1 model. Use the FL VoxCPM TTS node instead.
- Explanation: This error indicates that a V1 model was selected, which is not compatible with the V2 node.
- Solution: Switch to using the FL VoxCPM TTS node for V1 models or select a V2 model for this node.
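The two errors above suggest a check order: first whether the model exists, then whether it is a V1 model. The sketch below mirrors that order using the documented messages. It is not the node's source code, and the function `check_model` and all model names in it are hypothetical.

```python
def check_model(model_name, available_models, v1_models):
    """Raise ValueError with the documented messages for unsupported selections."""
    if model_name not in available_models:
        raise ValueError(f"Model '{model_name}' not found.")
    if model_name in v1_models:
        raise ValueError(
            f"'{model_name}' is a V1 model. Use the FL VoxCPM TTS node instead."
        )
    return model_name


# Hypothetical model names, for illustration only.
available = {"example-v1-model", "example-v2-model"}
v1_only = {"example-v1-model"}

for candidate in ("example-v2-model", "example-v1-model", "missing-model"):
    try:
        print("ok:", check_model(candidate, available, v1_only))
    except ValueError as error:
        print("error:", error)
```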
