FL VoxCPM TTS:
FL_VoxCPM_TTS is a node that generates speech or clones voices using the VoxCPM model version 1.5, with support for LoRA (Low-Rank Adaptation) for style adjustment and fine-tuning. It is aimed at AI artists and developers who want expressive, natural-sounding speech from text input. The node can synthesize speech that mimics a specific voice from reference audio or produces a new vocal style, which makes it useful for projects that need voice customization or cloning. Its input parameters give control over the model, the guidance strength, the number of diffusion steps, and the length of the generated audio.
FL VoxCPM TTS Input Parameters:
model_name
This parameter allows you to select the VoxCPM model to use for speech generation. The choice of model can affect the quality and characteristics of the generated speech. The default model is the first option in the list of available models.
lora_name
This parameter lets you choose a LoRA to apply for style or fine-tuning purposes. LoRA can modify the speech style, adding a layer of customization to the generated audio. The rank of the LoRA is automatically detected, and the default option is "None."
text
This is the main text input for synthesis. You can enter the text you want to convert into speech, with each line processed as a separate chunk. The default text is "VoxCPM is an innovative TTS model designed to generate highly expressive speech."
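The per-line chunking described above can be pictured with a minimal sketch. The helper name `split_into_chunks` is hypothetical, and skipping blank lines and trimming whitespace are assumptions; the node's own splitting logic may differ.

```python
def split_into_chunks(text: str) -> list[str]:
    """Split input text into one chunk per line.

    Assumption: blank lines are dropped and surrounding whitespace is
    trimmed. This is an illustration, not the node's actual code.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


# Two non-empty lines produce two chunks; the blank line is skipped.
chunks = split_into_chunks("Hello there.\n\nThis is the second chunk.\n")
print(chunks)
```

In practice this means a line break in the text input marks a chunk boundary, so a long sentence is best kept on a single line.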
prompt_audio
An optional parameter where you can provide reference audio for voice cloning. This helps the model to mimic the voice characteristics of the provided audio.
prompt_text
An optional parameter that holds the transcript of the reference audio. It is required when performing voice cloning, and it helps the generated speech align with the intended voice characteristics.
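Since prompt_audio and prompt_text are meant to be supplied together for voice cloning, a quick check before running the node can catch a missing transcript. The sketch below is illustrative only: `check_cloning_inputs` is a hypothetical helper, and treating a transcript without audio as plain text-to-speech is an assumption.

```python
from typing import Optional


def check_cloning_inputs(prompt_audio: Optional[object],
                         prompt_text: Optional[str]) -> bool:
    """Return True when voice cloning is requested, False for plain TTS.

    Raises ValueError if reference audio is given without a transcript.
    Assumption: a transcript without audio is ignored (plain TTS).
    """
    has_audio = prompt_audio is not None
    has_text = bool(prompt_text and prompt_text.strip())
    if has_audio and not has_text:
        raise ValueError("prompt_audio was provided without prompt_text")
    return has_audio


# Plain TTS: no reference audio, so cloning is not requested.
print(check_cloning_inputs(None, None))
```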
cfg_value
The guidance scale parameter, which ranges from 1.0 to 10.0, with a default value of 2.0. Higher values make the output adhere more closely to the prompt, but may result in less natural-sounding speech.
inference_timesteps
This parameter determines the number of diffusion steps used during generation. It ranges from 1 to 100, with a default of 10. More steps can improve quality but increase processing time.
min_tokens
Specifies the minimum length of generated audio tokens, ranging from 1 to 100, with a default of 2. This ensures a baseline length for the audio output.
max_tokens
Defines the maximum length of generated audio tokens, with a range from 64 to 8192 and a default of 2048. This controls the upper limit of the audio duration.
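The documented ranges and defaults for the four numeric parameters can be collected in one place. The following is a minimal sketch, assuming out-of-range values are clamped; the node or the ComfyUI interface may reject them instead, and `RANGES` and `clamp_settings` are hypothetical names.

```python
# (minimum, maximum, default) for each numeric input, as listed above.
RANGES = {
    "cfg_value": (1.0, 10.0, 2.0),
    "inference_timesteps": (1, 100, 10),
    "min_tokens": (1, 100, 2),
    "max_tokens": (64, 8192, 2048),
}


def clamp_settings(**overrides):
    """Fill in defaults and clamp each value to its documented range.

    Assumption: clamping is used here for illustration only. Names that
    are not in RANGES are ignored.
    """
    settings = {}
    for name, (low, high, default) in RANGES.items():
        value = overrides.get(name, default)
        settings[name] = min(max(value, low), high)
    return settings


# cfg_value above the documented maximum is pulled back to 10.0.
print(clamp_settings(cfg_value=15.0, inference_timesteps=20))
```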
FL VoxCPM TTS Output Parameters:
waveform
The waveform output is a tensor representing the generated audio signal. It is crucial for playback or further processing, as it contains the actual sound data produced by the node.
sample_rate
This parameter indicates the sample rate of the generated audio, which is essential for ensuring the audio is played back at the correct speed and quality. It matches the sample rate used by the VoxCPM model.
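To show why both outputs are needed together, here is a rough sketch that turns audio samples into a WAV file in memory using only the Python standard library. It assumes the waveform has already been flattened to a mono sequence of floats in the range -1.0 to 1.0; the function names are hypothetical and the sample rate of 16000 is a placeholder, not the rate VoxCPM uses.

```python
import io
import struct
import wave


def waveform_to_wav_bytes(samples, sample_rate):
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)          # mono
        wav_file.setsampwidth(2)          # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


def duration_seconds(num_samples, sample_rate):
    """Playback length: number of samples divided by samples per second."""
    return num_samples / sample_rate


# Placeholder values; use the node's sample_rate output in practice.
wav_bytes = waveform_to_wav_bytes([0.0, 0.5, -0.5, 1.0], 16000)
print(len(wav_bytes), duration_seconds(32000, 16000))
```

Writing the same samples with a different sample rate changes the playback speed and pitch, which is why the sample_rate output should be passed along with the waveform.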
FL VoxCPM TTS Usage Tips:
- Experiment with different cfg_value settings to balance adherence to the prompt against naturalness of the speech. Lower values may sound more natural, while higher values stick closely to the input prompt.
- Use prompt_audio and prompt_text for voice cloning to achieve more personalized and accurate voice synthesis, especially when trying to mimic a specific voice.
- Adjust inference_timesteps to find a sweet spot between quality and processing time. More steps can enhance quality but will take longer to process.
FL VoxCPM TTS Common Errors and Solutions:
Generation error: <error_message>
- Explanation: This error occurs when there is an issue during the audio generation process, possibly due to incorrect input parameters or model configuration.
- Solution: Check all input parameters for correctness, ensure the model and LoRA are properly selected, and verify that any optional inputs like prompt_audio and prompt_text are correctly provided if used.
Force offloading VoxCPM model '<model_name>' from VRAM...
- Explanation: This message indicates that the model is being offloaded from VRAM to free up resources, which can happen if force_offload is enabled.
- Solution: If you encounter performance issues, consider disabling force_offload unless necessary, or ensure your system has sufficient VRAM to handle the model without offloading.
