RunningHub VoxCPM Generate Speech:
The RunningHub_VoxCPM_Generate node synthesizes speech from text using the VoxCPM model. It is part of the RunningHub suite, which focuses on speech generation. You supply the text to be spoken together with control parameters such as cfg_value, inference_steps, and an optional reference_audio, and the node returns the generated audio. These parameters let you adjust the style, quality, and reproducibility of the output.
RunningHub VoxCPM Generate Speech Input Parameters:
model
The model parameter specifies the VoxCPM model to be used for speech generation. It is a required input and must be of the type VOXCPM_MODEL. This parameter is crucial as it determines the underlying model architecture and capabilities that will be used to synthesize speech.
control_instruction
The control_instruction parameter allows you to provide specific instructions or guidelines that influence how the text is converted into speech. It is a string input that can be multiline, with a default value of an empty string. This parameter can be used to adjust the tone, style, or other aspects of the speech output.
text
The text parameter is the core input for the speech generation process. It is a string input that can be multiline, with a default value of "Hello, this is a test." This parameter contains the actual text that will be converted into speech, making it the primary content for the audio output.
cfg_value
The cfg_value parameter is a float whose name suggests a classifier-free guidance (CFG) scale, that is, how strongly generation is steered by its conditioning inputs. It has a default value of 2.0, with a minimum of 0.1 and a maximum of 5.0, adjustable in steps of 0.1. This parameter can affect the quality and characteristics of the generated speech, allowing for fine-tuning of the output.
inference_steps
The inference_steps parameter is an integer that specifies the number of inference steps to be used during speech generation. It has a default value of 10, with a minimum of 1 and a maximum of 50, adjustable in steps of 1. This parameter influences the processing time and detail of the generated speech, with more steps potentially leading to higher quality output.
seed
The seed parameter is an integer used to initialize the random number generator for the speech generation process. It has a default value of 0 and can range from 0 to 0xffffffffffffffff. This parameter ensures reproducibility of the generated speech, allowing you to produce the same audio output given the same input conditions.
reference_audio
The reference_audio parameter is an optional input that allows you to provide an audio file as a reference for the speech generation process. This can be used to match the style or characteristics of the reference audio in the generated speech.
ultimate_clone
The ultimate_clone parameter is a boolean that, when enabled, attempts to closely mimic the reference audio provided. This can be useful for creating highly accurate voice clones based on the reference input.
reference_audio_text
The reference_audio_text parameter is an optional string input that provides the text content of the reference audio. This can help in aligning the generated speech with the reference audio's content.
normalize_text
The normalize_text parameter is a boolean that, when enabled, normalizes the input text before processing. This can help in ensuring consistency and clarity in the generated speech.
denoise_reference
The denoise_reference parameter is a boolean that, when enabled, applies denoising to the reference audio. This can improve the quality of the reference audio used in the speech generation process.
max_len
The max_len parameter is an integer that specifies the maximum length of the generated speech in terms of characters. It has a default value of 4096, allowing for control over the duration of the audio output.
retry_badcase
The retry_badcase parameter is a boolean that, when enabled, retries the speech generation process in case of poor quality output. This can help in ensuring the best possible audio quality.
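The inputs described above can be summarized as a ComfyUI-style input declaration. The following Python is a minimal sketch, not the node's actual source. The class name VoxCPMGenerateSketch, the split between required and optional inputs (beyond model being required and the reference inputs being optional), and the AUDIO type for reference_audio are assumptions made for illustration. Only the types, ranges, and defaults listed on this page are taken from the documentation, and the boolean flags are left without defaults because none are documented.

```python
# Hypothetical sketch: how the documented inputs could be declared in a
# ComfyUI-style node. This is NOT the node's real source code.
class VoxCPMGenerateSketch:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "model": ("VOXCPM_MODEL",),
                "control_instruction": ("STRING", {"multiline": True, "default": ""}),
                "text": ("STRING", {"multiline": True,
                                    "default": "Hello, this is a test."}),
                "cfg_value": ("FLOAT", {"default": 2.0, "min": 0.1,
                                        "max": 5.0, "step": 0.1}),
                "inference_steps": ("INT", {"default": 10, "min": 1,
                                            "max": 50, "step": 1}),
                "seed": ("INT", {"default": 0, "min": 0,
                                 "max": 0xFFFFFFFFFFFFFFFF}),
            },
            "optional": {
                # The AUDIO type here is an assumption; the page only says
                # that an audio reference can be supplied.
                "reference_audio": ("AUDIO",),
                "ultimate_clone": ("BOOLEAN", {}),
                "reference_audio_text": ("STRING", {"multiline": True}),
                "normalize_text": ("BOOLEAN", {}),
                "denoise_reference": ("BOOLEAN", {}),
                "max_len": ("INT", {"default": 4096}),
                "retry_badcase": ("BOOLEAN", {}),
            },
        }

    # The page documents a single output of type AUDIO.
    RETURN_TYPES = ("AUDIO",)


spec = VoxCPMGenerateSketch.INPUT_TYPES()
for group, inputs in spec.items():
    print(group, "->", ", ".join(inputs))
```

Reading the declaration this way makes it easy to see at a glance which inputs carry numeric bounds (cfg_value, inference_steps, seed) and which are simple switches.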
RunningHub VoxCPM Generate Speech Output Parameters:
audio
The audio output parameter is the generated speech audio resulting from the input text and parameters. This output is of type AUDIO and represents the final synthesized speech, which can be used for various applications such as voiceovers, virtual assistants, or any scenario requiring text-to-speech conversion.
RunningHub VoxCPM Generate Speech Usage Tips:
- To achieve the best quality speech output, experiment with the cfg_value and inference_steps parameters to find the optimal balance between processing time and audio quality.
- Use the control_instruction parameter to fine-tune the style and tone of the generated speech, especially when specific vocal characteristics are desired.
- If you have a specific voice style in mind, provide a reference_audio to guide the speech generation process and enable ultimate_clone for more accurate voice replication.
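One way to run the cfg_value and inference_steps experiment from the first tip is a small grid of settings with a fixed seed, so that differences between runs come from the two parameters rather than from randomness. The sketch below only builds the list of settings to try; it does not call the node. The helper name build_sweep and the specific values in the grid are illustrative assumptions, while the range checks use the limits documented for each parameter.

```python
from itertools import product


def build_sweep(cfg_values, step_counts, seed=0):
    """Return one settings dict per (cfg_value, inference_steps) pair.

    Hypothetical helper: it only prepares settings and checks them against
    the documented ranges (cfg_value 0.1 to 5.0, inference_steps 1 to 50).
    """
    sweep = []
    for cfg, steps in product(cfg_values, step_counts):
        if not 0.1 <= cfg <= 5.0:
            raise ValueError(f"cfg_value {cfg} is outside 0.1 to 5.0")
        if not 1 <= steps <= 50:
            raise ValueError(f"inference_steps {steps} is outside 1 to 50")
        # Keep the seed fixed so runs differ only in the swept parameters.
        sweep.append({"cfg_value": cfg, "inference_steps": steps, "seed": seed})
    return sweep


# Example grid; these particular values are arbitrary starting points.
sweep = build_sweep(cfg_values=[1.5, 2.0, 2.5, 3.0], step_counts=[5, 10, 20])
for settings in sweep:
    print(settings)
```

Each printed entry can then be entered into the node by hand, and the resulting audio compared by ear against the time each run took.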
RunningHub VoxCPM Generate Speech Common Errors and Solutions:
Model not loaded
- Explanation: This error occurs when the model parameter is not properly initialized or loaded.
- Solution: Ensure that the VoxCPM model is correctly loaded using the RunningHub_VoxCPM_LoadModel node before attempting to generate speech.
Invalid text input
- Explanation: This error arises when the text parameter is empty or not properly formatted.
- Solution: Verify that the text parameter contains valid and properly formatted text for speech generation.
Inference steps out of range
- Explanation: This error occurs when the inference_steps parameter is set outside the allowed range.
- Solution: Adjust the inference_steps parameter to be within the range of 1 to 50.
Seed value out of range
- Explanation: This error happens when the seed parameter is set outside the valid range.
- Solution: Ensure that the seed parameter is within the range of 0 to 0xffffffffffffffff.
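
The four conditions above can be checked before a workflow is queued. The following Python is a rough sketch of such a pre-flight check, assuming the values are available as plain Python objects. The function name check_inputs is hypothetical, and treating whitespace-only text as "not properly formatted" is an assumption, since this page does not define what counts as invalid formatting. The node's own checks and messages may differ.

```python
SEED_MAX = 0xFFFFFFFFFFFFFFFF  # documented upper bound for seed


def check_inputs(model, text, inference_steps, seed):
    """Return the names of the documented errors these inputs would hit.

    Hypothetical helper; an empty list means no documented problem was found.
    """
    problems = []
    if model is None:
        problems.append("Model not loaded")
    # Assumption: empty or whitespace-only text counts as invalid.
    if not isinstance(text, str) or not text.strip():
        problems.append("Invalid text input")
    if not 1 <= inference_steps <= 50:
        problems.append("Inference steps out of range")
    if not 0 <= seed <= SEED_MAX:
        problems.append("Seed value out of range")
    return problems


# A placeholder object stands in for a loaded VOXCPM_MODEL here.
print(check_inputs(object(), "Hello, this is a test.", 10, 0))
print(check_inputs(None, "   ", 0, -1))
```

The first call uses the documented defaults and reports no problems; the second violates all four conditions and reports each one by the heading used in this section.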
