FL CosyVoice3 Cross-Lingual:
The FL_CosyVoice3_CrossLingual node performs cross-lingual voice synthesis: it generates speech in one language while preserving the characteristics of a reference voice, which may be recorded in a different language. This is useful for multilingual applications, such as virtual assistants or language learning tools, where the same voice must be used across languages. The node formats the input text according to the version of the CosyVoice model in use, adding either a system prompt or a language tag, so that the model receives the text in the form it expects.
FL CosyVoice3 Cross-Lingual Input Parameters:
model
This parameter specifies the CosyVoice model used for synthesis, which determines the capabilities available for generating cross-lingual speech. The model is loaded from the ModelLoader, which ensures a compatible model is provided.
text
This parameter is the text you wish to synthesize into speech. It supports multiline input, allowing for complex and lengthy text to be processed. The default value is "Hello, this is cross-lingual speech synthesis." The text's content directly influences the speech output, making it essential for defining what is spoken.
reference_audio
This parameter provides the reference voice audio, which can be in any language. It is used to maintain the voice characteristics across different languages, ensuring that the synthesized speech retains the same vocal identity as the reference.
speed
This parameter controls the speed of the synthesized speech, acting as a multiplier. The default value is 1.0, with a range from 0.5 to 2.0, allowing you to slow down or speed up the speech as needed. Adjusting this parameter can help match the speech pace to specific requirements or preferences.
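The documented range can be enforced before a value reaches the node. The following is a minimal sketch, not the node's own code: the helper names `clamp_speed` and `expected_duration` are hypothetical, and the assumption that a multiplier of 2.0 roughly halves the audio duration is an illustration of what "multiplier" usually means rather than a confirmed behavior.

```python
def clamp_speed(speed: float, low: float = 0.5, high: float = 2.0) -> float:
    """Clamp a speed multiplier to the documented 0.5-2.0 range."""
    return max(low, min(high, float(speed)))


def expected_duration(base_seconds: float, speed: float) -> float:
    """Estimate the duration at a given speed.

    Assumption: speed acts as a playback-rate multiplier, so the
    duration scales by 1 / speed. The real node may differ slightly.
    """
    return base_seconds / clamp_speed(speed)


if __name__ == "__main__":
    for requested in (0.1, 1.0, 1.25, 3.0):
        print(requested, "->", clamp_speed(requested))
```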
target_language
This optional parameter specifies the target language for synthesis. It offers options like "auto" for automatic detection or specific language codes such as "zh" for Chinese, "en" for English, etc. The default is "auto," which attempts to detect the language from the text. Specifying a language ensures accurate pronunciation and language-specific nuances.
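To illustrate what formatting text with a language tag could look like, here is a hedged sketch. The function `tag_text` is hypothetical, and the `<|code|>` tag shape is an assumption modeled on the language-tag convention used with CosyVoice; the node's actual formatting, including when it uses a system prompt instead, may differ by model version.

```python
def tag_text(text: str, target_language: str = "auto") -> str:
    """Prefix text with a language tag unless the language is 'auto'.

    Assumption: tags take the form <|code|>, e.g. <|en|> or <|zh|>.
    With 'auto', the text is returned unchanged and detection is left
    to the node.
    """
    if target_language == "auto":
        return text
    code = target_language.strip().lower()
    if not code:
        raise ValueError("target_language must be 'auto' or a language code")
    return f"<|{code}|>{text}"


if __name__ == "__main__":
    print(tag_text("Hello, this is cross-lingual speech synthesis.", "en"))
    print(tag_text("Hello, this is cross-lingual speech synthesis."))
```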
seed
This optional parameter sets the random seed for the synthesis process, with a default value of 42. It ranges from -1 for a random seed to 2147483647. Setting a seed ensures reproducibility of the synthesis results, which is useful for consistent outputs across different runs.
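The seed convention described above can be sketched as follows. This is an illustrative helper under stated assumptions, not the node's implementation: the name `resolve_seed` is hypothetical, and it assumes that -1 is replaced by a value drawn uniformly from 0 to 2147483647.

```python
import random

MAX_SEED = 2147483647  # upper bound given in the parameter description


def resolve_seed(seed: int = 42) -> int:
    """Return a concrete seed, drawing a random one when seed is -1."""
    if seed == -1:
        # Assumption: a random seed is drawn from the full valid range.
        return random.randint(0, MAX_SEED)
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be -1 or between 0 and {MAX_SEED}")
    return seed


if __name__ == "__main__":
    print(resolve_seed())    # fixed default seed, reproducible
    print(resolve_seed(-1))  # a different value on most runs
```

Logging the resolved value when -1 is used makes it possible to reproduce a result later by passing that value back in as a fixed seed.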
text_frontend
This optional parameter enables or disables text normalization, with a default value of True. When enabled, it processes the text for standardization, which is beneficial for natural language inputs. Disabling it is useful when using CMU phonemes or special tags like <slow>.
FL CosyVoice3 Cross-Lingual Output Parameters:
audio
The output is the synthesized audio in the ComfyUI AUDIO format: the input text spoken in the target language with the reference voice's characteristics. It includes the waveform and the sample rate, so it is ready for playback or further processing.
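As a rough sketch of how a downstream script might read this output, the example below computes the clip length from the waveform and sample rate. It assumes the AUDIO value is a dictionary with `waveform` and `sample_rate` keys and that the waveform's last dimension is the sample count, laid out as (batch, channels, samples). The helper name, the stand-in object, and the 24000 Hz rate are illustrative only.

```python
from types import SimpleNamespace


def audio_duration_seconds(audio: dict) -> float:
    """Compute clip length from an AUDIO-style dictionary.

    Assumption: audio["waveform"] exposes a .shape whose last entry is
    the number of samples, and audio["sample_rate"] is in Hz.
    """
    num_samples = audio["waveform"].shape[-1]
    return num_samples / audio["sample_rate"]


if __name__ == "__main__":
    # Stand-in for a real tensor so the sketch runs without extra packages.
    fake_waveform = SimpleNamespace(shape=(1, 1, 48000))
    audio = {"waveform": fake_waveform, "sample_rate": 24000}
    print(audio_duration_seconds(audio))
```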
FL CosyVoice3 Cross-Lingual Usage Tips:
- Ensure that the reference audio is clear and of high quality to achieve the best results in maintaining voice characteristics across languages.
- Use the target_language parameter to specify the desired language explicitly when automatic detection might not be reliable, especially for short or ambiguous text.
- Adjust the speed parameter to match the desired speech pace, which can be particularly useful for language learning applications where slower speech might aid comprehension.
FL CosyVoice3 Cross-Lingual Common Errors and Solutions:
Error in cross-lingual synthesis: <error_message>
- Explanation: This error indicates that an issue occurred during the synthesis process, which could be due to incorrect input parameters or model loading issues.
- Solution: Check that all input parameters are correctly specified and that the CosyVoice model is properly loaded. Ensure that the reference audio is accessible and in the correct format. If the problem persists, review the traceback for more detailed error information and adjust the inputs accordingly.
