FL CosyVoice3 Speaker Instruct2:
FL_CosyVoice3_SpeakerInstruct2 synthesizes speech from a saved speaker preset, which supplies the voice timbre, combined with instruct text that controls the style, emotion, and tone of the delivery. Because the timbre comes from the preset, no live reference audio is needed at synthesis time. The node calls the inference_instruct2 method and selects the voice through the preset's zero-shot speaker ID rather than a reference clip. This makes it useful for AI artists who want to reuse one voice across many expressive variations of the same or different text.
FL CosyVoice3 Speaker Instruct2 Input Parameters:
model
This parameter requires a CosyVoice model from the Model Loader. The model must support the inference_instruct2 method, which this node calls to synthesize speech.
text
This is the text you wish to synthesize into speech. It can be a multiline string, allowing for complex and lengthy inputs. The default value is "Hello, this is my cloned voice speaking." This parameter directly influences the content of the synthesized speech.
instruct_text
This parameter provides instructions to control the speaking style, emotion, and tone. It supports multiline input and can include examples like "请非常开心地说这句话。" or "Please say this in a very soft voice." The default value is "请非常开心地说这句话。" This parameter is vital for customizing the expressiveness of the synthesized speech.
speaker_preset
This parameter specifies the speaker preset saved by the FL CosyVoice3 Save Speaker node. The preset defines the voice timbre, so it must be set to a valid preset. The value "[none]" indicates that no preset is available; if it is selected, the node will raise an error.
speed
This parameter controls the speech speed multiplier, allowing you to adjust the tempo of the synthesized speech. It accepts values between 0.5 and 2.0, with a default of 1.0 and a step of 0.05. Adjusting this parameter can significantly impact the pacing and delivery of the speech.
seed
This optional parameter sets the random seed for reproducibility. It accepts integer values with a default of 42, and a range from -1 (for random) to 2147483647. Setting a specific seed ensures consistent results across different runs.
text_frontend
This optional boolean parameter enables text normalization. When set to True (default), it normalizes the text input. Disable it for CMU phonemes or special tags.
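The defaults and ranges listed above can be summarized in a short sketch. This is only an illustration of the documented values, not the node's own code: the class name SpeakerInstruct2Inputs, the helper resolve_seed, and the preset name "narrator" are hypothetical, and treating a seed of -1 as "draw a random seed" is an assumption based on the description of the seed parameter.

```python
import random
from dataclasses import dataclass

SEED_MAX = 2147483647  # documented upper bound for `seed`


@dataclass
class SpeakerInstruct2Inputs:
    """Hypothetical container mirroring the documented inputs (not the node's class)."""
    speaker_preset: str  # name of a preset saved by the Save Speaker node
    text: str = "Hello, this is my cloned voice speaking."
    instruct_text: str = "请非常开心地说这句话。"
    speed: float = 1.0
    seed: int = 42
    text_frontend: bool = True

    def validate(self) -> None:
        # Ranges taken from the parameter descriptions above.
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")
        if not -1 <= self.seed <= SEED_MAX:
            raise ValueError(f"seed must be between -1 and {SEED_MAX}")


def resolve_seed(seed: int) -> int:
    """Assumed behavior: -1 draws a fresh random seed, other values are used as-is."""
    if seed == -1:
        return random.randint(0, SEED_MAX)
    return seed


inputs = SpeakerInstruct2Inputs(speaker_preset="narrator", speed=1.25)
inputs.validate()
print(resolve_seed(inputs.seed))  # 42
```

Keeping the seed fixed while changing only instruct_text is one way to compare styles under otherwise identical settings.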
FL CosyVoice3 Speaker Instruct2 Output Parameters:
audio
The output is an audio object containing the synthesized speech. It includes the waveform and sample rate, so it can be passed directly to downstream audio nodes for preview, saving, or further processing. The audio reflects both the input text and the style instructions.
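As a rough illustration of working with this output, the sketch below computes the clip length from the waveform and sample rate. It assumes the audio object is a dictionary with "waveform" and "sample_rate" keys and that the last waveform dimension counts samples; the function name audio_duration_seconds is hypothetical, and the 24000 Hz rate is an example value only.

```python
from types import SimpleNamespace


def audio_duration_seconds(audio: dict) -> float:
    """Return the clip length in seconds.

    Assumes audio = {"waveform": <array-like with .shape>, "sample_rate": int}
    and that the last waveform dimension is the number of samples.
    """
    num_samples = audio["waveform"].shape[-1]
    return num_samples / audio["sample_rate"]


# Stand-in for a real tensor: only the .shape attribute is needed here.
fake_audio = {
    "waveform": SimpleNamespace(shape=(1, 1, 48000)),
    "sample_rate": 24000,  # example value, not a documented rate
}
print(audio_duration_seconds(fake_audio))  # 2.0
```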
FL CosyVoice3 Speaker Instruct2 Usage Tips:
- Ensure that the speaker_preset is correctly set by using the FL CosyVoice3 Save Speaker node to create and save presets before synthesis.
- Use the instruct_text parameter creatively to explore different emotional and stylistic expressions in your synthesized speech.
- Adjust the speed parameter to match the desired pacing of your project, keeping in mind that extreme values may affect intelligibility.
FL CosyVoice3 Speaker Instruct2 Common Errors and Solutions:
"inference_instruct2 is not available on this model."
- Explanation: The selected model does not support the inference_instruct2 method required by this node.
- Solution: Ensure you are using a CosyVoice2 or CosyVoice3 model that includes the inference_instruct2 method.
"Speaker preset file not found: <path>"
- Explanation: The specified speaker preset file does not exist at the given path.
- Solution: Use the FL CosyVoice3 Save Speaker node to create and save the necessary speaker preset file.
"No speaker presets found."
- Explanation: The speaker_preset parameter is set to "[none]", indicating no preset is available.
- Solution: Create a speaker preset using the FL CosyVoice3 Save Speaker node and specify it in the speaker_preset parameter.
"instruct_text cannot be empty."
- Explanation: The instruct_text parameter is empty or contains only whitespace.
- Solution: Provide valid style instructions in the instruct_text parameter to guide the synthesis process.
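The four error conditions above can be pictured as a set of checks run before synthesis. The sketch below is an assumption about how such checks might look, not the node's actual implementation: the function name check_inputs, the use of hasattr to detect model support, the exception types, and the order of the checks are all hypothetical. Only the message strings come from this page.

```python
import os


def check_inputs(model, instruct_text: str, speaker_preset: str, preset_path: str) -> None:
    """Raise an error with the documented message if an input is unusable.

    Hypothetical pre-flight checks; exception types and ordering are assumptions.
    """
    # The model must expose the method this node relies on.
    if not hasattr(model, "inference_instruct2"):
        raise RuntimeError("inference_instruct2 is not available on this model.")
    # "[none]" is the placeholder shown when no preset is available.
    if speaker_preset == "[none]":
        raise ValueError("No speaker presets found.")
    # The preset must exist on disk at the resolved path.
    if not os.path.exists(preset_path):
        raise FileNotFoundError(f"Speaker preset file not found: {preset_path}")
    # Whitespace-only instructions count as empty.
    if not instruct_text.strip():
        raise ValueError("instruct_text cannot be empty.")
```

Running checks like these before inference surfaces a missing preset or an empty instruction immediately, rather than partway through synthesis.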
