qwen3 / base:
CivitaiTextToSpeechVllmOmniQwen3Base is a node that converts text into speech using the vllm-omni engine with a qwen3 base model. It is part of the Civitai Orchestration suite and is aimed at AI artists and developers who need realistic, expressive speech generated from written content. Typical uses include virtual assistants, audiobooks, and interactive media. Beyond plain text-to-audio conversion, the node accepts optional reference inputs (ref_audio_url and ref_text) so that the generated voice can be adjusted toward a particular speaker or style.
qwen3 / base Input Parameters:
text
The text parameter is the core input for this node, representing the written content you wish to convert into speech. It directly influences the audio output, as the node will synthesize speech based on the text provided. There are no specific minimum or maximum values for this parameter, but the length of the text may affect processing time and the resulting audio's duration.
language
The language parameter specifies the language in which the text is written. This is crucial for ensuring that the speech synthesis engine correctly interprets and pronounces the text. The choice of language can significantly impact the accuracy and naturalness of the generated speech. While specific language options are not detailed, it is important to select the appropriate language for your text to achieve the best results.
max_new_tokens
The max_new_tokens parameter sets an upper bound on the number of tokens used in a single synthesis run, which in turn limits the length of the generated speech. Tokens are model-level units rather than words, so the limit does not translate directly into a word count. Adjusting this value can help manage processing time, especially when dealing with longer texts.
ref_audio_url
The ref_audio_url parameter allows you to provide a reference audio URL, which the node can use to match the style or tone of the generated speech. This can be particularly useful if you want the synthesized voice to mimic a specific speaker or audio sample. The URL should point to an accessible audio file that the node can analyze.
ref_text
The ref_text parameter supplies reference text that guides the speech synthesis process alongside the reference audio. In voice-cloning pipelines this kind of input is usually the transcript of the reference clip, which helps the model align the sample's sound with its content. This parameter is optional but can improve how closely the output matches your desired voice.
x_vector_only_mode
The x_vector_only_mode parameter is a specialized setting that, when enabled, restricts the node to generating speech using only x-vectors, that is, fixed-length speaker embeddings derived from audio. This mode can be useful when you want the output to carry a speaker's general vocal characteristics without relying on other reference information. The default setting is typically disabled, allowing for a broader range of synthesis options.
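To make the relationship between the six inputs concrete, here is a minimal sketch that gathers them into one dictionary before they are handed to the node. The helper name build_tts_inputs, the validation rules, and the choice to omit empty optional fields are assumptions made for illustration; they are not part of the node's documented behavior, and no default values are implied.

```python
from typing import Any, Dict, Optional


def build_tts_inputs(
    text: str,
    language: str,
    max_new_tokens: int,
    ref_audio_url: Optional[str] = None,
    ref_text: Optional[str] = None,
    x_vector_only_mode: bool = False,
) -> Dict[str, Any]:
    """Collect the documented inputs into one dict (hypothetical helper)."""
    # Assumed sanity checks; the node itself may validate differently.
    if not text.strip():
        raise ValueError("text must not be empty")
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be a positive integer")

    inputs: Dict[str, Any] = {
        "text": text,
        "language": language,
        "max_new_tokens": max_new_tokens,
        "x_vector_only_mode": x_vector_only_mode,
    }
    # Optional reference inputs are included only when provided.
    if ref_audio_url:
        inputs["ref_audio_url"] = ref_audio_url
    if ref_text:
        inputs["ref_text"] = ref_text
    return inputs
```

Calling build_tts_inputs("Hello there.", "English", 512) returns a dictionary with only the four always-present keys, which keeps the optional reference settings out of the way until they are needed.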
qwen3 / base Output Parameters:
audio_blob
The audio_blob output is the primary result of the node, containing the synthesized speech in audio format. This output is crucial for any application that requires audio playback, as it represents the final product of the text-to-speech conversion process.
model_type
The model_type output provides information about the type of model used for the speech synthesis. This can be useful for understanding the characteristics and capabilities of the generated speech, especially if you are comparing outputs from different models.
speaker
The speaker output indicates the voice or speaker profile used in the synthesis process. This information can be important if you are using multiple speaker profiles or need to ensure consistency across different audio outputs.
workflow_id
The workflow_id output is a unique identifier for the specific text-to-speech conversion process. This can be helpful for tracking and managing multiple synthesis tasks, especially in complex workflows or batch processing scenarios.
raw_json
The raw_json output provides a detailed JSON representation of the synthesis process, including metadata and configuration details. This output is valuable for debugging, analysis, and record-keeping, as it offers insights into the node's operation and settings.
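When using raw_json for debugging, a quick first step is to see which top-level fields it contains. The sketch below assumes that raw_json arrives as a JSON string whose top level is an object; the actual schema is not documented here, so the helper summarize_raw_json is hypothetical and only reports key names and value types rather than relying on any particular field.

```python
import json
from typing import Dict


def summarize_raw_json(raw_json: str) -> Dict[str, str]:
    """Map each top-level key to the Python type name of its value."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        # Assumption: the payload is a JSON object, not a list or scalar.
        raise ValueError("expected a JSON object at the top level")
    return {key: type(value).__name__ for key, value in data.items()}
```

Because the summary never assumes specific keys, it stays usable even if the payload's structure differs from what you expect.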
qwen3 / base Usage Tips:
- Ensure that the language parameter matches the language of your input text to achieve the most accurate and natural speech synthesis.
- Use the ref_audio_url and ref_text parameters to customize the style and tone of the generated speech, especially if you have specific requirements for the voice output.
- Adjust the max_new_tokens parameter to control the length of the generated speech, which can help manage processing time and ensure the output meets your needs.
qwen3 / base Common Errors and Solutions:
Invalid audio URL
- Explanation: The ref_audio_url provided is not accessible or does not point to a valid audio file.
- Solution: Verify that the URL is correct and points to a publicly accessible audio file. Ensure that the file format is supported by the node.
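A cheap pre-check can catch obviously malformed URLs before the node is run. The sketch below only inspects the URL's syntax using Python's standard urllib.parse module; it does not contact the server, so it cannot confirm that the file is actually reachable. The function name looks_like_audio_url and the extension list are assumptions, since the formats the node supports are not listed here.

```python
from urllib.parse import urlparse

# Assumed set of audio extensions; adjust to whatever the node accepts.
AUDIO_EXTENSIONS = (".wav", ".mp3", ".flac", ".ogg")


def looks_like_audio_url(url: str) -> bool:
    """Return True if the URL is http(s), has a host, and ends in an audio extension."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Only the path is checked, so query strings do not affect the result.
    return parsed.path.lower().endswith(AUDIO_EXTENSIONS)
```

A URL that passes this check can still fail at run time if it requires authentication or has expired, so treat it as a first filter rather than a guarantee.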
Language not supported
- Explanation: The specified language is not supported by the speech synthesis engine.
- Solution: Check the list of supported languages and select an appropriate one for your text. If necessary, adjust the text to match a supported language.
Exceeded token limit
- Explanation: The input text exceeds the maximum number of tokens allowed by the max_new_tokens parameter.
- Solution: Reduce the length of the input text or increase the max_new_tokens value to accommodate longer texts.
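One way to reduce the length of the input text is to split it into smaller pieces and synthesize each piece in its own run. The sketch below groups sentences into chunks under a character budget. The helper split_into_chunks is hypothetical, and the character count is only a rough stand-in for tokens, because the node's tokenizer is not described here.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence boundaries keeps each audio segment self-contained, which tends to make the pieces easier to join afterwards than cuts made at arbitrary positions.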
