ComfyUI Node: qwen3 / base

Class Name: CivitaiTextToSpeechVllmOmniQwen3Base
Category: Civitai/Audio/qwen3
Author: civitai (Account age: 1322 days)
Extension: civitai-comfy-nodes
Last Updated: 2026-06-18
GitHub Stars: 0.02K

How to Install civitai-comfy-nodes

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Click the Custom Nodes Manager button.
  3. Enter civitai-comfy-nodes in the search bar and install the extension.
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated node list.
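
One way to confirm the installation after the restart is to ask the ComfyUI server which node classes it has loaded. The sketch below assumes a local server at http://127.0.0.1:8188 that serves the /object_info endpoint; the address and the helper names are assumptions for illustration, so adjust them to your setup.

```python
import json
import urllib.request

NODE_CLASS = "CivitaiTextToSpeechVllmOmniQwen3Base"


def node_is_registered(object_info: dict, class_name: str = NODE_CLASS) -> bool:
    """Return True if class_name is a key in an /object_info response."""
    return class_name in object_info


def fetch_object_info(base_url: str = "http://127.0.0.1:8188") -> dict:
    """Fetch the node registry from a running ComfyUI server (assumed address)."""
    with urllib.request.urlopen(f"{base_url}/object_info", timeout=10) as resp:
        return json.load(resp)


# Usage against a running server:
#   info = fetch_object_info()
#   print("installed" if node_is_registered(info) else "not found")
```

If the class is reported as missing, restart ComfyUI again and refresh the browser before reinstalling.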

qwen3 / base Description

Converts text to speech using the vllm-omni engine, as part of the Civitai Orchestration suite.

qwen3 / base:

CivitaiTextToSpeechVllmOmniQwen3Base converts text into speech using the vllm-omni engine with a qwen3 model. It belongs to the Civitai Orchestration suite of nodes. Given input text, a language, and optionally a reference audio sample with reference text, the node returns synthesized audio together with metadata about the request. Typical uses include voices for virtual assistants, audiobooks, and interactive media.

qwen3 / base Input Parameters:

text

The text parameter is the written content to convert into speech. No minimum or maximum length is documented, but longer text takes longer to process and produces longer audio.
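
Because long inputs increase processing time, it can help to split a long script into sentence-aligned chunks and synthesize them one at a time. The following is a minimal sketch of that idea; the 400-character default is an arbitrary assumption, not a limit documented for this node.

```python
import re


def split_into_chunks(text: str, max_chars: int = 400) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed as the text input of a separate run, and the resulting audio clips joined afterwards.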

language

The language parameter specifies the language of the input text so that the engine interprets and pronounces it correctly. The supported values are not listed here; choose the option that matches the language of your text.

max_new_tokens

The max_new_tokens parameter caps the number of tokens the model may generate for a request. Tokens are not words, and, as in other generation APIs, the limit applies to the generated output rather than to the input text. For a speech model the generated tokens represent audio, so a lower value limits how long the resulting audio can be, and a value that is too low may cut the speech off early. Raise the value for longer texts.
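
Assuming, as in most generation APIs, that max_new_tokens caps the tokens the model generates and that each generated token covers a fixed slice of audio, a rough budget can be computed from the audio length you expect. The token rate is a value you must supply yourself; the 12 tokens per second in the usage line is a placeholder, not a documented figure for this node.

```python
import math


def token_budget(target_seconds: float,
                 tokens_per_second: float,
                 headroom: float = 1.25) -> int:
    """Estimate a max_new_tokens value for a target audio duration.

    headroom adds a safety margin so speech is not cut off early.
    """
    if target_seconds <= 0 or tokens_per_second <= 0 or headroom <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(target_seconds * tokens_per_second * headroom)


# Placeholder rate of 12 tokens per second (assumption):
print(token_budget(10, 12))  # 10 s * 12 tokens/s * 1.25 = 150
```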

ref_audio_url

The ref_audio_url parameter is the URL of a reference audio sample. The node uses the sample so that the synthesized voice resembles the speaker or style it contains. The URL must point to an audio file that the service can download.

ref_text

The ref_text parameter is reference text that accompanies the reference audio and guides the synthesis. In voice-cloning setups of this kind it is normally the transcript of what is spoken in the ref_audio_url sample, which helps the model relate the voice to the words. The parameter is optional.

x_vector_only_mode

The x_vector_only_mode parameter, when enabled, makes the node condition the voice only on an x-vector, a fixed-length speaker embedding that summarizes the vocal characteristics of the reference audio. This mode is typically disabled by default, which leaves the full reference information available to the synthesis.
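
The inputs above map onto a node entry in ComfyUI's API-format workflow JSON, where each node is keyed by an id and carries a class_type and an inputs object. The sketch below builds such an entry. The input names come from this page, but the value types, the default values, the "English" language string, and the example URL are assumptions; check the node in your own ComfyUI instance for the real options.

```python
import json


def build_tts_node(text: str,
                   language: str,
                   ref_audio_url: str = "",
                   ref_text: str = "",
                   max_new_tokens: int = 2048,        # assumed default
                   x_vector_only_mode: bool = False   # assumed default
                   ) -> dict:
    """Build an API-format node entry for the qwen3 / base TTS node."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "class_type": "CivitaiTextToSpeechVllmOmniQwen3Base",
        "inputs": {
            "text": text,
            "language": language,
            "max_new_tokens": max_new_tokens,
            "ref_audio_url": ref_audio_url,
            "ref_text": ref_text,
            "x_vector_only_mode": x_vector_only_mode,
        },
    }


# Hypothetical values for illustration only.
workflow = {
    "1": build_tts_node(
        text="Hello from ComfyUI.",
        language="English",
        ref_audio_url="https://example.com/voice.wav",
        ref_text="This is my reference recording.",
    )
}
print(json.dumps(workflow, indent=2))
```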

qwen3 / base Output Parameters:

audio_blob

The audio_blob output is the primary result of the node: the synthesized speech as audio data, ready to be played back or saved by downstream nodes.

model_type

The model_type output reports the type of model that produced the speech, which is useful when you compare outputs from different models.

speaker

The speaker output reports the voice or speaker profile used for synthesis, which helps you keep results consistent when you work with several speaker profiles.

workflow_id

The workflow_id output is a unique identifier for this text-to-speech request. It is useful for tracking individual jobs in batch processing or multi-step workflows.

raw_json

The raw_json output is a JSON representation of the synthesis request and its result, including metadata and configuration details. It is useful for debugging, analysis, and record-keeping.
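
When raw_json is inspected in a script, it is safer to decode it defensively than to assume a fixed shape. This is a minimal sketch that assumes raw_json arrives as a JSON string; the keys in the usage example are hypothetical, since the actual fields are not documented on this page.

```python
import json
from typing import Any


def parse_raw_json(raw: str) -> dict[str, Any]:
    """Decode raw_json, returning an empty dict when it is blank or malformed."""
    if not raw or not raw.strip():
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    # Wrap non-object payloads so callers can always use dict access.
    return data if isinstance(data, dict) else {"value": data}


# Hypothetical payload for illustration only.
sample = '{"workflow_id": "abc123", "status": "succeeded"}'
info = parse_raw_json(sample)
print(info.get("workflow_id", "unknown"))
```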

qwen3 / base Usage Tips:

  • Set the language parameter to the language of your input text for the most accurate and natural pronunciation.
  • Provide ref_audio_url together with ref_text when the output needs to match a particular voice or speaking style.
  • Adjust max_new_tokens to control how long the generated speech can be: lower values shorten processing time, higher values leave room for longer texts.

qwen3 / base Common Errors and Solutions:

Invalid audio URL

  • Explanation: The ref_audio_url is not reachable or does not point to a valid audio file.
  • Solution: Verify that the URL is correct and publicly accessible without a login, and that the file is in an audio format the node supports.
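
A quick local check can catch malformed URLs before a run is submitted. The sketch below inspects only the URL's shape and does not download the file; the set of extensions is an assumption, since the formats this node accepts are not listed here.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed set of audio extensions; not a documented list for this node.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def check_audio_url(url: str) -> list[str]:
    """Return a list of problems found in url; an empty list means it looks fine."""
    problems: list[str] = []
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        problems.append("URL must start with http:// or https://")
    if not parsed.netloc:
        problems.append("URL has no host name")
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if suffix not in AUDIO_EXTENSIONS:
        problems.append(f"unexpected file extension: {suffix or '(none)'}")
    return problems


print(check_audio_url("https://example.com/voice.wav"))
print(check_audio_url("voice.txt"))
```

A URL that passes this check can still fail if the host requires authentication, so opening it in a private browser window is a useful second test.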

Language not supported

  • Explanation: The value of the language parameter is not supported by the speech synthesis engine.
  • Solution: Select one of the languages offered by the node. If your text is in an unsupported language, translate it into a supported one first.

Exceeded token limit

  • Explanation: Synthesizing the input text needs more tokens than max_new_tokens allows, so generation stops at the limit.
  • Solution: Shorten the input text or increase max_new_tokens to leave room for longer speech.

Copyright 2025 RunComfy. All Rights Reserved.