RunComfy

ComfyUI Node: ⚙️ VibeVoice Engine

Class Name
VibeVoiceEngineNode

Category
TTS Audio Suite/⚙️ Engines

Author
diogod (Account age: 667 days)

Extension
TTS Audio Suite

Latest Updated
2025-12-13

Github Stars
0.46K

Table of Contents

Description
VibeVoiceEngineNode:
VibeVoiceEngineNode Input Parameters:
VibeVoiceEngineNode Output Parameters:
VibeVoiceEngineNode Usage Tips:
VibeVoiceEngineNode Common Errors and Solutions:
Related Nodes

How to Install TTS Audio Suite

Install this extension via the ComfyUI Manager by searching for TTS Audio Suite:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter TTS Audio Suite in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

⚙️ VibeVoice Engine Description

A configuration node that exposes VibeVoice-specific settings and outputs an engine adapter for text-to-speech (TTS) tasks.

⚙️ VibeVoice Engine:

The VibeVoiceEngineNode is a configuration node that integrates the VibeVoice engine into a unified node system. It exposes VibeVoice-specific parameters and creates an engine adapter that other nodes can consume, so the engine can be used within a broader text-to-speech (TTS) framework. The node is designed to be approachable for users without a deep technical background while still offering the flexibility needed for advanced configurations.

⚙️ VibeVoice Engine Input Parameters:

temperature

The temperature parameter controls how random the sampling is during speech generation. Lower values make the output more deterministic and predictable; higher values add variability and expressiveness. Use it to tailor the delivery to different contexts or preferences. The typical range is 0.0 to 1.0, and the default is often set around 0.7 for a balanced result.
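To make the effect of temperature concrete, here is a minimal sketch of temperature-scaled softmax, the standard way a sampler turns raw scores into probabilities. It is a generic illustration, not the VibeVoice implementation; the function name and the example scores are hypothetical, and rejecting a value of 0 is a choice made for this sketch (real samplers often treat 0 as greedy decoding).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Sketch-only choice: dividing by zero is undefined, so reject it here.
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                         # made-up scores for three candidates
cool = softmax_with_temperature(logits, 0.3)     # low temperature
warm = softmax_with_temperature(logits, 1.0)     # high temperature
```

With the low temperature, almost all probability lands on the top candidate; with the higher one, the alternatives keep a meaningful share, which is where the extra variability comes from.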

top_p

The top_p parameter sets the cumulative probability threshold for nucleus sampling: at each generation step, only the most probable tokens whose combined probability reaches top_p are considered. A lower value restricts selection to the most probable tokens and yields more conservative speech, while a higher value allows more diverse output. The range is usually 0.0 to 1.0, with a default around 0.9.
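Nucleus sampling can be sketched in a few lines. The example below is a generic illustration under assumed names (nucleus_filter and the sample distribution are hypothetical), not code taken from the VibeVoice engine: it keeps the smallest set of highest-probability candidates whose cumulative mass reaches top_p, then renormalizes.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of top candidates whose probability mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalize so the surviving candidates form a valid distribution.
    return {i: probs[i] / mass for i in kept}

probs = [0.5, 0.3, 0.15, 0.05]        # made-up distribution over four candidates
narrow = nucleus_filter(probs, 0.5)   # only the single most likely candidate survives
wide = nucleus_filter(probs, 0.9)     # the three most likely candidates survive
```

A sampler would then draw the next token from the filtered distribution, so a lower top_p removes the unlikely tail entirely rather than just making it less probable.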

chunk_minutes

The chunk_minutes parameter sets the target length of each text chunk sent to the VibeVoice engine, expressed in minutes of speech. The duration is converted into an approximate character count based on an average reading speed, so that long text is divided into manageable segments and the processing load stays bounded. A typical value is around 1 minute, which corresponds to approximately 750 characters.
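The minutes-to-characters conversion and the resulting split can be sketched as follows. The 750 characters per minute figure comes from the description above; everything else (the function names, the sentence-based greedy packing, rejecting non-positive values) is an assumption made for illustration and may differ from what the node actually does.

```python
import re

CHARS_PER_MINUTE = 750  # figure quoted in the description; the real node may use another

def minutes_to_chars(chunk_minutes):
    """Approximate character budget for a chunk of the given duration."""
    return int(chunk_minutes * CHARS_PER_MINUTE)

def split_into_chunks(text, chunk_minutes):
    """Greedily pack whole sentences into chunks that stay under the budget."""
    budget = minutes_to_chars(chunk_minutes)
    if budget <= 0:
        raise ValueError("chunk_minutes must be positive in this sketch")
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > budget:
            chunks.append(current)   # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

sample_text = "Hello there. " * 100            # about 1,300 characters of short sentences
sample_chunks = split_into_chunks(sample_text, 1)
```

Splitting on sentence boundaries is a design choice of this sketch: it avoids cutting a sentence in half, at the cost of chunks that are slightly shorter than the budget. A single sentence longer than the budget is passed through unsplit.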

max_new_tokens

The max_new_tokens parameter caps the number of tokens the engine may generate, which bounds the length of the generated speech. It is particularly useful when the output must stay concise and focused. The appropriate value depends on the requirements of the task.
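How such a cap typically behaves in a generation loop can be sketched like this. The loop, the next_token callback, and the end-of-sequence token value are all hypothetical stand-ins for illustration; they do not reflect the engine's internals.

```python
def generate(next_token, max_new_tokens, eos_token=0):
    """Collect tokens until the model signals the end or the cap is reached."""
    tokens = []
    for _ in range(max_new_tokens):
        token = next_token(tokens)
        if token == eos_token:
            break                    # the model finished on its own
        tokens.append(token)
    return tokens

# A fake model that never stops: the cap is what ends generation.
capped = generate(lambda history: len(history) + 1, max_new_tokens=5)

# A fake model that stops after three tokens: the cap is never reached.
finished = generate(lambda history: 0 if len(history) >= 3 else 7, max_new_tokens=5)
```

When the cap ends generation instead of the model, the output is cut off mid-stream, which is why a value that is too small can truncate speech.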

speaker2_voice

The speaker2_voice parameter allows you to specify an alternative voice for the second speaker in a multi-speaker setup. This parameter is optional and can be used to add variety and distinction between different speakers in the generated speech. By selecting different voices for each speaker, you can create more dynamic and engaging TTS outputs.

speaker3_voice

Similar to speaker2_voice, the speaker3_voice parameter is used to assign a unique voice to the third speaker in a multi-speaker configuration. This optional parameter enhances the versatility of the TTS output by allowing for multiple distinct voices, which can be particularly useful in dialogues or multi-character narratives.

speaker4_voice

The speaker4_voice parameter provides the option to assign a specific voice to the fourth speaker in a multi-speaker setup. Like the other speaker voice parameters, it is optional and serves to enrich the TTS output by enabling the use of diverse voices, thereby enhancing the overall listening experience.

⚙️ VibeVoice Engine Output Parameters:

engine_adapter

The engine_adapter output is the configured adapter for the VibeVoice engine. It acts as the bridge between the engine's specific functionality and the unified node system, carrying the settings chosen on this node so that other nodes in the TTS framework can use the engine.
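For readers unfamiliar with how a ComfyUI configuration node is put together, the sketch below shows the general shape using ComfyUI's standard node conventions (INPUT_TYPES, RETURN_TYPES, FUNCTION, CATEGORY). It is not the extension's source code: the class name, the socket type names ("TTS_ENGINE", "AUDIO"), the method name, the adapter dictionary layout, and the max_new_tokens default are all assumptions. The temperature and top_p ranges and defaults follow the descriptions above.

```python
class SketchVibeVoiceEngineNode:
    """Illustrative stand-in; not the extension's real VibeVoiceEngineNode."""

    CATEGORY = "TTS Audio Suite/⚙️ Engines"
    RETURN_TYPES = ("TTS_ENGINE",)        # hypothetical socket type
    RETURN_NAMES = ("engine_adapter",)
    FUNCTION = "create_engine_adapter"    # hypothetical method name

    @classmethod
    def INPUT_TYPES(cls):
        unit = {"min": 0.0, "max": 1.0, "step": 0.05}
        return {
            "required": {
                "temperature": ("FLOAT", {"default": 0.7, **unit}),
                "top_p": ("FLOAT", {"default": 0.9, **unit}),
                "chunk_minutes": ("FLOAT", {"default": 1.0, "min": 0.0}),
                # Placeholder default: the description gives no value.
                "max_new_tokens": ("INT", {"default": 0, "min": 0}),
            },
            "optional": {
                # "AUDIO" is a guess at the voice socket type.
                "speaker2_voice": ("AUDIO",),
                "speaker3_voice": ("AUDIO",),
                "speaker4_voice": ("AUDIO",),
            },
        }

    def create_engine_adapter(self, temperature, top_p, chunk_minutes,
                              max_new_tokens, speaker2_voice=None,
                              speaker3_voice=None, speaker4_voice=None):
        # Only keep the speaker slots that were actually connected.
        voices = {
            slot: voice
            for slot, voice in (("speaker2", speaker2_voice),
                                ("speaker3", speaker3_voice),
                                ("speaker4", speaker4_voice))
            if voice is not None
        }
        adapter = {
            "engine_type": "vibevoice",
            "config": {
                "temperature": temperature,
                "top_p": top_p,
                "chunk_minutes": chunk_minutes,
                "max_new_tokens": max_new_tokens,
            },
            "voices": voices,
        }
        return (adapter,)  # ComfyUI expects a tuple matching RETURN_TYPES
```

The point of the pattern is that the node does no synthesis itself: it only bundles settings into one object that downstream nodes read.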

⚙️ VibeVoice Engine Usage Tips:

Adjust the temperature parameter to control the expressiveness of the speech output. Lower values give more predictable speech, while higher values introduce more variability.
Use the top_p parameter to manage the diversity of the generated speech. A lower value restricts selection to the most probable tokens, while a higher value allows more varied output.
Set the chunk_minutes parameter so that long text is divided into manageable segments for efficient processing by the VibeVoice engine.
Specify different voices for speaker2_voice, speaker3_voice, and speaker4_voice to give each speaker a distinct voice in multi-speaker output.

⚙️ VibeVoice Engine Common Errors and Solutions:

"Invalid temperature value"

Explanation: The temperature parameter value is outside the acceptable range.
Solution: Ensure that the temperature value is between 0.0 and 1.0.

"Invalid top_p value"

Explanation: The top_p parameter value is not within the valid range.
Solution: Adjust the top_p value to be between 0.0 and 1.0.

"Chunk size too large"

Explanation: The chunk_minutes parameter results in a text chunk that is too large for processing.
Solution: Reduce the chunk_minutes value to create smaller, more manageable text segments.

"Max new tokens exceeded"

Explanation: The generated speech exceeds the specified max_new_tokens limit.
Solution: Increase the max_new_tokens value or reduce the input text length to fit within the limit.
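The two range errors above can be caught before a workflow runs. The helper below is a hypothetical pre-check, not part of the extension; it reuses the error messages and the 0.0 to 1.0 ranges listed in this section and returns every problem it finds.

```python
def check_sampling_settings(temperature, top_p):
    """Return a list of problems; an empty list means both values are in range."""
    problems = []
    if not 0.0 <= temperature <= 1.0:
        problems.append("Invalid temperature value")
    if not 0.0 <= top_p <= 1.0:
        problems.append("Invalid top_p value")
    return problems

ok = check_sampling_settings(0.7, 0.9)      # both values in range
bad = check_sampling_settings(1.5, -0.1)    # both values out of range
```

Returning a list rather than raising on the first failure lets a caller report all invalid settings at once.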

⚙️ VibeVoice Engine Related Nodes

Go back to the extension to check out more related nodes.

TTS Audio Suite
