Transform text into speech for single-speaker scenarios using VibeVoice technology.
The VibeVoiceSingleSpeakerNode transforms text into speech using VibeVoice, tailored for scenarios with a single speaker. It is part of a larger system that uses advanced voice-synthesis techniques to generate high-quality audio from text, making it well suited to voiceovers, audiobooks, and similar audio-content tasks. Focusing on a single speaker keeps the pipeline simple and keeps voice quality and tone consistent throughout the generated audio. Internally, the node parses pause keywords in the text, formats the result for VibeVoice, generates audio segments, and combines them into the final output, giving precise control over the synthesis process and producing natural, expressive speech.
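To make that flow concrete, here is a minimal sketch of the pipeline in Python. The `[pause]` / `[pause:ms]` keyword syntax and the `tts_model.generate()` call are illustrative stand-ins, not the node's actual internals:

```python
import re
import numpy as np

SAMPLE_RATE = 24000  # assumed; use the rate your VibeVoice model outputs

def split_on_pauses(text):
    """Split text on [pause] / [pause:ms] keywords into (segment, pause_ms) pairs."""
    parts = re.split(r"\[pause(?::(\d+))?\]", text)
    segments = []
    # re.split with one capture group yields [text, ms, text, ms, ..., text]
    for i in range(0, len(parts), 2):
        seg = parts[i].strip()
        ms = int(parts[i + 1]) if i + 1 < len(parts) and parts[i + 1] else 1000
        if seg:
            # No trailing pause after the final segment.
            segments.append((seg, ms if i + 1 < len(parts) else 0))
    return segments

def synthesize(text, tts_model):
    """Generate one waveform per text segment, inserting silence for pauses."""
    chunks = []
    for seg, pause_ms in split_on_pauses(text):
        audio = tts_model.generate(seg)  # hypothetical model API
        chunks.append(audio)
        if pause_ms:
            chunks.append(np.zeros(int(SAMPLE_RATE * pause_ms / 1000), dtype=np.float32))
    # Final combined waveform, plus the individual segments.
    return np.concatenate(chunks), chunks
```

The two return values mirror the node's outputs: a combined waveform and the list of per-segment audio.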
The model parameter selects the voice-synthesis model used to generate speech. It determines the characteristics and quality of the voice output, and the choice of model can significantly affect the naturalness and expressiveness of the generated audio. This is a selection rather than a numeric setting; pick the model that best matches the voice characteristics you want.
The model_path parameter is the file path where the voice-synthesis model is stored; the node loads the model from this location. Make sure the path is accurate and readable, or the node will fail to load the model. There are no constraints on the format beyond being a valid path on your system.
The attention_type parameter selects the attention implementation used in the voice-synthesis process. Attention lets the model weigh different parts of the input text, which affects the clarity, coherence, and fluidity of the generated speech. The available options depend on your installation, and choosing an appropriate implementation matters for both output quality and speed.
The quantize_llm parameter is a boolean flag that indicates whether to apply quantization to the language model. Quantization can reduce the model size and improve processing efficiency, but it may also affect the quality of the generated speech. The default value is typically False, meaning no quantization is applied unless specified otherwise.
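If the node loads its language model through Hugging Face transformers (an assumption; the actual loader may differ), attention_type and quantize_llm plausibly map onto standard loading options like these:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# attention_type -> transformers' attn_implementation
# ("eager", "sdpa", and "flash_attention_2" are the common choices).
# quantize_llm -> a quantization config, e.g. 4-bit loading via bitsandbytes.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/vibevoice-model",  # placeholder path
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
```

Quantization trades some fidelity for a smaller memory footprint, which is why the node exposes it as an opt-in flag.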
The lora_path parameter specifies the path to the LoRA (Low-Rank Adaptation) model, which can be used to fine-tune the voice synthesis process. This parameter is optional and is used when additional customization of the voice output is required. Providing a valid path to a LoRA model can enhance the expressiveness and adaptability of the generated speech.
The voice_to_clone parameter allows you to specify a reference voice that the system will attempt to mimic. This parameter is crucial for applications where a specific voice style or tone is desired. The system will use this reference to guide the synthesis process, aiming to produce audio that closely resembles the chosen voice.
The voice_speed_factor parameter controls the speed of the generated speech. It allows you to adjust the tempo of the voice output, making it faster or slower according to your needs. The default value is typically 1.0, representing normal speed, with values greater than 1.0 increasing the speed and values less than 1.0 decreasing it.
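As an illustration, the simplest way to apply a speed factor is to resample the waveform. Note that naive resampling also shifts pitch, so the real implementation may use a pitch-preserving method instead:

```python
import numpy as np

def change_speed(audio, speed_factor):
    """Time-stretch by resampling: >1.0 is faster/shorter, <1.0 slower/longer."""
    n_out = int(len(audio) / speed_factor)
    src = np.linspace(0, len(audio) - 1, n_out)
    # Linear interpolation of the waveform at the new sample positions.
    return np.interp(src, np.arange(len(audio)), audio).astype(audio.dtype)
```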
The cfg_scale parameter sets the classifier-free guidance scale used during synthesis, controlling the balance between adherence to the conditioning (the input text and reference voice) and free variation. Higher values push the output to follow the conditioning more closely; lower values allow more varied output. The default value is chosen to give a balanced result.
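Assuming VibeVoice follows the standard classifier-free guidance formulation used in diffusion-based synthesis, the guided prediction is a blend of conditional and unconditional model outputs:

```python
def apply_cfg(cond_pred, uncond_pred, cfg_scale):
    """Standard classifier-free guidance blend.
    cfg_scale = 1.0 reduces to the conditional prediction; larger values
    push the result further toward the conditioning (text + voice)."""
    return uncond_pred + cfg_scale * (cond_pred - uncond_pred)
```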
The seed parameter is used to initialize the random number generator, ensuring reproducibility of the generated audio. By setting a specific seed value, you can produce consistent results across multiple runs. This parameter is particularly useful for debugging and fine-tuning the synthesis process.
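A typical way to make generation reproducible in a PyTorch-based pipeline is to seed every RNG that can influence sampling; the node presumably does something equivalent internally:

```python
import random
import numpy as np
import torch

def set_seed(seed):
    """Seed all RNGs that typically affect generation, for reproducibility."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```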
The diffusion_steps parameter determines the number of steps in the diffusion process, which is part of the voice synthesis algorithm. More steps can lead to higher quality audio but may also increase processing time. The default value is typically set to balance quality and efficiency.
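Structurally, a diffusion sampler is a loop of iterative denoising steps, which is why more steps cost proportionally more time. The sketch below is purely illustrative; `timesteps`, `predict_noise`, and `step` are hypothetical names, not VibeVoice's API:

```python
def denoise(model, latents, scheduler, num_steps):
    # Each iteration removes a little noise; more steps = finer refinement.
    for t in scheduler.timesteps(num_steps):          # hypothetical API
        noise_pred = model.predict_noise(latents, t)  # hypothetical API
        latents = scheduler.step(noise_pred, t, latents)
    return latents
```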
The use_sampling parameter is a boolean flag that indicates whether to use sampling techniques during the synthesis process. Sampling can introduce variability and creativity into the generated speech, but it may also affect consistency. The default value is usually False, meaning no sampling is applied unless specified otherwise.
The temperature parameter controls the randomness of the voice synthesis process. A higher temperature value results in more varied and creative outputs, while a lower value produces more deterministic and consistent speech. The default value is typically set to provide a balance between creativity and stability.
The top_p parameter, also known as nucleus sampling, determines the cumulative probability threshold for selecting the next token in the synthesis process. It helps control the diversity of the generated speech, with lower values producing more focused and coherent outputs. The default value is usually set to ensure a good balance between diversity and coherence.
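Taken together, use_sampling, temperature, and top_p describe a standard next-token selection scheme. A minimal NumPy sketch (the values you pass in are up to you; nothing here reflects the node's actual defaults):

```python
import numpy as np

def sample_token(logits, use_sampling, temperature, top_p):
    """Greedy when use_sampling is False; otherwise temperature-scaled
    nucleus (top-p) sampling over the next-token distribution."""
    if not use_sampling:
        return int(np.argmax(logits))
    z = logits / temperature
    z -= z.max()                      # numerical stability
    probs = np.exp(z)
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]   # tokens by descending probability
    cdf = np.cumsum(probs[order])
    # Smallest prefix whose cumulative probability reaches top_p:
    cutoff = int(np.searchsorted(cdf, top_p)) + 1
    keep = order[:cutoff]
    p = probs[keep] / probs[keep].sum()
    return int(np.random.choice(keep, p=p))
```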
The llm_lora_strength parameter specifies the strength of the LoRA model adaptation, affecting how much influence the LoRA model has on the final output. A higher value increases the impact of the LoRA model, allowing for more customization and expressiveness in the generated speech. The default value is typically set to provide a moderate level of adaptation.
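In the standard LoRA formulation, the adapted weight is the base weight plus a scaled low-rank update, and a strength multiplier simply scales that update. A sketch of how llm_lora_strength plausibly enters the computation (whether the node merges weights or applies LoRA at runtime is an assumption):

```python
import numpy as np

def merge_lora(W, A, B, strength, alpha, r):
    """W: (out, in) base weight; A: (r, in) and B: (out, r) LoRA factors.
    W' = W + strength * (alpha / r) * (B @ A)"""
    return W + strength * (alpha / r) * (B @ A)
```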
The all_audio_segments output parameter is a list of audio segments generated from the input text. Each segment represents a portion of the text converted into speech, and together they form the complete audio output. This parameter is crucial for applications that require precise control over the timing and structure of the generated speech, allowing you to manipulate and combine segments as needed.
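For example, if each segment is a 1-D waveform array at the same sample rate (an assumption about the returned format), combining them is a single concatenation:

```python
import numpy as np

# Keep segments separate to retime or trim them individually, or join them:
final_audio = np.concatenate(all_audio_segments)
```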
Usage tips:
- Choose the model and voice_to_clone parameters to match the desired voice characteristics and style.
- Adjust the voice_speed_factor and temperature parameters to fine-tune the expressiveness and tempo of the generated speech, ensuring it aligns with your specific application needs.
- Set the seed parameter to ensure consistent and reproducible results, especially when fine-tuning the synthesis process for specific projects.

Troubleshooting:
- Model loading fails when model_path does not point to a valid or accessible model file. Verify that the model_path is correct, that the model file exists at the specified location, and that the file permissions allow reading.
- If the output quality is not what you expect, adjust parameters such as cfg_scale, diffusion_steps, and temperature to resolve the issue.