RunComfy

ComfyUI Node: FL VoxCPM V2 TTS

Class Name
FL_VoxCPM_V2_TTS

Category
FL/VoxCPM

Author
filliptm (Account age: 2446 days)

Extension
ComfyUI-FL-VoxCPM

Last Updated
2026-05-21

GitHub Stars
0.03K

Table of Contents

Description
FL_VoxCPM_V2_TTS:
FL_VoxCPM_V2_TTS Input Parameters:
FL_VoxCPM_V2_TTS Output Parameters:
FL_VoxCPM_V2_TTS Usage Tips:
FL_VoxCPM_V2_TTS Common Errors and Solutions:
Related Nodes

How to Install ComfyUI-FL-VoxCPM

Install this extension via the ComfyUI Manager by searching for ComfyUI-FL-VoxCPM:

1. Click the Manager button in the main menu.
2. Click the Custom Nodes Manager button.
3. Enter ComfyUI-FL-VoxCPM in the search bar and install the extension from the results.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

FL VoxCPM V2 TTS Description

Sophisticated text-to-speech node with advanced voice cloning features for high-quality speech synthesis.

FL VoxCPM V2 TTS:

FL VoxCPM V2 TTS is a text-to-speech node that generates speech with the VoxCPM V2 model. It offers four modes: Voice Design, Voice Cloning, Controllable Cloning, and Ultimate Cloning. Together these cover both creating new character voices and replicating existing ones. The node is aimed at AI artists and developers who want realistic, customizable voice synthesis in their projects.

FL VoxCPM V2 TTS Input Parameters:

model_name

Selects the VoxCPM model used for speech generation. The available options are the models the node supports, and the choice affects the quality and style of the output. This node expects a V2 model; V1 models belong in the FL VoxCPM TTS node.

text

The script or content to convert into speech. Multiline input is supported, and each line is processed as a separate chunk. The default text is "VoxCPM is an innovative TTS model designed to generate highly expressive speech."
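The line-by-line chunking described above can be pictured with a short Python sketch. This is an illustration only, not the node's actual code: the helper name split_into_chunks is hypothetical, and skipping blank lines and trimming whitespace are assumptions the source does not confirm.

```python
def split_into_chunks(text: str) -> list[str]:
    """Return one chunk per line of the input.

    Assumption: blank lines are skipped and surrounding whitespace is
    trimmed. The node's real behavior may differ.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


script = "Hello there.\n\nThis line becomes a second chunk.\n"
for index, chunk in enumerate(split_into_chunks(script), start=1):
    print(f"chunk {index}: {chunk}")
```

If the node does work this way, putting each sentence or paragraph on its own line is a simple way to control where a long script is split.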

prompt_audio

This optional parameter allows you to provide reference audio for voice cloning. By supplying a sample of the desired voice, the node can more accurately replicate the voice characteristics in the generated speech.

prompt_text

The transcript of the reference audio. The input itself is optional, but supply it together with prompt_audio when cloning a voice: it tells the node what is being said in the reference audio, which makes the cloning more accurate.

cfg_value

The guidance scale, which controls how closely the generated speech adheres to the provided prompt. Higher values produce speech that is more faithful to the prompt but may sound less natural. The default is 2.0, with a range from 1.0 to 10.0.

inference_timesteps

This parameter determines the number of diffusion steps used during speech generation. Higher values can improve the quality of the output but will increase processing time. The default is 10, with a range from 1 to 100.

min_tokens

Sets the minimum number of audio tokens to generate, which puts a floor on the duration of the output. The default is 2, with a range from 1 to 100.

max_tokens

Sets the maximum number of audio tokens to generate, which caps the duration of the output. The default is 2048, with a range from 64 to 8192.
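To keep the documented defaults and ranges in one place, the parameters above can be collected in a small Python sketch that validates them before a workflow is queued. The names TTSSettings and to_api_node are hypothetical, the model name is a placeholder, and the min_tokens versus max_tokens check is an assumption rather than a documented rule. The "class_type" and "inputs" layout follows ComfyUI's API-format workflow JSON. The optional prompt_audio and prompt_text inputs are left out.

```python
from dataclasses import asdict, dataclass

# Inclusive (minimum, maximum) ranges as listed in the parameter descriptions.
RANGES = {
    "cfg_value": (1.0, 10.0),
    "inference_timesteps": (1, 100),
    "min_tokens": (1, 100),
    "max_tokens": (64, 8192),
}


@dataclass
class TTSSettings:
    model_name: str  # must be one of the options the node actually offers
    text: str = (
        "VoxCPM is an innovative TTS model designed to generate "
        "highly expressive speech."
    )
    cfg_value: float = 2.0
    inference_timesteps: int = 10
    min_tokens: int = 2
    max_tokens: int = 2048

    def validate(self) -> None:
        """Raise ValueError if any numeric setting is out of range."""
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        # Assumption: the source does not state this rule, but a minimum
        # above the maximum could not be satisfied.
        if self.min_tokens > self.max_tokens:
            raise ValueError("min_tokens must not exceed max_tokens")


def to_api_node(settings: TTSSettings) -> dict:
    """Build one node entry in ComfyUI's API workflow format."""
    settings.validate()
    return {"class_type": "FL_VoxCPM_V2_TTS", "inputs": asdict(settings)}


# Placeholder model name: replace it with an option from the node's dropdown.
node = to_api_node(TTSSettings(model_name="<a V2 model from the dropdown>"))
print(node["class_type"], node["inputs"]["cfg_value"])
```

Validating on the client side is a convenience only; the node applies its own limits regardless.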

FL VoxCPM V2 TTS Output Parameters:

waveform

The generated audio as a tensor containing the synthesized speech samples. Pass it to downstream nodes for further processing or playback.

sample_rate

The sample rate of the generated audio. Downstream nodes and audio players need this value to play the waveform back at the correct speed and pitch.
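As a rough illustration of how the two outputs fit together, the following Python sketch writes mono samples to a 16-bit WAV file and computes the clip duration. It uses only the standard library. The sine tone and the 16000 Hz rate are stand-ins, not values taken from the node, and the helper names are hypothetical. A real waveform tensor would first need converting to a flat sequence of floats.

```python
import io
import math
import struct
import wave


def write_wav(samples, sample_rate, target):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV.

    `target` may be a file path or a seekable binary file object.
    """
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


def duration_seconds(num_samples, sample_rate):
    """Clip length in seconds: sample count divided by sample rate."""
    return num_samples / sample_rate


# Stand-in for the node's output: a short 440 Hz tone at a hypothetical rate.
sample_rate = 16000
samples = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(8000)]

buffer = io.BytesIO()
write_wav(samples, sample_rate, buffer)
print(duration_seconds(len(samples), sample_rate), "seconds")
```

Writing the file with the wrong sample rate changes both the speed and the pitch of playback, which is why the two outputs should always travel together.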

FL VoxCPM V2 TTS Usage Tips:

Experiment with different model_name options to find the best fit for your project's voice characteristics.
Use prompt_audio and prompt_text for accurate voice cloning, especially when replicating specific voices.
Adjust cfg_value to balance between naturalness and adherence to the prompt, depending on your needs.
Increase inference_timesteps for higher quality output, but be mindful of the increased processing time.
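One way to act on the tips about cfg_value and inference_timesteps is to generate the same line with a small grid of settings and compare the results by ear. The Python sketch below only builds the list of settings to try. The helper name build_sweep and the specific values are hypothetical choices within the documented ranges, not recommendations from the source.

```python
from itertools import product


def build_sweep(cfg_values, timestep_values):
    """Return every combination of the two settings as a list of dicts."""
    return [
        {"cfg_value": cfg, "inference_timesteps": steps}
        for cfg, steps in product(cfg_values, timestep_values)
    ]


# Hypothetical candidates, all inside the documented ranges.
sweep = build_sweep(cfg_values=[1.5, 2.0, 3.0], timestep_values=[10, 20])
for settings in sweep:
    print(settings)
```

Keeping the text and the reference audio fixed across the sweep makes it easier to attribute differences in the output to the two parameters being varied.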

FL VoxCPM V2 TTS Common Errors and Solutions:

Model 'model_name' not found.

Explanation: This error occurs when the specified model name is not available in the node's supported models.
Solution: Ensure that you select a model name from the available options provided by the node.

'model_name' is a V1 model. Use the FL VoxCPM TTS node instead.

Explanation: This error indicates that a V1 model was selected, which is not compatible with the V2 node.
Solution: Switch to using the FL VoxCPM TTS node for V1 models or select a V2 model for this node.
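The two errors above amount to a lookup followed by a version check. The Python sketch below reproduces that logic for illustration. The registry, its model names, and the function check_v2_model are all hypothetical; only the error message wording comes from this page.

```python
# Hypothetical registry mapping model names to their major version.
# The names are placeholders, not real model identifiers.
KNOWN_MODELS = {"example-v1-model": 1, "example-v2-model": 2}


def check_v2_model(model_name, registry=KNOWN_MODELS):
    """Return model_name if it is a known V2 model, else raise ValueError."""
    if model_name not in registry:
        raise ValueError(f"Model '{model_name}' not found.")
    if registry[model_name] == 1:
        raise ValueError(
            f"'{model_name}' is a V1 model. Use the FL VoxCPM TTS node instead."
        )
    return model_name


for name in ("example-v2-model", "example-v1-model", "missing-model"):
    try:
        print("ok:", check_v2_model(name))
    except ValueError as error:
        print("error:", error)
```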

FL VoxCPM V2 TTS Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI-FL-VoxCPM
