RunComfy

ComfyUI Extension: ComfyUI-VoxCPM

Repo Name: ComfyUI-VoxCPM
Author: wildminder (Account age: 4772 days)
Nodes: 1
Latest Updated: 2025-12-15
Github Stars: 0.34K

Table of Content

Description
ComfyUI-VoxCPM Introduction
How ComfyUI-VoxCPM Works
ComfyUI-VoxCPM Features
ComfyUI-VoxCPM Models
Troubleshooting ComfyUI-VoxCPM
Learn More about ComfyUI-VoxCPM
Related Nodes

How to Install ComfyUI-VoxCPM

Install this extension via the ComfyUI Manager by searching for ComfyUI-VoxCPM:

1. Click the Manager button in the main menu.
2. Click the Custom Nodes Manager button.
3. Enter ComfyUI-VoxCPM in the search bar and install the extension from the results.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.
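
If the node does not appear after restarting, it can help to confirm that the extension folder actually exists. The following is a minimal sketch, assuming the Manager places extensions under the `custom_nodes` directory of your ComfyUI installation. The helper name `find_extension` and the exact folder name are illustrative assumptions, so the match is case-insensitive.

```python
from pathlib import Path


def find_extension(comfy_root, name="ComfyUI-VoxCPM"):
    """Return the extension folder under custom_nodes, or None if it is absent.

    comfy_root is the path to your ComfyUI installation (an assumption:
    adjust it to wherever ComfyUI lives on your machine).
    """
    custom_nodes = Path(comfy_root) / "custom_nodes"
    if not custom_nodes.is_dir():
        return None
    for entry in custom_nodes.iterdir():
        # Compare case-insensitively, since folder casing can vary by install method.
        if entry.is_dir() and entry.name.lower() == name.lower():
            return entry
    return None
```

Calling `find_extension("/path/to/ComfyUI")` returns the folder path when the extension is present and `None` otherwise.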

ComfyUI-VoxCPM Description

ComfyUI-VoxCPM enables context-aware, expressive speech generation and authentic voice cloning, enhancing text-to-speech capabilities with lifelike vocal outputs.

ComfyUI-VoxCPM Introduction

ComfyUI-VoxCPM is an extension that integrates VoxCPM, a Text-to-Speech (TTS) system, into ComfyUI. It generates realistic, expressive speech directly from text without converting speech into discrete tokens. Its strengths are context-aware speech generation and true-to-life voice cloning, which makes it useful for AI artists who want to produce expressive narration or clone a specific voice.

How ComfyUI-VoxCPM Works

At its core, ComfyUI-VoxCPM uses the VoxCPM model, which has a tokenizer-free architecture: rather than breaking speech down into discrete tokens, it models speech in a continuous space, which allows for more fluid and natural-sounding audio. The model uses an end-to-end diffusion autoregressive approach, generating speech directly from text input and capturing nuances of human speech such as intonation, rhythm, and emotion. In practice, the model infers context from the text and chooses vocal expression to match.
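
To make the idea of autoregressive generation in a continuous space more concrete, here is a toy sketch. It is not VoxCPM's implementation: there is no neural network, and the "denoiser" is a hand-written stand-in. It only illustrates the loop structure, where each frame is a continuous vector (not a discrete token) that starts as noise, is refined over several steps, and then conditions the next frame. All names (`denoise_step`, `generate_frames`, `conditioning`) are hypothetical.

```python
import random


def denoise_step(noisy, target, strength=0.5):
    """Move a noisy vector part of the way toward a target.

    Stand-in for a learned denoiser; a real model would predict the update.
    """
    return [n + strength * (t - n) for n, t in zip(noisy, target)]


def generate_frames(conditioning, num_frames=4, steps=10, dim=3, seed=0):
    """Generate continuous frames one after another (autoregressively)."""
    rng = random.Random(seed)
    frames = []
    prev = [0.0] * dim  # no history before the first frame
    for i in range(num_frames):
        # Toy target: depends on the conditioning value and the previous frame.
        base = conditioning[i % len(conditioning)]
        target = [base + 0.1 * p for p in prev]
        # Start from random noise and refine it step by step.
        frame = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(steps):
            frame = denoise_step(frame, target)
        frames.append(frame)
        prev = frame  # the finished frame conditions the next one
    return frames
```

In this sketch, raising `steps` brings each frame closer to its target, which loosely mirrors how more refinement steps cost more time in exchange for a cleaner result.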

ComfyUI-VoxCPM Features

Context-Aware Expressive Speech: The model can interpret the context of the text to generate speech with suitable prosody and expression, making the audio output sound more natural and engaging.
True-to-Life Voice Cloning: By using a short audio sample as a reference, the model can clone the voice's unique characteristics, including timbre, accent, and emotional tone.
Zero-Shot TTS: You can generate high-quality speech without needing any reference audio, allowing for quick and easy audio creation.
Automatic Model Management: The extension automatically handles the downloading and management of the VoxCPM model, optimizing memory usage to save VRAM.
Fine-Grained Control: Users can adjust parameters like Classifier-Free Guidance (CFG) scale and inference steps to fine-tune the style and quality of the generated speech.
High-Efficiency Synthesis: Designed for speed, the extension can generate audio quickly, even on consumer-grade hardware.
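
For readers unfamiliar with the CFG scale, the sketch below shows the standard classifier-free guidance blend between an unconditional and a conditional prediction. Whether VoxCPM applies exactly this formula internally is an assumption. The function name `apply_cfg` is hypothetical, and plain Python lists stand in for model outputs.

```python
def apply_cfg(uncond, cond, cfg_value):
    """Blend two predictions with the standard classifier-free guidance rule.

    guided = uncond + cfg_value * (cond - uncond)
    cfg_value = 1.0 returns the conditional prediction unchanged;
    larger values push the result further in the conditional direction.
    """
    return [u + cfg_value * (c - u) for u, c in zip(uncond, cond)]
```

A higher scale follows the conditioning more strongly, usually at some cost in variety or naturalness, so it is a setting to tune by ear.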

ComfyUI-VoxCPM Models

The extension currently supports the VoxCPM-0.5B model, which is automatically downloaded and managed by the system. This model is designed to provide a balance between performance and resource efficiency, making it suitable for a wide range of applications.

VoxCPM-0.5B: This model is ideal for generating expressive and natural-sounding speech. It is particularly effective for zero-shot voice cloning and context-aware speech generation.

Troubleshooting ComfyUI-VoxCPM

If you encounter issues while using ComfyUI-VoxCPM, here are some common problems and solutions:

Audio Quality Issues: Ensure that the prompt_text accurately matches the prompt_audio for voice cloning. This alignment is crucial for achieving high-quality results.
Model Download Problems: Check your internet connection and ensure that there is enough disk space for the model files.
Performance Issues: If generation is slow, try reducing inference_timesteps (fewer steps run faster at some cost in quality) or adjusting cfg_value to balance quality and speed.
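
The checks above can be sketched as a small preflight helper that runs before generation. This is an illustrative sketch only: `preflight` is a hypothetical function, not part of the extension, and the `min_free_gb` threshold is an assumed placeholder, not the model's documented size. The parameter names mirror the node inputs mentioned above.

```python
import shutil


def preflight(prompt_audio, prompt_text, inference_timesteps, cfg_value,
              download_dir=".", min_free_gb=5.0):
    """Return a list of human-readable issues; an empty list means no problems found."""
    issues = []
    # Voice cloning needs the transcript that matches the reference audio.
    if prompt_audio and not (prompt_text or "").strip():
        issues.append("prompt_audio is set but prompt_text is empty")
    if inference_timesteps < 1:
        issues.append("inference_timesteps must be at least 1")
    if cfg_value <= 0:
        issues.append("cfg_value should be positive")
    # shutil.disk_usage reports total, used, and free bytes for the given path.
    free_gb = shutil.disk_usage(download_dir).free / 1e9
    if free_gb < min_free_gb:
        issues.append(f"only {free_gb:.1f} GB free in {download_dir}")
    return issues
```

For example, `preflight("ref.wav", "", 10, 2.0)` reports the missing transcript, which is the most common cause of poor cloning quality described above.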

Learn More about ComfyUI-VoxCPM

To further explore the capabilities of ComfyUI-VoxCPM, consider visiting the following resources:

VoxCPM GitHub Repository: Explore the technical details and updates about the VoxCPM model.
Hugging Face Model Page: Access the model and related resources.
VoxCPM Demo Page: Listen to audio samples and see the model in action.

These resources provide valuable insights and support for AI artists looking to harness the full potential of ComfyUI-VoxCPM in their creative projects.

ComfyUI-VoxCPM Related Nodes

VoxCPM TTS
