ComfyUI Extension: ComfyUI-VoxCPM

Repo Name: ComfyUI-VoxCPM
Author: wildminder (Account age: 4772 days)
Nodes: 1
Latest Updated: 2025-12-15
Github Stars: 0.34K

How to Install ComfyUI-VoxCPM

Install this extension via the ComfyUI Manager by searching for ComfyUI-VoxCPM:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-VoxCPM in the search bar and install the matching result.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

ComfyUI-VoxCPM Description

ComfyUI-VoxCPM enables context-aware, expressive speech generation and authentic voice cloning, enhancing text-to-speech capabilities with lifelike vocal outputs.

ComfyUI-VoxCPM Introduction

ComfyUI-VoxCPM is an extension that integrates VoxCPM, a Text-to-Speech (TTS) system, into ComfyUI. It generates realistic, expressive speech directly from text without first converting speech into discrete tokens. Its two main strengths are context-aware speech generation and true-to-life voice cloning, so it suits both expressive narration and reproducing a specific voice from a short sample.

How ComfyUI-VoxCPM Works

ComfyUI-VoxCPM is built on the VoxCPM model, which uses a tokenizer-free architecture. Rather than breaking speech into discrete tokens, it models speech in a continuous space, which allows for more fluid and natural-sounding audio. The model follows an end-to-end diffusion autoregressive approach: audio is produced piece by piece, each piece conditioned on the text and on what has already been generated, and each piece is refined by a diffusion process. Because the model takes the surrounding text into account, it can pick intonation, rhythm, and emotion that fit the content.
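To make the idea of autoregressive generation in a continuous space more concrete, here is a deliberately tiny sketch. It is not VoxCPM's implementation: `text_features`, `denoise_frame`, and `generate_latents` are hypothetical names, the "text encoder" just folds character codes into a vector, and the "diffusion" step is a simple pull from noise toward a target. What it shows is the control flow only: frames are real-valued vectors (never quantized into tokens), each frame is refined over several steps, and each frame is conditioned on the previous one.

```python
import random


def text_features(text, dim):
    """Hypothetical text encoder: fold character codes into a small vector."""
    feats = [0.0] * dim
    for i, ch in enumerate(text):
        feats[i % dim] += ord(ch) / 1000.0
    return feats


def denoise_frame(prev_frame, feats, steps, rng):
    """Toy stand-in for a diffusion sampler.

    Starts from Gaussian noise and moves halfway toward a target on each
    step. The target mixes the text features with the previous frame.
    """
    target = [0.5 * p + 0.5 * f for p, f in zip(prev_frame, feats)]
    frame = [rng.gauss(0.0, 1.0) for _ in feats]  # start from pure noise
    for _ in range(steps):
        frame = [x + 0.5 * (t - x) for x, t in zip(frame, target)]
    return frame


def generate_latents(text, num_frames, steps=4, dim=3, seed=0):
    """Generate continuous frames one at a time (autoregressively)."""
    rng = random.Random(seed)
    feats = text_features(text, dim)
    frames = []
    prev = [0.0] * dim  # no history before the first frame
    for _ in range(num_frames):
        prev = denoise_frame(prev, feats, steps, rng)
        frames.append(prev)
    return frames


if __name__ == "__main__":
    for frame in generate_latents("hello", num_frames=3):
        print([round(x, 3) for x in frame])
```

In this toy, raising `steps` pulls each frame closer to its target, which loosely mirrors why more inference steps tend to cost time but improve the result.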

ComfyUI-VoxCPM Features

  • Context-Aware Expressive Speech: The model can interpret the context of the text to generate speech with suitable prosody and expression, making the audio output sound more natural and engaging.
  • True-to-Life Voice Cloning: By using a short audio sample as a reference, the model can clone the voice's unique characteristics, including timbre, accent, and emotional tone.
  • Zero-Shot TTS: You can generate high-quality speech without needing any reference audio, allowing for quick and easy audio creation.
  • Automatic Model Management: The extension automatically handles the downloading and management of the VoxCPM model, optimizing memory usage to save VRAM.
  • Fine-Grained Control: Users can adjust parameters like Classifier-Free Guidance (CFG) scale and inference steps to fine-tune the style and quality of the generated speech.
  • High-Efficiency Synthesis: Designed for speed, the extension can generate audio quickly, even on consumer-grade hardware.
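For readers unfamiliar with the CFG scale mentioned in the feature list, the sketch below shows the standard Classifier-Free Guidance blend. The source does not say exactly how VoxCPM applies guidance internally, so treat this as the textbook formulation rather than the extension's code; `apply_cfg` is a hypothetical helper and plain Python lists stand in for model outputs.

```python
def apply_cfg(cond, uncond, cfg_value):
    """Blend conditional and unconditional predictions element-wise.

    Standard classifier-free guidance:
        guided = uncond + cfg_value * (cond - uncond)
    cfg_value = 0 gives the unconditional prediction, 1 gives the
    conditional one, and values above 1 push further toward the condition.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + cfg_value * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond = [1.0, 2.0]
    uncond = [0.0, 0.0]
    for scale in (0.0, 1.0, 2.0):
        print(scale, apply_cfg(cond, uncond, scale))
```

A higher scale makes the output follow the conditioning more strictly, usually at the cost of some naturalness, which is why it is exposed as a tunable value.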

ComfyUI-VoxCPM Models

The extension currently supports the VoxCPM-0.5B model, which is automatically downloaded and managed by the system. This model is designed to provide a balance between performance and resource efficiency, making it suitable for a wide range of applications.

  • VoxCPM-0.5B: This model is ideal for generating expressive and natural-sounding speech. It is particularly effective for zero-shot voice cloning and context-aware speech generation.

Troubleshooting ComfyUI-VoxCPM

If you encounter issues while using ComfyUI-VoxCPM, here are some common problems and solutions:

  • Audio Quality Issues: For voice cloning, make sure prompt_text is an accurate transcript of what is spoken in prompt_audio. A mismatch between the two is a common cause of poor results.
  • Model Download Problems: Check your internet connection and ensure that there is enough disk space for the model files.
  • Performance Issues: If generation is slow, try reducing inference_timesteps, which trades some quality for speed. Adjusting cfg_value changes how closely the output follows the prompt and can help when the result sounds unnatural.
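The first two checks above can be run before starting a generation. The following is a minimal sketch that is independent of the extension: `check_cloning_inputs` and `has_free_space` are hypothetical helpers, and the required disk space is a value you supply yourself, since the source does not state the size of the model files.

```python
import shutil


def check_cloning_inputs(prompt_audio, prompt_text):
    """Return a list of problems with the voice-cloning inputs.

    Cloning needs both a reference clip and its transcript; plain TTS
    needs neither. Supplying only one of the two is flagged.
    """
    problems = []
    has_audio = prompt_audio is not None
    has_text = bool(prompt_text and prompt_text.strip())
    if has_audio and not has_text:
        problems.append("prompt_audio is set but prompt_text is empty")
    if has_text and not has_audio:
        problems.append("prompt_text is set but prompt_audio is missing")
    return problems


def has_free_space(path, required_gb):
    """Check that the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    print(check_cloning_inputs("reference.wav", ""))
    # required_gb is an assumed threshold, not a documented model size.
    print(has_free_space(".", required_gb=5))
```

These checks only catch missing inputs. Whether the transcript actually matches the audio still has to be verified by listening.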

Learn More about ComfyUI-VoxCPM

To further explore the capabilities of ComfyUI-VoxCPM, consider visiting the following resources:

  • VoxCPM GitHub Repository: Explore the technical details and updates about the VoxCPM model.
  • Hugging Face Model Page: Access the model and related resources.
  • VoxCPM Demo Page: Listen to audio samples and see the model in action.

These resources provide valuable insights and support for AI artists looking to harness the full potential of ComfyUI-VoxCPM in their creative projects.
