ComfyUI Extension: ComfyUI_RH_VoxCPM

Repo Name: ComfyUI_RH_VoxCPM
Author: HM-RunningHub (Account age: 489 days)
Nodes: 4
Latest Updated: 2026-04-15
GitHub Stars: 0.03K

How to Install ComfyUI_RH_VoxCPM

Install this extension via the ComfyUI Manager by searching for ComfyUI_RH_VoxCPM:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI_RH_VoxCPM in the search bar.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

ComfyUI_RH_VoxCPM Description

ComfyUI_RH_VoxCPM is an extension for ComfyUI that integrates the VoxCPM text-to-speech system. It adds nodes for generating speech from text, designing new voices from text descriptions, and cloning voices from reference audio.

ComfyUI_RH_VoxCPM Introduction

ComfyUI_RH_VoxCPM is an extension that brings the VoxCPM system into ComfyUI. It generates context-aware speech without a tokenizer, and it supports both creative voice design and high-fidelity voice cloning. You can design a new voice from a textual description or clone an existing voice from reference audio, which makes the extension useful for AI artists who want to add audio creation to their workflows.

How ComfyUI_RH_VoxCPM Works

At its core, ComfyUI_RH_VoxCPM leverages the VoxCPM system, which is a tokenizer-free text-to-speech (TTS) technology. This means it can generate speech directly from text without converting the audio into discrete tokens. The system uses a diffusion autoregressive architecture to produce continuous audio representations, resulting in natural and expressive speech synthesis. Imagine it as a painter who creates a masterpiece directly on canvas without sketching first; similarly, VoxCPM crafts audio directly from text, ensuring fluidity and expressiveness.
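
To make the "tokenizer-free" idea concrete, the toy sketch below shows what is lost when continuous audio features are forced into discrete tokens. It is not VoxCPM code: the feature values and the four-entry codebook are hypothetical, chosen only to show that mapping each value to its nearest codebook entry introduces a rounding error that a continuous representation does not have.

```python
# Toy illustration only; none of this is taken from VoxCPM.

def quantize(values, codebook):
    """Map each value to the index of its nearest codebook entry (a discrete token)."""
    return [
        min(range(len(codebook)), key=lambda i: abs(codebook[i] - v))
        for v in values
    ]


def dequantize(tokens, codebook):
    """Turn discrete tokens back into the codebook values they stand for."""
    return [codebook[t] for t in tokens]


# Hypothetical continuous audio features and a hypothetical codebook.
frames = [0.12, -0.47, 0.80, 0.33]
codebook = [-0.5, 0.0, 0.5, 1.0]

tokens = quantize(frames, codebook)
restored = dequantize(tokens, codebook)

# Per-frame error introduced by the round trip through discrete tokens.
errors = [abs(original - back) for original, back in zip(frames, restored)]

print("tokens:  ", tokens)
print("restored:", restored)
print("errors:  ", [round(e, 2) for e in errors])
```

A system that works on continuous representations keeps values like those in `frames` as they are, so it skips the rounding step and the error that comes with it.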

ComfyUI_RH_VoxCPM Features

  • Voice Design: Create entirely new voices by describing characteristics such as gender, age, tone, emotion, and speed. This feature allows you to bring your creative visions to life by simply using descriptive text.
  • Controllable Cloning: Upload a reference audio to clone its voice characteristics while using text instructions to control style, emotion, and speed. This feature is perfect for artists who want to maintain the essence of a voice while adding their unique touch.
  • Ultimate Cloning: For those who need to replicate every detail of a voice, this mode allows the model to continue from a reference audio, capturing every nuance. This is ideal for high-fidelity voice cloning projects.
  • LoRA Fine-Tuning: Customize voice generation by loading your own LoRA weights, enabling personalized voice synthesis.
  • Automatic Speech Recognition (ASR): If the reference audio text is empty, the system automatically uses FunASR SenseVoiceSmall to recognize the speech.
  • Reference Audio Denoising: Optionally use ZipEnhancer to reduce noise in reference audio, ensuring cleaner input for cloning.
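
The way these features combine can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not the extension's implementation: `SpeechRequest`, its field names, `plan_generation`, and the returned keys are all hypothetical, and the sketch assumes that the ASR fallback applies to any request that includes reference audio.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechRequest:
    """Hypothetical request shape; field names are illustrative, not the node inputs."""
    text: str
    voice_description: str = ""            # e.g. gender, age, tone, emotion, speed
    reference_audio: Optional[str] = None  # path to a reference clip, if any
    reference_text: str = ""               # transcript of the reference clip
    continue_from_reference: bool = False  # True selects the "ultimate cloning" mode
    denoise_reference: bool = False        # True requests reference denoising


def plan_generation(req: SpeechRequest) -> dict:
    """Decide which mode and which preprocessing steps a request would need."""
    if req.reference_audio is None:
        # No reference clip: the voice comes from the text description alone.
        return {"mode": "voice_design", "run_asr": False, "run_denoise": False}

    mode = "ultimate_cloning" if req.continue_from_reference else "controllable_cloning"
    return {
        "mode": mode,
        # Assumption: an empty transcript triggers ASR in both cloning modes.
        "run_asr": not req.reference_text.strip(),
        "run_denoise": req.denoise_reference,
    }


plan = plan_generation(
    SpeechRequest(text="Hello there.", reference_audio="speaker.wav", denoise_reference=True)
)
print(plan)
```

In this sketch, a request with reference audio but no transcript is planned as controllable cloning with both ASR and denoising enabled.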

ComfyUI_RH_VoxCPM Models

ComfyUI_RH_VoxCPM supports several models, each catering to different needs:

  • VoxCPM2: With 2 billion parameters, this model offers the best quality and is recommended for projects requiring the highest fidelity.
  • VoxCPM1.5: A balanced choice with 800 million parameters, suitable for general use.
  • VoxCPM-0.5B: A lightweight model with 640 million parameters, ideal for projects where resource efficiency is a priority.

Each model can significantly impact the quality and performance of the generated audio, so choose based on your project's requirements.
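
As a rough guide to resource needs, the parameter counts listed above can be turned into an estimate of weight storage. The sketch assumes 2 bytes per parameter (16-bit weights), which is an assumption and not a documented setting of this extension. It counts weights only, so real memory use will be higher once activations and any helper models are loaded. The `weight_gb` helper is hypothetical.

```python
# Parameter counts as listed above for each model.
PARAM_COUNTS = {
    "VoxCPM2": 2_000_000_000,
    "VoxCPM1.5": 800_000_000,
    "VoxCPM-0.5B": 640_000_000,
}


def weight_gb(param_count: int, bytes_per_param: int = 2) -> float:
    """Estimate weight storage in GB (10**9 bytes); assumes 16-bit weights by default."""
    return param_count * bytes_per_param / 1e9


estimates = {name: weight_gb(count) for name, count in PARAM_COUNTS.items()}

for name, gb in estimates.items():
    print(f"{name}: about {gb:.2f} GB of weights at 2 bytes per parameter")
```

Passing `bytes_per_param=4` gives the corresponding estimate for 32-bit weights.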

What's New with ComfyUI_RH_VoxCPM

The latest updates to ComfyUI_RH_VoxCPM include enhanced voice cloning capabilities and improved support for multi-speaker dialogues. These updates allow for more dynamic and expressive audio generation, providing AI artists with greater creative freedom and control over their projects.

Troubleshooting ComfyUI_RH_VoxCPM

If you encounter issues while using ComfyUI_RH_VoxCPM, here are some common solutions:

  • Problem: The generated audio does not match the expected style or emotion.
  • Solution: Double-check your text instructions for clarity and specificity. Ensure that the control instructions are correctly formatted and relevant to the desired output.
  • Problem: Reference audio is noisy or unclear.
  • Solution: Enable the denoise option using ZipEnhancer to clean up the reference audio before processing.
  • Problem: The system fails to recognize the reference audio text.
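  • Solution: Automatic ASR runs only when the reference audio text is empty, so either leave that field blank to let FunASR SenseVoiceSmall transcribe the audio, or enter the transcription manually if recognition is inaccurate.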

Learn More about ComfyUI_RH_VoxCPM

To further explore the capabilities of ComfyUI_RH_VoxCPM, consider visiting the following resources:

  • VoxCPM GitHub Repository for technical details and updates.
  • VoxCPM2 on HuggingFace for model downloads and community discussions.
  • RunningHub (https://www.runninghub.cn) for online usage and additional support.

These resources provide valuable insights and community support, helping you make the most of ComfyUI_RH_VoxCPM in your creative projects.
