ComfyUI Extension: Step Audio EditX TTS

Repo Name: ComfyUI-Step_Audio_EditX_TTS
Author: saganaki22 (Account age: 1683 days)
Nodes: 2
Last Updated: 2025-12-04
GitHub Stars: 0.05K

How to Install Step Audio EditX TTS

Install this extension via the ComfyUI Manager by searching for Step Audio EditX TTS:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter Step Audio EditX TTS in the search bar.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

Step Audio EditX TTS Description

Step Audio EditX TTS is a voice cloning and audio editing extension for ComfyUI that provides text-to-speech and audio manipulation nodes.

ComfyUI-Step_Audio_EditX_TTS Introduction

ComfyUI-Step_Audio_EditX_TTS is an extension that adds zero-shot voice cloning and audio editing to ComfyUI. It suits AI artists who need voiceovers for their projects as well as developers who want to build audio manipulation into their applications. Its editing features include emotion and style editing, speed control, and paralinguistic effects.

How ComfyUI-Step_Audio_EditX_TTS Works

ComfyUI-Step_Audio_EditX_TTS uses machine learning models to analyze and manipulate audio. Its modular workflow design separates voice cloning from audio editing. Given a short reference audio clip, the extension clones the voice and generates new speech in that voice from any text you provide. The editing node then lets you adjust the emotional tone, speaking style, and speed of the audio, and add effects such as laughter or breathing. All of this is done by connecting and configuring nodes in the ComfyUI interface, so you can build complex audio workflows without writing code.

ComfyUI-Step_Audio_EditX_TTS Features

  • Zero-Shot Voice Cloning: Clone any voice using just a 3-30 second audio sample. This feature is perfect for creating consistent character voices across different projects.
  • Advanced Audio Editing: Modify the emotion, style, and speed of audio clips. Add paralinguistic effects such as laughter or sighs, and remove background noise with denoising tools.
  • Native ComfyUI Integration: Seamlessly integrates with ComfyUI, allowing you to use its powerful node-based interface for audio processing.
  • Modular Workflow Design: Separate nodes for cloning and editing enable flexible and customizable audio workflows.
  • Longform Support: Smart chunking handles long texts by automatically splitting the text into chunks and stitching the generated audio back together.
  • Iterative Editing: Apply multiple iterations of edits to achieve stronger and more pronounced effects.
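
The longform chunking idea can be illustrated with a minimal sketch. It is not the extension's actual implementation: the function name `split_into_chunks`, the `max_chars` limit, and the sentence-boundary rule are all assumptions made for illustration. The sketch splits text at sentence endings and packs sentences into chunks that stay under a character limit, so each chunk could be synthesized separately and the audio joined afterwards.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    Hypothetical helper for illustration only. A single sentence longer
    than max_chars is kept intact as its own chunk rather than cut mid-word.
    """
    # Split after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [p.strip() for p in parts if p.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("One. Two. Three.", max_chars=9):
        print(chunk)
```

Splitting at sentence boundaries rather than at a fixed character offset keeps each chunk a complete utterance, which tends to make the joins between generated clips less noticeable.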

ComfyUI-Step_Audio_EditX_TTS Models

The extension uses two main models: Step-Audio-EditX and Step-Audio-Tokenizer. Step-Audio-EditX handles the core audio processing tasks, while Step-Audio-Tokenizer prepares and processes the audio data. Together they provide the cloning and editing capabilities.

Troubleshooting ComfyUI-Step_Audio_EditX_TTS

Common Issues and Solutions

  • Garbled or Distorted Speech: Ensure that all dependencies are up to date. Update the transformers library to version 4.53.3 and verify that librosa and hyperpyyaml are installed.
  • Out of Memory Errors: Try enabling quantization or reducing the max_new_tokens parameter. You can also disable the keep_model_in_vram option to free up VRAM.
  • Poor Voice Quality: Make sure the prompt_text matches the reference audio transcript exactly. Use high-quality reference audio and consider increasing the temperature setting for more natural variation.
  • Edit Node Not Working: Check that the audio length is between 0.5 and 30 seconds. Ensure the audio_text matches the input audio transcript and that the correct edit type is selected.
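
Two of the checks above can be scripted before opening ComfyUI. The sketch below is a standalone helper, not part of the extension: the function names `edit_length_ok` and `installed_version` are hypothetical, and the duration is computed from a sample count and sample rate that you supply yourself. It checks the 0.5 to 30 second limit for the edit node and reports which of the listed dependencies are installed.

```python
from importlib import metadata
from typing import Optional

# Limits taken from the troubleshooting notes for the edit node.
MIN_SECONDS = 0.5
MAX_SECONDS = 30.0


def duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Return clip length in seconds from sample count and sample rate."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


def edit_length_ok(num_samples: int, sample_rate: int) -> bool:
    """True if the clip falls inside the 0.5 to 30 second range."""
    return MIN_SECONDS <= duration_seconds(num_samples, sample_rate) <= MAX_SECONDS


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names as given in the troubleshooting list.
    for name in ("transformers", "librosa", "hyperpyyaml"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
    # Example: a 2-second clip at 24 kHz (values chosen for illustration).
    print("length ok:", edit_length_ok(48000, 24000))
```

Run it with the same Python environment that ComfyUI uses; otherwise the reported versions may not match what the nodes actually load.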
