RunComfy


ComfyUI Node: ACE-Step Text to Music

Class Name: ACE_STEP_TextToMusic

Category: Audio/ACE-Step

Author: kana112233 (account age: 3,996 days)

Extension: ComfyUI-kaola-ace-step

Latest Updated: 2026-02-22

GitHub Stars: 0.02K


Table of Contents

Description
ACE_STEP_TextToMusic:
ACE_STEP_TextToMusic Input Parameters:
ACE_STEP_TextToMusic Output Parameters:
ACE_STEP_TextToMusic Usage Tips:
ACE_STEP_TextToMusic Common Errors and Solutions:
Related Nodes

How to Install ComfyUI-kaola-ace-step

Install this extension through the ComfyUI Manager by searching for ComfyUI-kaola-ace-step:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-kaola-ace-step in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


ACE-Step Text to Music Description

Turns text descriptions into musical compositions using AI models.

ACE-Step Text to Music:

The ACE_STEP_TextToMusic node generates music from text. It interprets a natural-language prompt with AI models and produces matching audio tracks, so you can turn a written idea into a piece of music without traditional musical training. The node is part of the ACE-Step 1.5 Music Generation suite, and its aim is to let anyone produce music that matches a textual description.

ACE-Step Text to Music Input Parameters:

caption

The caption parameter is a multiline string holding the text prompt: a natural-language description of the music you want. It is the main way to define the style, mood, and theme of the result, so detailed and expressive prompts are supported. The default is an empty string.

checkpoint_dir

The checkpoint_dir parameter specifies the directory containing the ACE-Step model weights, in particular the DiT (diffusion transformer) model. The node loads its pre-trained weights from this directory. The default is simply the first directory returned by the get_acestep_checkpoints() function, so if several checkpoint directories are installed, check that the default is the one you intend to use.

config_path

The config_path parameter selects the model configuration, such as acestep-v15-turbo. The configuration affects both generation speed and output quality. The default, acestep-v15-turbo, is tuned for speed, which makes it well suited to quick iteration and experimentation.

lm_model_path

The lm_model_path parameter points to the language model that generates lyrics and metadata. This model supplies lyrics and musical metadata that fit the prompt, which shapes the final musical output. The default is acestep-5Hz-lm-1.7B.
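The four inputs above map naturally onto ComfyUI's INPUT_TYPES convention for custom nodes. The sketch below is an illustration, not the extension's source: list_checkpoint_dirs is a hypothetical stand-in for get_acestep_checkpoints(), the CHECKPOINT_ROOT path is assumed, and the combo lists contain only the documented defaults.

```python
import os


def list_checkpoint_dirs(root):
    """Hypothetical stand-in for get_acestep_checkpoints(): sorted subdirectory names under root."""
    if not os.path.isdir(root):
        return []
    return sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )


class AceStepTextToMusicSketch:
    """Illustrative ComfyUI node skeleton, not the extension's actual source."""

    CATEGORY = "Audio/ACE-Step"
    CHECKPOINT_ROOT = "models/acestep"  # assumed location, adjust to your install

    @classmethod
    def INPUT_TYPES(cls):
        # Fall back to an empty entry so the combo list is never empty.
        checkpoints = list_checkpoint_dirs(cls.CHECKPOINT_ROOT) or [""]
        return {
            "required": {
                # Multiline text prompt, empty by default.
                "caption": ("STRING", {"multiline": True, "default": ""}),
                # Combo whose default is the first directory found.
                "checkpoint_dir": (checkpoints, {"default": checkpoints[0]}),
                # Only the documented defaults are listed; the real node may offer more.
                "config_path": (["acestep-v15-turbo"], {"default": "acestep-v15-turbo"}),
                "lm_model_path": (["acestep-5Hz-lm-1.7B"], {"default": "acestep-5Hz-lm-1.7B"}),
            }
        }
```

Because the default is whatever sorts first, adding a new checkpoint directory can silently change which weights a fresh node picks up.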

duration

The duration parameter sets the target length of the generated music in seconds. The default is 30.0 seconds, and the allowed range is 10.0 to 600.0 seconds, which covers everything from short jingles to ten-minute pieces.
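If you set duration programmatically (for example in a saved API workflow), it helps to keep it inside the documented range and to know how many samples a clip will contain. This is a minimal sketch; the 48 kHz sample rate is an assumption, since this page does not state the node's output rate.

```python
DURATION_MIN_S = 10.0   # documented minimum
DURATION_MAX_S = 600.0  # documented maximum
SAMPLE_RATE_HZ = 48_000  # assumption: the output sample rate is not stated on this page


def clamp_duration(seconds: float) -> float:
    """Clamp a requested duration into the documented 10-600 s range."""
    return max(DURATION_MIN_S, min(DURATION_MAX_S, float(seconds)))


def samples_per_channel(seconds: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples per channel for a clip of the given length."""
    return round(clamp_duration(seconds) * sample_rate)


print(samples_per_channel(30.0))  # 1440000 at the assumed 48 kHz
```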

batch_size

The batch_size parameter sets how many audio samples are generated in one run from the same text prompt, giving you several variations to compare. The default is 2, with a minimum of 1 and a maximum of 8.

seed

The seed parameter is an integer random seed that makes results reproducible: running the node again with the same seed and the same other settings should produce the same musical output. The default value, -1, requests a random seed, and the maximum value is 0xFFFFFFFFFFFFFFFF (2^64 - 1).
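The -1 convention means a random run cannot be repeated unless the seed that was actually used is recorded. The sketch below shows one way to resolve and log it; the function name, the lower bound of 0, and raising ValueError for other negative values are assumptions, not the node's documented behaviour.

```python
import random
from typing import Optional

SEED_MAX = 0xFFFF_FFFF_FFFF_FFFF  # documented upper bound, 2**64 - 1


def resolve_seed(seed: int, rng: Optional[random.Random] = None) -> int:
    """Turn the seed input into a concrete value; -1 means 'pick one at random'."""
    if seed == -1:
        rng = rng or random.Random()
        return rng.randint(0, SEED_MAX)
    # Assumed validation: explicit seeds must lie in [0, SEED_MAX].
    if not 0 <= seed <= SEED_MAX:
        raise ValueError(f"seed must be -1 or between 0 and {SEED_MAX}, got {seed}")
    return seed


# Log the resolved seed so a run made with -1 can be reproduced later.
seed = resolve_seed(-1)
print(f"using seed {seed}")
```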

inference_steps

The inference_steps parameter sets the number of diffusion steps used to generate the music. Higher values usually improve quality but take more computation time. The default is 8, with a range of 1 to 64, so you can trade speed for quality as needed.
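Why more steps cost more time can be seen from a step schedule. This is a generic illustration under one common convention (t = 1.0 is pure noise, t = 0.0 is the clean result) with uniformly spaced steps; ACE-Step's own scheduler, including the turbo configuration, may space its steps differently.

```python
def timestep_grid(steps: int) -> list:
    """Uniform grid from t=1.0 (noise) to t=0.0 (clean audio) for `steps` steps.

    Generic illustration; ACE-Step's own schedule may space the steps differently.
    """
    if not 1 <= steps <= 64:  # documented range of inference_steps
        raise ValueError("inference_steps must be between 1 and 64")
    return [1.0 - i / steps for i in range(steps + 1)]


grid = timestep_grid(4)
print(grid)           # [1.0, 0.75, 0.5, 0.25, 0.0]
print(len(grid) - 1)  # 4 intervals
```

With a first-order sampler such as Euler, each interval costs one model evaluation, so runtime grows roughly linearly with inference_steps.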

device

The device parameter selects the hardware the model runs on. The options are auto, cuda (NVIDIA GPUs), cpu, mps (Apple Silicon), and xpu (Intel GPUs), with auto as the default, which lets the node use whatever hardware is available.
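A minimal sketch of how an "auto" setting can be resolved with PyTorch. The preference order (cuda, then mps, then xpu, then cpu) is an assumption; the page does not say which order the node uses. The probe guards each backend with getattr because older PyTorch builds lack some of them.

```python
def resolve_device(requested: str, available: dict) -> str:
    """Map the device input to a backend name; 'auto' picks the first available accelerator."""
    if requested not in ("auto", "cuda", "cpu", "mps", "xpu"):
        raise ValueError(f"unknown device: {requested}")
    if requested != "auto":
        return requested  # explicit choices are passed through unchanged
    for backend in ("cuda", "mps", "xpu"):  # assumed preference order
        if available.get(backend, False):
            return backend
    return "cpu"


def probe_backends() -> dict:
    """Ask PyTorch which accelerators exist; returns {} (CPU only) if torch is absent."""
    try:
        import torch
    except ImportError:
        return {}
    mps = getattr(torch.backends, "mps", None)
    xpu = getattr(torch, "xpu", None)
    return {
        "cuda": torch.cuda.is_available(),
        "mps": bool(mps is not None and mps.is_available()),
        "xpu": bool(xpu is not None and xpu.is_available()),
    }


print(resolve_device("auto", probe_backends()))
```

Separating the probe from the decision keeps the selection logic testable on machines without a GPU.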

ACE-Step Text to Music Output Parameters:

generated_audio

The generated_audio output is the primary result of the node: the audio generated from the text description, intended to reflect the prompt's style, mood, and theme. In ComfyUI, audio is normally passed between nodes as audio data (a waveform together with its sample rate) rather than as a file; to obtain a file such as a WAV, connect this output to a save audio node.
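The sketch below writes each batch item to its own 16-bit WAV file using only the standard library. It assumes the output follows ComfyUI's usual AUDIO convention, a dict with "waveform" shaped [batch, channels, samples] and "sample_rate"; save_batch and write_wav_16bit are hypothetical helpers. In practice a save audio node does this for you, so treat this as a look at what happens underneath.

```python
import struct
import wave


def write_wav_16bit(path, channels, sample_rate):
    """Write float samples in [-1, 1] to a 16-bit PCM WAV.

    `channels` is a list of equal-length sample lists, one per channel.
    """
    frames = bytearray()
    for t in range(len(channels[0])):
        for ch in channels:  # WAV interleaves channels frame by frame
            x = max(-1.0, min(1.0, ch[t]))  # clip to avoid integer overflow
            frames += struct.pack("<h", int(round(x * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(len(channels))
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))


def save_batch(audio, prefix):
    """Save each batch item (one per batch_size variation) to its own file."""
    paths = []
    for i, item in enumerate(audio["waveform"]):
        path = f"{prefix}_{i:02d}.wav"
        # Tensors expose .tolist(); plain nested lists are used as-is.
        channels = item.tolist() if hasattr(item, "tolist") else item
        write_wav_16bit(path, channels, audio["sample_rate"])
        paths.append(path)
    return paths
```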

metadata

The metadata output contains additional information about the generated audio, such as lyrics, BPM, key, and other musical attributes. These details are useful for further editing or analysis of the track.
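As one example of putting the metadata to use, the BPM lets you compute a beat grid for cutting or aligning the audio in an editor. The dictionary keys below are hypothetical (the page does not document the metadata's field names), and a 4/4 time signature is assumed.

```python
def beat_grid(bpm: float, duration_s: float, beats_per_bar: int = 4):
    """Return (seconds per beat, total beats, whole bars) for a clip; assumes 4/4 by default."""
    seconds_per_beat = 60.0 / bpm
    total_beats = duration_s / seconds_per_beat
    return seconds_per_beat, total_beats, int(total_beats // beats_per_bar)


# Hypothetical metadata values; the node's real field names may differ.
meta = {"bpm": 120, "key": "C major"}
print(beat_grid(meta["bpm"], 30.0))  # (0.5, 60.0, 15)
```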

ACE-Step Text to Music Usage Tips:

To get the best results, write a detailed, specific prompt in the caption parameter. The more precisely it describes the music you want, the more closely the output is likely to match your expectations.
Experiment with inference_steps to balance quality against processing time: more steps generally improve audio quality but take longer to compute.
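When comparing step counts, keep the seed fixed so the differences you hear come largely from inference_steps rather than from new random noise. A minimal sketch that builds one parameter set per step count; step_sweep is a hypothetical helper, and the keys simply mirror the node's documented input names.

```python
def step_sweep(caption: str, seed: int, steps_options=(8, 16, 32)) -> list:
    """One parameter set per inference_steps value, all sharing one fixed seed.

    Keeping the seed and every other input constant isolates the effect of the step count.
    """
    return [
        {"caption": caption, "seed": seed, "inference_steps": steps, "batch_size": 1}
        for steps in steps_options
    ]


for params in step_sweep("warm lo-fi hip hop with soft piano and vinyl crackle", seed=1234):
    print(params["inference_steps"], params["seed"])
```

Use an explicit seed here rather than -1, which would draw a fresh random seed for every run.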

ACE-Step Text to Music Common Errors and Solutions:

Understanding failed: `<error_message>`

Explanation: This error occurs when the node fails to interpret the input text or to generate music from it.
Solution: Make sure your text prompt is clear and free of ambiguous language. If the problem persists, check that checkpoint_dir and config_path point to the models you intend to use.

OSError: libgomp.so.1 not found

Explanation: The GNU OpenMP runtime library (libgomp), which the node needs, could not be found on your system.
Solution: Install the OpenMP runtime with your system's package manager (on Debian and Ubuntu it is provided by the libgomp1 package), or follow the approach in the extension's _force_load_libgomp function, which loads the library manually.
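This page does not show what _force_load_libgomp actually does. A common way to preload a shared library from Python is ctypes with RTLD_GLOBAL, so its symbols are visible to extension modules loaded afterwards. The sketch below is that generic technique under that assumption; preload_libgomp is a hypothetical name, and it returns False off Linux or when no libgomp can be loaded.

```python
import ctypes
import ctypes.util
import sys


def preload_libgomp() -> bool:
    """Try to load the GNU OpenMP runtime globally before heavy imports.

    Returns True if a libgomp was loaded, False otherwise.
    """
    if not sys.platform.startswith("linux"):
        return False  # libgomp.so.1 is the Linux shared-library name
    candidates = ["libgomp.so.1"]
    found = ctypes.util.find_library("gomp")  # e.g. 'libgomp.so.1', or None
    if found:
        candidates.append(found)
    for name in candidates:
        try:
            ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
            return True
        except OSError:
            continue  # try the next candidate name
    return False


print("libgomp loaded:", preload_libgomp())
```

If this still returns False on Linux, the library is genuinely missing and needs to be installed through the package manager.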

ACE-Step Text to Music Related Nodes

See the ComfyUI-kaola-ace-step extension for its other nodes.



RunComfy

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.
