
ComfyUI Node: ACE-Step Text to Music

Class Name
ACE_STEP_TextToMusic
Category
Audio/ACE-Step
Author
kana112233 (Account age: 3996 days)
Extension
ComfyUI-kaola-ace-step
Last Updated
2026-02-22
GitHub Stars
0.02K

How to Install ComfyUI-kaola-ace-step

Install this extension through the ComfyUI Manager by searching for ComfyUI-kaola-ace-step:
  • 1. Click the Manager button in the main menu.
  • 2. Click the Custom Nodes Manager button.
  • 3. Enter ComfyUI-kaola-ace-step in the search bar.
After installation, click the Restart button to restart ComfyUI, then refresh your browser to clear the cache and load the updated list of nodes.

ACE-Step Text to Music Description

Generates music from a natural-language text description using the ACE-Step AI models.

ACE-Step Text to Music:

The ACE_STEP_TextToMusic node generates music from a textual description. It uses the ACE-Step models to interpret a natural-language prompt and produce matching audio tracks, so AI artists can turn written ideas into music without traditional musical training. The node belongs to the ACE-Step 1.5 Music Generation suite: a DiT (diffusion transformer) model generates the audio, and a separate language model produces lyrics and metadata. Its aim is to make music creation accessible to anyone who can describe what they want to hear.

ACE-Step Text to Music Input Parameters:

caption

The caption parameter is a multiline string that holds the text prompt: a natural-language description of the music you want to generate. It is the main control over the style, mood, and theme of the output, and the multiline field leaves room for detailed, expressive prompts. The default is an empty string, so fill it in before running the node.
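If you generate many captions, it can help to assemble them from the same ingredients each time (genre, mood, instruments, tempo). The helper below is a hypothetical convenience, not part of the extension, and the comma-separated style is an assumption rather than a required caption format; it simply joins non-empty fragments into one string.

```python
def build_caption(genre: str, mood: str, instruments: list, tempo_hint: str = "") -> str:
    """Hypothetical helper: join descriptive fragments into one comma-separated caption."""
    parts = [genre, mood, ", ".join(instruments)]
    if tempo_hint:
        parts.append(tempo_hint)
    # Drop empty fragments so the caption never contains stray ", ," sequences.
    return ", ".join(p.strip() for p in parts if p and p.strip())


caption = build_caption(
    "lo-fi hip hop",
    "relaxed, nostalgic",
    ["warm piano", "vinyl crackle", "soft drums"],
    "slow tempo",
)
print(caption)
# lo-fi hip hop, relaxed, nostalgic, warm piano, vinyl crackle, soft drums, slow tempo
```

Paste the resulting string into the caption field.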

checkpoint_dir

The checkpoint_dir parameter specifies the directory containing the ACE-Step model weights, in particular the DiT model that the node needs for music generation. It defaults to the first directory returned by the extension's get_acestep_checkpoints() function. That is simply the first available entry, not necessarily the best one, so if you have several checkpoint directories installed, check that the selected one holds the weights you intend to use.
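To show why "first directory" does not mean "best directory", here is a minimal sketch of a default-to-first-entry rule. The list_checkpoint_dirs function is a hypothetical stand-in for get_acestep_checkpoints(); the real function's search location and ordering are not documented here, and this version just returns sorted subdirectories of a folder.

```python
import os
import tempfile


def list_checkpoint_dirs(root: str) -> list:
    """Hypothetical stand-in for get_acestep_checkpoints(): sorted subdirectories of root."""
    if not os.path.isdir(root):
        return []
    entries = (os.path.join(root, name) for name in os.listdir(root))
    return sorted(path for path in entries if os.path.isdir(path))


def default_checkpoint_dir(root: str) -> str:
    """Mirror the documented behavior: the first listed directory becomes the default."""
    dirs = list_checkpoint_dirs(root)
    return dirs[0] if dirs else ""


with tempfile.TemporaryDirectory() as root:
    for name in ("turbo-weights", "base-weights"):  # hypothetical folder names
        os.makedirs(os.path.join(root, name))
    # Prints .../base-weights: first in sorted order, not necessarily the weights you want.
    print(default_checkpoint_dir(root))
```

The default depends only on listing order, which is why it is worth confirming the selection whenever more than one checkpoint directory is present.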

config_path

The config_path parameter determines the specific model configuration to use, such as acestep-v15-turbo. This configuration affects the speed and quality of the music generation process. The default setting is acestep-v15-turbo, which is optimized for faster performance, making it suitable for quick iterations and experimentation.

lm_model_path

The lm_model_path parameter points to the language model that generates lyrics and metadata for the track. This model supplies the contextually relevant lyrics and musical attributes that accompany the audio. The default is acestep-5Hz-lm-1.7B; the 1.7B suffix indicates a model of about 1.7 billion parameters.

duration

The duration parameter sets the target length of the generated music in seconds. The default is 30.0 seconds, and the allowed range is 10.0 to 600.0 seconds, enough for anything from a short jingle to an extended composition.

batch_size

The batch_size parameter defines the number of audio samples to generate in a single batch. This allows for the creation of multiple variations of the music based on the same text prompt. The default batch size is 2, with a minimum of 1 and a maximum of 8, enabling you to explore different interpretations of your input text.
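Duration and batch_size together determine how much audio one run produces. The arithmetic below estimates the raw size of the finished waveform, assuming 48 kHz stereo float32 output (all three are assumptions; check the sample rate of the audio the node actually returns). Model weights and intermediate activations are not included.

```python
def output_buffer_bytes(duration_s: float, batch_size: int,
                        sample_rate: int = 48_000, channels: int = 2,
                        bytes_per_sample: int = 4) -> int:
    """Raw size of a float32 waveform shaped [batch, channels, samples] (assumed layout)."""
    if not 10.0 <= duration_s <= 600.0:
        raise ValueError("duration must be between 10.0 and 600.0 seconds")
    if not 1 <= batch_size <= 8:
        raise ValueError("batch_size must be between 1 and 8")
    samples = round(duration_s * sample_rate)
    return batch_size * channels * samples * bytes_per_sample


print(output_buffer_bytes(30.0, 2))   # 23040000 bytes (about 23 MB), the defaults
print(output_buffer_bytes(600.0, 8))  # 1843200000 bytes (about 1.8 GB), the maximums
```

Under these assumptions the final audio grows linearly with both duration and batch size, so the largest settings produce roughly 80 times more audio data than the defaults.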

seed

The seed parameter is an integer that sets the random seed. Reusing the same seed with the same settings should reproduce the same musical output across runs. The default, -1, selects a random seed on each run; otherwise the value can range from 0 up to 0xFFFFFFFFFFFFFFFF (2^64 - 1).
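The -1 convention can be sketched as follows. This is a hypothetical illustration, not the node's code: it uses Python's random module as a stand-in for whatever generator the model actually seeds, and resolve_seed is an invented name. It shows that a run is only repeatable once you know the concrete seed that was used.

```python
import random
from typing import Optional

MAX_SEED = 0xFFFFFFFFFFFFFFFF  # documented upper bound (2**64 - 1)


def resolve_seed(seed: int, rng: Optional[random.Random] = None) -> int:
    """Hypothetical: turn the seed input into a concrete value.

    -1 means "pick one at random"; any other value must lie in [0, MAX_SEED].
    """
    if seed == -1:
        if rng is None:
            rng = random.Random()
        return rng.randint(0, MAX_SEED)
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be -1 or between 0 and {MAX_SEED}")
    return seed


# Same resolved seed -> same pseudo-random stream -> repeatable result.
first = random.Random(resolve_seed(1234)).random()
second = random.Random(resolve_seed(1234)).random()
print(first == second)  # True
```

With -1, each run draws a fresh value, so recording the resolved seed is what lets you reproduce a result you like.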

inference_steps

The inference_steps parameter controls the number of diffusion steps used in the music generation process. Higher values typically result in better quality but require more computation time. The default is 8 steps, with a range from 1 to 64, allowing you to balance between speed and quality based on your needs.

device

The device parameter selects the hardware the model runs on: auto, cuda (NVIDIA GPUs), cpu, mps (Apple Silicon), or xpu (Intel GPUs). The default, auto, lets the node choose from the hardware that is available; select a specific option to force a particular device.
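The node's actual auto-selection logic is not documented. As a hedged sketch, this is one common way a PyTorch-based node could resolve auto, with the priority order cuda, xpu, mps, cpu as an assumption. It falls back to cpu when PyTorch is not installed, and resolve_device is a hypothetical name.

```python
def resolve_device(requested: str = "auto") -> str:
    """Hypothetical resolution of the device option; the priority order is an assumption."""
    valid = {"auto", "cuda", "cpu", "mps", "xpu"}
    if requested not in valid:
        raise ValueError(f"device must be one of {sorted(valid)}")
    if requested != "auto":
        return requested  # an explicit choice is used as-is
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch, only the CPU path makes sense
    if torch.cuda.is_available():
        return "cuda"
    xpu = getattr(torch, "xpu", None)  # present only in newer PyTorch builds
    if xpu is not None and xpu.is_available():
        return "xpu"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(resolve_device())       # depends on the machine, e.g. "cuda" or "cpu"
print(resolve_device("cpu"))  # cpu
```

Forcing cpu is useful for ruling out GPU driver problems, at the cost of much slower generation.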

ACE-Step Text to Music Output Parameters:

generated_audio

The generated_audio output is the node's primary result: the audio generated from your text description, reflecting its style, mood, and theme. Connect it to an audio preview or save node to listen to it or write it to an audio file for use in other editing and playback tools.
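Assuming this node follows ComfyUI's usual AUDIO convention (a dict holding a waveform shaped [batch, channels, samples] and a sample_rate), the sketch below shows what saving one batch item involves: clamping float samples, converting them to 16-bit PCM, interleaving channels, and writing a WAV header with Python's standard wave module. It uses nested lists instead of a torch tensor so it runs without PyTorch, and batch_item_to_wav_bytes is a hypothetical name; in a real workflow a built-in save node does this for you.

```python
import io
import struct
import wave


def batch_item_to_wav_bytes(audio: dict, index: int = 0) -> bytes:
    """Encode one batch item of an AUDIO-style dict as 16-bit PCM WAV bytes.

    Assumes audio["waveform"] is nested lists shaped [batch][channels][samples]
    with float samples in [-1.0, 1.0].
    """
    channels = audio["waveform"][index]
    n_samples = len(channels[0])
    frames = bytearray()
    for t in range(n_samples):
        for ch in channels:  # interleave: one sample per channel per frame
            value = max(-1.0, min(1.0, ch[t]))  # clamp to avoid integer overflow
            frames += struct.pack("<h", int(round(value * 32767)))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(len(channels))
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(audio["sample_rate"])
        wav.writeframes(bytes(frames))
    return buf.getvalue()  # wave does not close a caller-supplied file object


audio = {"waveform": [[[0.0, 0.5, -0.5, 1.0], [0.0, 0.0, 0.0, 0.0]]], "sample_rate": 48_000}
data = batch_item_to_wav_bytes(audio)
print(data[:4])  # b'RIFF'
```

With batch_size above 1, each index in the batch would be saved as a separate file.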

metadata

The metadata parameter includes additional information about the generated audio, such as lyrics, BPM, key, and other musical attributes. This metadata enriches the audio output by providing context and details that can be useful for further editing or analysis.

ACE-Step Text to Music Usage Tips:

  • To achieve the best results, provide a detailed and specific text prompt in the caption parameter. This helps the AI model understand your vision and generate music that closely aligns with your expectations.
  • Experiment with different inference_steps settings to find the right balance between quality and processing time. Higher steps generally improve audio quality but may take longer to compute.
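When experimenting with inference_steps, keep the seed and caption fixed so that differences between runs come from the step count alone. The helper below is hypothetical (it only builds settings dicts, using the documented parameter names); the step values are arbitrary picks within the documented 1 to 64 range.

```python
def step_sweep(caption: str, seed: int, steps=(4, 8, 16, 32)) -> list:
    """One settings dict per run; seed and caption stay fixed so only inference_steps varies."""
    if seed == -1:
        raise ValueError("use a fixed seed (not -1) so runs are comparable")
    runs = []
    for n in steps:
        if not 1 <= n <= 64:
            raise ValueError(f"inference_steps must be between 1 and 64, got {n}")
        runs.append({"caption": caption, "seed": seed, "inference_steps": n})
    return runs


for run in step_sweep("upbeat synthwave, driving bassline, 80s drums", seed=2024):
    print(run["inference_steps"], run["seed"])
```

Enter each dict's values into the node in turn and compare the results by ear; the lowest step count that still sounds right is usually the best trade-off.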

ACE-Step Text to Music Common Errors and Solutions:

Understanding failed: <error_message>

  • Explanation: This error occurs when the node fails to interpret the input text or generate the corresponding music.
  • Solution: Make the text prompt clear and specific, avoiding ambiguous wording. If the error persists, check that checkpoint_dir and config_path point to valid, correctly installed model files.

OSError: libgomp.so.1 not found

  • Explanation: This error indicates that the required OpenMP library is not available on your system, which is necessary for the node's operation.
  • Solution: The extension includes a _force_load_libgomp function that attempts to load this library. If the error still appears, install the OpenMP runtime for your system (for example, the libgomp1 package on Debian or Ubuntu) and restart ComfyUI.
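The extension's own _force_load_libgomp code is not reproduced here. As a hypothetical illustration of the general technique, this sketch preloads the GNU OpenMP runtime with ctypes and RTLD_GLOBAL, one way a Python process can make libgomp.so.1 visible to native libraries loaded afterwards. If every candidate fails, the library is genuinely missing and must be installed.

```python
import ctypes
import ctypes.util
import sys


def try_load_libgomp() -> bool:
    """Hypothetical: attempt to preload the GNU OpenMP runtime (Linux only)."""
    if not sys.platform.startswith("linux"):
        return False
    candidates = ["libgomp.so.1"]
    found = ctypes.util.find_library("gomp")  # asks the system linker cache
    if found and found not in candidates:
        candidates.append(found)
    for name in candidates:
        try:
            # RTLD_GLOBAL exposes the library's symbols to libraries loaded later.
            ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
            return True
        except OSError:
            continue
    return False


print(try_load_libgomp())  # False means libgomp could not be loaded on this system
```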

Copyright 2025 RunComfy. All Rights Reserved.
