
ACE-Step Music Generation | AI Audio Creation

Workflow Name: RunComfy/ACE-Step-Music

Workflow ID: 0000...1224

ACE-Step is an open-source foundation model for music generation that aims to close the gap between generation speed and musical quality. It combines diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer, and can synthesize up to 4 minutes of music in about 20 seconds, which its authors report as 15× faster than LLM-based alternatives. The model maintains musical coherence while offering control over lyrics, voice cloning, and remixing.

This workflow is based on ACE-Step, co-developed by ACE Studio and StepFun. Original model created by Junmin Gong, Sean Zhao, Sen Wang, Shengyuan Xu, and Joe Guo.




ComfyUI ACE-Step Description

1. What is the ComfyUI ACE-Step Workflow?

ComfyUI ACE-Step integrates the newly developed ACE-Step music generation foundation model into the ComfyUI environment. Built on a hybrid architecture combining diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer, ACE-Step enables ultra-fast, high-quality music generation with exceptional control capabilities. This workflow allows users to create original music across diverse genres and styles with simple natural language prompts and lyrics.

2. Benefits of ComfyUI ACE-Step

Unprecedented Speed: Synthesizes up to 4 minutes of music in just 20 seconds, 15× faster than LLM-based alternatives
Musical Coherence: ACE-Step maintains superior quality across melody, harmony, and rhythm dimensions
Multilingual Support: Generates music in 19 languages, with the strongest results in the 10 best-supported languages (listed under the prompt tips in section 3.4)
Advanced Control: Enables voice cloning, lyric editing, remixing, and track generation with fine-grained parameters
Creative Flexibility: Supports diverse music styles, genres, and instruments with various description formats
Seamless Integration: Plugs directly into ComfyUI workflows for AI-powered audio creation

3. How to Use the ComfyUI ACE-Step Workflow

3.1 Generation Methods with ComfyUI ACE-Step

Example Setup for ACE-Step:

Prepare inputs in the TextEncodeAceStepAudio node:
- Add descriptive tags for music style (e.g., "country rock, folk rock, southern rock, bluegrass, pop")
- Input lyrics with structure tags like [verse], [chorus], [bridge]
- Adjust lyrics_strength (1.00 is default)
Configure KSampler node parameters:
- Adjust steps (50 recommended for ACE-Step)
- Set cfg (4.0 is default)
- Set denoise value (1.00 is default)
In the EmptyAceStepLatentAudio node:
- Set desired seconds duration (30.0 is default)
- Set batch_size
Click the Run button to run the ACE-Step workflow
In the SaveAudio node, listen to or save your generated music
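
The setup steps above can also be written down as a node graph. The following is a minimal Python sketch that collects the listed settings into a dictionary loosely modeled on ComfyUI's API-style workflow layout (node id mapped to a class_type and its inputs, with links written as [source_node_id, output_index]). The node ids, the seed value, the tags input name, the filename prefix, and the link layout are assumptions made for illustration. The sketch also leaves out the model loader and negative conditioning, so treat it as a way to see the parameters side by side, not as a complete workflow file.

```python
import json


def build_ace_step_settings(tags, lyrics, seconds=30.0, batch_size=1,
                            steps=50, cfg=4.0, denoise=1.0,
                            lyrics_strength=1.0, seed=0):
    """Collect the settings from the setup steps into one node-keyed dict.

    Node ids ("1" to "5") and the "tags" input name are assumptions;
    compare them with an exported workflow before relying on them.
    """
    return {
        "1": {"class_type": "TextEncodeAceStepAudio",
              "inputs": {"tags": tags, "lyrics": lyrics,
                         "lyrics_strength": lyrics_strength}},
        "2": {"class_type": "EmptyAceStepLatentAudio",
              "inputs": {"seconds": seconds, "batch_size": batch_size}},
        "3": {"class_type": "KSampler",
              "inputs": {"seed": seed, "steps": steps, "cfg": cfg,
                         "sampler_name": "res_multistep",
                         "scheduler": "simple", "denoise": denoise,
                         "positive": ["1", 0], "latent_image": ["2", 0]}},
        "4": {"class_type": "VAEDecodeAudio",
              "inputs": {"samples": ["3", 0]}},
        "5": {"class_type": "SaveAudio",
              "inputs": {"audio": ["4", 0],
                         "filename_prefix": "audio/ace_step"}},
    }


graph = build_ace_step_settings(
    tags="country rock, folk rock, southern rock, bluegrass, pop",
    lyrics="[verse]\n...\n[chorus]\n...",
)
# Show the sampler settings as they would appear in the graph.
print(json.dumps(graph["3"]["inputs"], indent=2))
```

Keeping the defaults in the function signature makes it easy to see which values the walkthrough recommends (50 steps, cfg 4.0, denoise 1.00, 30.0 seconds) and to override only the ones being tuned.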

ACE-Step Core Generation Workflow

Best for: Creating original music from text descriptions and lyrics
Characteristics:
- Fast generation (15× faster than LLM alternatives)
- Strong musical coherence and quality
- Flexible duration control

ACE-Step Specialized Workflows (LoRA-based)

Lyric2Vocal: ACE-Step model fine-tuned for generating high-quality vocals from lyrics
Text2Samples: Specialized ACE-Step variant for producing instrumental loops and samples
RapMachine: Optimized ACE-Step model for rap generation with various styles

3.2 Parameter Reference for ComfyUI ACE-Step

TextEncodeAceStepAudio Node: This node processes text inputs to guide ACE-Step music generation.

clip: Text encoder (CLIP) input, connected from the model loader
tags: Text field for style descriptions, genres, and mood
lyrics: Text field for song lyrics with optional structure tags
lyrics_strength: Controls how strongly the lyrics influence generation (default: 1.00)
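
As an illustration of the lyrics field, here is a small Python sketch that assembles lyrics with structure tags such as [verse] and [chorus]. The helper name format_lyrics, the sample lyric lines, and the blank line placed between sections are assumptions for the example; the page only states that structure tags are supported.

```python
def format_lyrics(sections):
    """Join (tag, lines) pairs into one lyrics string with [tag] headers."""
    blocks = []
    for tag, lines in sections:
        # Each section starts with its structure tag on its own line.
        blocks.append("[" + tag + "]\n" + "\n".join(lines))
    # A blank line between sections is a layout choice, not a requirement.
    return "\n\n".join(blocks)


lyrics = format_lyrics([
    ("verse", ["Dust on the highway, sun going down"]),
    ("chorus", ["Carry me home, carry me home"]),
])
print(lyrics)
```

The resulting string can be pasted into the lyrics field as-is, which keeps the section order explicit when lyrics are generated or edited programmatically.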

KSampler Node: Controls the diffusion sampling process in ACE-Step.

seed: Sets randomization seed for reproducible results
control_after_generate: Options for seed behavior after generation
steps: Number of diffusion steps (higher = more refinement)
cfg: Classifier-free guidance scale (higher = more adherence to prompt)
sampler_name: Algorithm used for sampling (res_multistep recommended)
scheduler: Noise schedule type (simple recommended)
denoise: Controls noise removal level (1.00 is full denoising)
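
To make the cfg entry concrete, the sketch below shows the textbook form of classifier-free guidance: the sampler starts from the unconditional prediction and moves toward the prompt-conditioned prediction, scaled by cfg. The function name apply_cfg and the sample numbers are illustrative, and ComfyUI's samplers may apply guidance with additional details not shown here.

```python
def apply_cfg(uncond, cond, cfg):
    """Blend unconditional and conditional predictions element-wise.

    guided = uncond + cfg * (cond - uncond)
    """
    return [u + cfg * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0, 2.0]  # prediction without the prompt
cond = [1.0, 1.0, 0.0]    # prediction with the prompt

weak = apply_cfg(uncond, cond, 1.0)    # cfg = 1 reproduces the conditional prediction
strong = apply_cfg(uncond, cond, 4.0)  # cfg = 4 pushes further in the prompt direction
print(weak)
print(strong)
```

With cfg above 1 the difference between the two predictions is amplified, which is why higher values follow the prompt more closely but can also exaggerate artifacts.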

EmptyAceStepLatentAudio Node: Initializes the audio generation space.

seconds: Duration of generated audio in seconds
batch_size: Number of samples to generate simultaneously

VAEDecodeAudio Node: Decodes latent representations into audible format.

samples: Input from KSampler
vae: VAE model used for decoding

SaveAudio Node: Outputs the final ACE-Step audio result.

filename_prefix: Prefix for saved audio files
audio: Audio input from VAEDecodeAudio; the node also shows a player for previewing the generated audio

3.3 Advanced Techniques with ComfyUI ACE-Step

Variations Generation:

Adjust variance parameter to control similarity to original ACE-Step generations
Higher variance creates more divergent outputs while preserving core musical elements
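
One plausible way to picture a variance control, offered as a sketch and not confirmed by this page, is to blend the starting noise of the original generation with fresh noise. Using cosine and sine weights keeps the overall noise scale roughly constant while variance moves from 0 (same starting point) to 1 (entirely new starting point). The function name blend_noise, the 0 to 1 range, and the toy four-element vectors are all assumptions.

```python
import math
import random


def blend_noise(original, fresh, variance):
    """Mix two noise vectors; variance in [0, 1] sets how much fresh noise enters."""
    if not 0.0 <= variance <= 1.0:
        raise ValueError("variance must be between 0 and 1")
    angle = variance * math.pi / 2
    # cos^2 + sin^2 = 1, so independent unit-variance inputs stay unit-variance.
    return [math.cos(angle) * o + math.sin(angle) * f
            for o, f in zip(original, fresh)]


rng = random.Random(0)
original = [rng.gauss(0.0, 1.0) for _ in range(4)]
fresh = [rng.gauss(0.0, 1.0) for _ in range(4)]

same = blend_noise(original, fresh, 0.0)     # matches the original noise
halfway = blend_noise(original, fresh, 0.5)  # partly shared, partly new
replaced = blend_noise(original, fresh, 1.0) # effectively the fresh noise
```

Because the sampler is deterministic for a given starting noise, a small variance yields a close variation of the original track and a large one yields a mostly new take.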

Repainting:

Selectively regenerate specific sections of audio while preserving the rest
Useful for fixing problematic segments without changing the entire composition
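
The masking idea behind repainting can be sketched in a few lines of Python: frames inside the chosen range come from the regenerated material and everything else is kept from the original. This is a simplified illustration with hypothetical names (repaint, start, end) operating on plain lists; the actual model works on latents during sampling, which this example does not attempt to reproduce.

```python
def repaint(original, regenerated, start, end):
    """Return a copy of original with frames in [start, end) taken from regenerated."""
    if len(original) != len(regenerated):
        raise ValueError("both sequences must have the same length")
    if not 0 <= start <= end <= len(original):
        raise ValueError("invalid repaint range")
    return original[:start] + regenerated[start:end] + original[end:]


original = ["a0", "a1", "a2", "a3", "a4"]
regenerated = ["b0", "b1", "b2", "b3", "b4"]

patched = repaint(original, regenerated, 1, 3)
print(patched)  # ['a0', 'b1', 'b2', 'a3', 'a4']
```

Only the frames at positions 1 and 2 change, which mirrors how repainting fixes one segment while leaving the rest of the composition untouched.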

Lyric Editing in ACE-Step:

Modify lyrics while maintaining melody, vocal timbre, and accompaniment
Supports editing in multiple languages while preserving musical structure

Voice Cloning:

Preserves vocal characteristics while generating new content with ACE-Step
Can be combined with lyric editing for flexible vocal performances

Style Transfer:

Apply new musical styles to existing compositions
Maintains core musical structure while adopting different genre characteristics

3.4 ACE-Step Prompt Tips

For General Music:

Be specific about genre, mood, and instrumentation in ACE-Step prompts
Example prompts: "electronic, rock, pop" or "funk, pop, soul, melodic"
More detailed prompts: "dark, death rock, metal, hardcore, electric guitar, powerful, bass, drums, 110 bpm, G major"

For Instrumental Music:

Specify instruments and musical characteristics
Example prompts: "saxophone, jazz" or "violin, solo, fast tempo"
More detailed prompts: "sonata, piano, Violin, B Flat Major, allegro"
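
The general and instrumental prompt tips above both come down to a comma-separated list of descriptors. A small helper like the following can keep such prompts consistent; the function name build_tags, its parameters, and the ordering of the fragments are assumptions for illustration, since the page does not prescribe a fixed order.

```python
def build_tags(genres, instruments=(), moods=(), bpm=None, key=None):
    """Join prompt fragments into one comma-separated tag string."""
    parts = list(moods) + list(genres) + list(instruments)
    if bpm is not None:
        parts.append(f"{bpm} bpm")
    if key is not None:
        parts.append(key)
    return ", ".join(parts)


simple = build_tags(["jazz"], instruments=["saxophone"])
detailed = build_tags(
    ["death rock", "metal"],
    instruments=["electric guitar", "bass", "drums"],
    moods=["dark", "powerful"],
    bpm=110,
    key="G major",
)
print(simple)    # jazz, saxophone
print(detailed)  # dark, powerful, death rock, metal, electric guitar, bass, drums, 110 bpm, G major
```

Separating genre, mood, instrumentation, tempo, and key into arguments makes it easier to vary one dimension at a time when comparing generations.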

For Multilingual Support:

ACE-Step works best with: English, Chinese, Russian, Spanish, Japanese, German, French, Portuguese, Italian, Korean
Non-Latin script languages like Chinese, Japanese, and Korean are well-supported

More Information about ACE-Step

For additional details and development references:

Original ACE-Step model by
Model developers: Junmin Gong, Sean Zhao, Sen Wang, Shengyuan Xu, and Joe Guo

Acknowledgements

This workflow is powered by ACE-Step, co-developed by ACE Studio and StepFun. The ComfyUI ACE-Step integration enables seamless music generation within the ComfyUI environment. Full credit goes to the original authors for their groundbreaking work on ACE-Step.
