
ACE-Step Music Generation | AI Audio Creation

ACE-Step is an open-source foundation model for music generation that aims to close the gap between generation speed and musical quality. By combining diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer, it can synthesize up to 4 minutes of music in about 20 seconds on an A100 GPU, which its authors report is 15× faster than LLM-based alternatives. The model keeps music coherent over long durations while offering control over lyrics, voice cloning, and remixing.


ComfyUI ACE-Step Description

1. What is the ComfyUI ACE-Step Workflow?

ComfyUI ACE-Step integrates the ACE-Step music generation foundation model into the ComfyUI environment. ACE-Step uses a hybrid architecture that combines diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer, which lets it generate high-quality music quickly while exposing fine-grained controls. With this workflow you can create original music across many genres and styles from natural language prompts and lyrics.

2. Benefits of ComfyUI ACE-Step:

  • Speed: Synthesizes up to 4 minutes of music in about 20 seconds, 15× faster than LLM-based alternatives
  • Musical Coherence: Maintains strong coherence across melody, harmony, and rhythm
  • Multilingual Support: Generates music in 19 languages, with the best results in the 10 languages listed under "For Multilingual Support" in section 3.4
  • Advanced Control: Enables voice cloning, lyric editing, remixing, and track generation with fine-grained parameters
  • Creative Flexibility: Supports diverse music styles, genres, and instruments, and accepts different description formats such as short tags or descriptive text
  • Seamless Integration: Plugs directly into ComfyUI workflows for AI-powered audio creation

3. How to Use the ComfyUI ACE-Step Workflow

3.1 Generation Methods with ComfyUI ACE-Step

Example Setup for ACE-Step:

  1. Prepare inputs in the TextEncodeAceStepAudio node:
    • Add descriptive tags for the music style (e.g., "country rock, folk rock, southern rock, bluegrass, pop")
    • Enter lyrics with structure tags such as [verse], [chorus], and [bridge]
    • Adjust lyrics_strength (1.00 is the default)
  2. Configure the KSampler node parameters:
    • Set steps (50 is recommended for ACE-Step)
    • Set cfg (4.0 is the default)
    • Set denoise (1.00 is the default)
  3. Configure the EmptyAceStepLatentAudio node:
    • Set the duration in seconds (30.0 is the default)
    • Set batch_size (the number of clips generated per run)
  4. Click Run to execute the ACE-Step workflow
  5. In the SaveAudio node, preview or save the generated music
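The same setup can also be queued over HTTP. The sketch below writes the nodes from the steps above in ComfyUI's API (JSON) format and posts them to a local server's /prompt endpoint. Several details are assumptions, not taken from the workflow itself: the checkpoint filename, the node ids, the filename_prefix, and the use of ConditioningZeroOut to build the negative conditioning. Exporting your own workflow in API format is the reliable way to get the exact graph. control_after_generate is a UI-only widget, so it does not appear in this format.

```python
import json
import urllib.request

# Assumed checkpoint filename; replace it with the ACE-Step checkpoint you installed.
CKPT_NAME = "ace_step_v1_3.5b.safetensors"


def build_ace_step_workflow(tags, lyrics, seconds=30.0, batch_size=1,
                            seed=0, steps=50, cfg=4.0, lyrics_strength=1.0):
    """Return a ComfyUI API-format graph for text-to-music generation.

    Keys are node ids (chosen arbitrarily here); a link is written as
    [source_node_id, output_index].
    """
    return {
        # Outputs: 0 = MODEL, 1 = CLIP, 2 = VAE.
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": CKPT_NAME}},
        # Positive conditioning: style tags plus lyrics.
        "2": {"class_type": "TextEncodeAceStepAudio",
              "inputs": {"clip": ["1", 1], "tags": tags, "lyrics": lyrics,
                         "lyrics_strength": lyrics_strength}},
        # Negative conditioning: a zeroed copy of the positive (assumption).
        "3": {"class_type": "ConditioningZeroOut",
              "inputs": {"conditioning": ["2", 0]}},
        "4": {"class_type": "EmptyAceStepLatentAudio",
              "inputs": {"seconds": seconds, "batch_size": batch_size}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": steps, "cfg": cfg,
                         "sampler_name": "res_multistep", "scheduler": "simple",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecodeAudio",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveAudio",
              "inputs": {"audio": ["6", 0], "filename_prefix": "audio/ace_step"}},
    }


def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST the graph to a running ComfyUI server and return its JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{server}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With a server running, queue_prompt(build_ace_step_workflow("country rock, folk rock", "[verse]\n...")) queues one run. The reply includes a prompt_id that identifies the job.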
ACE-Step Core Generation Workflow
  • Best for: Creating original music from text descriptions and lyrics
  • Characteristics:
    • Fast generation (15× faster than LLM-based alternatives)
    • Strong musical coherence and quality
    • Flexible duration control
ACE-Step Specialized Workflows (LoRA-based)
  • Lyric2Vocal: ACE-Step model fine-tuned for generating high-quality vocals from lyrics
  • Text2Samples: Specialized ACE-Step variant for producing instrumental loops and samples
  • RapMachine: Optimized ACE-Step model for rap generation with various styles
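One way to use a LoRA variant is to load it between the checkpoint and the sampler. The sketch below shows this for an API-format graph. It inserts ComfyUI's LoraLoaderModelOnly node and rewires every input that read the checkpoint's MODEL output. Two things here are assumptions: that these LoRAs only need to patch the diffusion model, and the LoRA filename in the usage example.

```python
import copy


def add_lora(workflow, lora_name, strength=1.0, ckpt_id="1", lora_id="lora"):
    """Return a copy of an API-format graph with a LoraLoaderModelOnly node
    spliced between the checkpoint's MODEL output and all of its consumers."""
    wf = copy.deepcopy(workflow)
    # Redirect every input that reads the checkpoint's MODEL output (index 0).
    # This runs before the LoRA node is added, so the LoRA's own input stays intact.
    for node in wf.values():
        for key, value in node["inputs"].items():
            if value == [ckpt_id, 0]:
                node["inputs"][key] = [lora_id, 0]
    wf[lora_id] = {"class_type": "LoraLoaderModelOnly",
                   "inputs": {"model": [ckpt_id, 0], "lora_name": lora_name,
                              "strength_model": strength}}
    return wf
```

For example, add_lora(graph, "rap_machine_lora.safetensors", 0.8) (hypothetical filename) returns a patched copy and leaves the original graph unchanged, so the base and LoRA versions can be compared.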

3.2 Parameter Reference for ComfyUI ACE-Step

TextEncodeAceStepAudio Node: This node processes text inputs to guide ACE-Step music generation.

  • clip: The text encoder (CLIP) input, supplied by the checkpoint loader
  • tags: Text field for style descriptions, genres, and mood
  • lyrics: Text field for song lyrics with optional structure tags
  • lyrics_strength: Controls how strongly the lyrics influence generation (default: 1.00)
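When lyrics are generated from code, a small helper keeps the structure tags consistent. The sketch below is hypothetical (build_lyrics is not part of ComfyUI). It puts each section behind a lowercase [tag] header and separates sections with a blank line, matching the [verse]/[chorus]/[bridge] convention described above.

```python
def build_lyrics(sections):
    """Join (tag, lines) pairs into a lyrics string with [tag] headers.

    One lyric line per row; sections are separated by a blank line.
    """
    blocks = []
    for tag, lines in sections:
        header = f"[{tag.strip().lower()}]"
        blocks.append("\n".join([header, *lines]))
    return "\n\n".join(blocks)
```

For example, build_lyrics([("verse", ["Line one", "Line two"]), ("chorus", ["Hook line"])]) yields a string that can be pasted straight into the lyrics field.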

KSampler Node: Controls the diffusion sampling process in ACE-Step.

  • seed: Sets randomization seed for reproducible results
  • control_after_generate: How the seed changes after each run (fixed, increment, decrement, or randomize)
  • steps: Number of diffusion steps (more steps refine further but take longer)
  • cfg: Classifier-free guidance scale (higher values follow the prompt more closely)
  • sampler_name: Algorithm used for sampling (res_multistep recommended)
  • scheduler: Noise schedule type (simple recommended)
  • denoise: How much of the noise schedule is applied (1.00 generates from pure noise)
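A fixed seed makes a run reproducible. Stepping the seed, which is what control_after_generate's "increment" option does between runs, is a simple way to audition several takes with identical settings. The sketch below bundles the recommended values above into a validated settings object and produces KSampler input dicts for a seed sweep. SamplerSettings and seed_sweep are illustrative names, not ComfyUI APIs.

```python
from dataclasses import dataclass, asdict, replace


@dataclass(frozen=True)
class SamplerSettings:
    """KSampler values recommended above for ACE-Step."""
    seed: int = 0
    steps: int = 50
    cfg: float = 4.0
    sampler_name: str = "res_multistep"
    scheduler: str = "simple"
    denoise: float = 1.0

    def __post_init__(self):
        if not 0.0 <= self.denoise <= 1.0:
            raise ValueError("denoise must be between 0.0 and 1.0")
        if self.steps < 1:
            raise ValueError("steps must be at least 1")


def seed_sweep(base, count):
    """Return input dicts identical except for seed: base.seed, base.seed + 1, ..."""
    return [asdict(replace(base, seed=base.seed + i)) for i in range(count)]
```

Because only the seed varies, any audible difference between the takes comes from the initial noise and not from the sampler settings.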

EmptyAceStepLatentAudio Node: Creates the empty latent that sets the length and number of clips to generate.

  • seconds: Duration of generated audio in seconds
  • batch_size: Number of samples to generate simultaneously
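Because seconds and batch_size multiply, checking the request before queuing it can help. The sketch below is a client-side guard that keeps a request within the "up to 4 minutes" range quoted in this description and reports the total audio a run will produce. The 240-second ceiling comes from that quoted figure, not from the node. The node itself may accept other values.

```python
# Ceiling taken from the "up to 4 minutes" figure in this description (assumption:
# the node may accept longer values, but quality beyond this is not claimed here).
MAX_SECONDS = 240.0


def plan_generation(seconds, batch_size=1):
    """Validate EmptyAceStepLatentAudio settings and summarize the run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if seconds > MAX_SECONDS:
        raise ValueError(f"{seconds} s exceeds the {MAX_SECONDS:.0f} s range quoted above")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return {"seconds": seconds, "batch_size": batch_size,
            "total_audio_seconds": seconds * batch_size}
```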

VAEDecodeAudio Node: Decodes the sampled latent into an audio waveform.

  • samples: Input from KSampler
  • vae: VAE model used for decoding

SaveAudio Node: Outputs the final ACE-Step audio result.

  • filename_prefix: Prefix for saved audio files
  • audio: The decoded audio from VAEDecodeAudio; the node also shows a player for previewing the result
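When a run is queued over HTTP instead of from the UI, the saved files can be located from the server's history record. The sketch below assumes the history JSON for a prompt lists each SaveAudio output under "audio" as entries with filename, subfolder, and type, and that files are served from the /view endpoint. Check these assumptions against your ComfyUI version. The helper only parses an already-fetched history dict and makes no network calls.

```python
from urllib.parse import urlencode


def audio_urls(history, prompt_id, server="http://127.0.0.1:8188"):
    """List /view URLs for every audio file recorded in a history entry.

    `history` is the parsed JSON from the server's /history/<prompt_id> route.
    """
    urls = []
    outputs = history.get(prompt_id, {}).get("outputs", {})
    for node_output in outputs.values():
        for item in node_output.get("audio", []):
            query = urlencode({"filename": item["filename"],
                               "subfolder": item.get("subfolder", ""),
                               "type": item.get("type", "output")})
            urls.append(f"{server}/view?{query}")
    return urls
```

An unknown prompt_id, or a job that has not finished, simply yields an empty list.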

3.3 Advanced Techniques with ComfyUI ACE-Step

Variations Generation:

  • Adjust the variance parameter to control how closely a variation resembles the original generation
  • Higher variance creates more divergent outputs while preserving core musical elements
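ACE-Step's own variance mechanism is not documented here. As a generic illustration of the idea, the sketch below blends the original seed's Gaussian noise with fresh noise. The square-root weights keep each element at unit variance when the two sources are independent standard normals. With variance 0.0 the result is the original noise, and with variance 1.0 it is entirely new noise. This is a common technique for diffusion variations, not ACE-Step's actual implementation, and variation_noise is a hypothetical helper.

```python
import math
import random


def variation_noise(seed, variance, n, variation_seed=None):
    """Variance-preserving blend of the original noise with fresh noise.

    variance=0.0 reproduces the original seed's noise exactly;
    variance=1.0 replaces it entirely with noise from variation_seed.
    """
    if not 0.0 <= variance <= 1.0:
        raise ValueError("variance must be in [0, 1]")
    base_rng = random.Random(seed)
    new_rng = random.Random(seed + 1 if variation_seed is None else variation_seed)
    a, b = math.sqrt(1.0 - variance), math.sqrt(variance)
    return [a * base_rng.gauss(0.0, 1.0) + b * new_rng.gauss(0.0, 1.0)
            for _ in range(n)]
```

Small variance values keep most of the original noise, which is why the output stays close to the original while still changing.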

Repainting:

  • Selectively regenerate specific sections of audio while preserving the rest
  • Useful for fixing problematic segments without changing the entire composition
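The core of repainting is a time mask. Frames inside the selected interval come from the new generation, and frames outside it are kept from the original. The sketch below shows only that masking step on a plain sequence of per-frame values. Diffusion repainting typically applies the mask at every denoising step, with the original re-noised to match, so this single final blend simplifies the technique and is not ACE-Step's implementation.

```python
def repaint_blend(original, regenerated, start, end):
    """Keep original frames outside [start, end); take regenerated frames inside.

    Both inputs are equal-length sequences of per-frame values (e.g. latent frames).
    """
    if len(original) != len(regenerated):
        raise ValueError("inputs must have the same length")
    if not 0 <= start <= end <= len(original):
        raise ValueError("invalid repaint interval")
    # 1.0 marks frames to regenerate, 0.0 marks frames to preserve.
    mask = [1.0 if start <= i < end else 0.0 for i in range(len(original))]
    return [m * r + (1.0 - m) * o for m, o, r in zip(mask, original, regenerated)]
```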

Lyric Editing in ACE-Step:

  • Modify lyrics while maintaining melody, vocal timbre, and accompaniment
  • Supports editing in multiple languages while preserving musical structure

Voice Cloning:

  • Preserves vocal characteristics while generating new content with ACE-Step
  • Can be combined with lyric editing for flexible vocal performances

Style Transfer:

  • Apply new musical styles to existing compositions
  • Maintains core musical structure while adopting different genre characteristics

3.4 ACE-Step Prompt Tips

For General Music:

  • Be specific about genre, mood, and instrumentation in ACE-Step prompts
  • Example prompts: "electronic, rock, pop" or "funk, pop, soul, melodic"
  • A more detailed prompt: "dark, death rock, metal, hardcore, electric guitar, powerful, bass, drums, 110 bpm, G major"

For Instrumental Music:

  • Specify instruments and musical characteristics
  • Example prompts: "saxophone, jazz" or "violin, solo, fast tempo"
  • A more detailed prompt: "sonata, piano, Violin, B Flat Major, allegro"
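The detailed examples above follow a simple pattern: comma-separated descriptors, optionally ending with tempo and key. The sketch below assembles such a tags string from its parts and removes empty or repeated entries. The mood, genre, instrument, tempo, key ordering is a choice made here, and the source does not say that order affects the output. compose_tags is a hypothetical helper.

```python
def compose_tags(genres=(), moods=(), instruments=(), bpm=None, key=None):
    """Build a comma-separated style prompt for the tags field."""
    parts = [*moods, *genres, *instruments]
    if bpm is not None:
        parts.append(f"{int(bpm)} bpm")
    if key:
        parts.append(key)
    # Drop empty strings and exact duplicates while preserving order.
    seen, cleaned = set(), []
    for part in (p.strip() for p in parts):
        if part and part not in seen:
            seen.add(part)
            cleaned.append(part)
    return ", ".join(cleaned)
```

For example, compose_tags(genres=["jazz"], instruments=["saxophone"]) produces "jazz, saxophone", and adding bpm=110 and key="G major" appends the tempo and key in the style of the detailed prompt above.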

For Multilingual Support:

  • ACE-Step works best with: English, Chinese, Russian, Spanish, Japanese, German, French, Portuguese, Italian, Korean
  • Non-Latin script languages like Chinese, Japanese, and Korean are well-supported

More Information about ACE-Step

For additional details and development references:

  • Original ACE-Step model by ACE Studio and StepFun
  • Model developers: Junmin Gong, Sean Zhao, Sen Wang, Shengyuan Xu, and Joe Guo

Acknowledgements

This workflow is powered by ACE-Step, co-developed by ACE Studio and StepFun. The ComfyUI ACE-Step integration enables seamless music generation within the ComfyUI environment. Full credit goes to the original authors for their groundbreaking work on ACE-Step.


RunComfy
Copyright 2025 RunComfy. All Rights Reserved.
