Ace Step 1.5 is a text-to-music generation model that turns comma-separated style tags and optional structured lyrics into full songs with vocals, instrumentation, and synchronized lyric phrasing. It supports 50+ languages, runs efficiently, and is built for fast iteration with durations from 5 seconds up to 4 minutes (240 seconds).
Output format: Audio only / duration 5–240 seconds / stereo / provider-defined sample rate.
| Parameter | Required | Type | Default | Range / Options | Description |
|---|---|---|---|---|---|
| tags | Yes | string | — | Free text | Comma-separated list of genre, mood, and instrument tags. |
| lyrics | No | string | — | Free text or [inst] / [instrumental] | Vocal content; use section markers like [Verse], [Chorus], [Bridge] to structure the song. |
| duration | No | integer | 60 | 5 – 240 | Audio length in seconds. |
| seed | No | integer | -1 | -1 – 2147483647 | Random seed for reproducibility; -1 randomizes. |
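The parameter table above can be mirrored as a small client-side check before a job is submitted. This is a minimal sketch: the field names, defaults, and ranges come from the table, while the `AceStepParams` class and its methods are hypothetical helpers, not part of any RunComfy SDK.

```python
from dataclasses import dataclass
from typing import Optional

MIN_DURATION, MAX_DURATION = 5, 240   # seconds, from the parameter table
MAX_SEED = 2147483647                 # upper bound from the parameter table


@dataclass
class AceStepParams:
    """Hypothetical container for the four documented fields."""
    tags: str
    lyrics: Optional[str] = None
    duration: int = 60
    seed: int = -1

    def validate(self) -> None:
        # tags is the only required field and must contain some text
        if not self.tags.strip():
            raise ValueError("tags is required and must not be empty")
        if not MIN_DURATION <= self.duration <= MAX_DURATION:
            raise ValueError(f"duration must be {MIN_DURATION}-{MAX_DURATION} seconds")
        if not -1 <= self.seed <= MAX_SEED:
            raise ValueError(f"seed must be between -1 and {MAX_SEED}")

    def to_payload(self) -> dict:
        """Validate, then return a dict holding only the fields that are set."""
        self.validate()
        payload = {"tags": self.tags, "duration": self.duration, "seed": self.seed}
        if self.lyrics is not None:
            payload["lyrics"] = self.lyrics
        return payload


params = AceStepParams(tags="lofi, hiphop, chill, mellow piano", lyrics="[inst]")
print(params.to_payload())
```

Validating locally keeps out-of-range durations or seeds from reaching the provider, where the exact error behavior is not documented here.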
Ace Step 1.5 on RunComfy uses time-based billing for generated audio.
| Billing unit | Rate |
|---|---|
| Per second of generated audio | $0.0003 |
Estimated cost examples
| Duration | Approx. cost |
|---|---|
| 30 s | ~$0.009 |
| 60 s (default) | ~$0.018 |
| 120 s | ~$0.036 |
| 240 s (4 min) | ~$0.072 |
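The estimates above follow directly from multiplying the duration by the per-second rate. The sketch below reproduces that arithmetic; the function name `estimate_cost` is illustrative, and the rate is the one listed in the billing table, which may change.

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.0003")  # USD per second, from the billing table


def estimate_cost(duration_seconds: int) -> Decimal:
    """Return the approximate USD cost for one generation of the given length."""
    if not 5 <= duration_seconds <= 240:
        raise ValueError("duration must be between 5 and 240 seconds")
    return RATE_PER_SECOND * duration_seconds


for seconds in (30, 60, 120, 240):
    print(f"{seconds:>3} s -> ${estimate_cost(seconds)}")
```

`Decimal` is used instead of floats so that small per-second amounts add up without rounding drift when many generations are totaled.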
1) Open the Ace Step 1.5 model in RunComfy and reveal the generation panel.
2) Enter style tags such as "lofi, hiphop, chill, mellow piano" to define genre, mood, and instrumentation.
3) Optionally add lyrics; keep [Verse], [Chorus], and [Bridge] sections clearly separated, or use [inst] for an instrumental.
4) Set duration in seconds (5–240); start short to test direction before committing to a full 4-minute render.
5) Lock the seed when you want to compare the impact of tag or lyric changes, or leave it at -1 for variety.
6) Run the generation, preview the result, and download the audio file from your job history.
7) For API use, send the same fields to the Ace Step 1.5 endpoint on RunComfy; no self-hosting is required.
8) Save promising seeds and tag combinations as presets to keep your sonic direction consistent across a project.
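Steps 5 and 8 above can be sketched in code: hold the seed fixed while varying tags, then store a promising combination as a preset. The helper names and the preset JSON layout below are assumptions for illustration only; RunComfy does not prescribe a preset format in this page.

```python
import json


def make_variants(base_tags, extra_tag_options, seed, duration=30):
    """Build one payload per tag variation, all sharing the same fixed seed."""
    if seed < 0:
        raise ValueError("use a fixed, non-negative seed when comparing variants")
    variants = []
    for extra in extra_tag_options:
        tags = ", ".join(list(base_tags) + list(extra))
        variants.append({"tags": tags, "duration": duration, "seed": seed})
    return variants


def preset_to_json(name, payload):
    """Serialize a named preset (hypothetical layout) to a JSON string."""
    return json.dumps({"name": name, "payload": payload}, sort_keys=True)


def preset_from_json(text):
    """Load a preset back and return (name, payload)."""
    data = json.loads(text)
    return data["name"], data["payload"]


variants = make_variants(["lofi", "hiphop"], [["chill"], ["mellow piano"]], seed=1234)
saved = preset_to_json("lofi-chill-v1", variants[0])
print(saved)
```

A short default duration is used for the variants, matching the advice to test direction before committing to a full-length render.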
Create cinematic clips in seconds with Veo 3.1 Fast, built for instant text-driven motion and creative control.
Next-gen tool turning prompts into cinematic 4K video clips with audio
Produces crisp 1080p AI videos with smart motion logic and speed
Add a person or object into an existing video with smart compositing.
Refined AI visuals, real-time control, and pro FX for creators
Generate fast, high quality videos from text with Kling 2.5 Turbo.
Ace Step 1.5 is a text-to-music model from acestep-ai that turns style tags and optional structured lyrics into full audio tracks with melody, rhythm, and vocals. In a text-to-audio workflow on RunComfy, you describe the genre, mood, and song structure, and Ace Step 1.5 generates a coherent musical piece with synchronized lyric phrasing. It is designed for creators who want fast, prompt-driven music generation without manual composition.
Ace Step 1.5 is best suited for text-to-audio tasks such as background music for videos, short song demos, ambient loops, ad jingles, and reference tracks for game scenes. It handles tag-based styling well, so you can steer genre, instrumentation, and energy with a few descriptors. Lyric and vocal generation also makes Ace Step 1.5 useful for songwriting drafts and creative prototyping.
Compared to the original Ace Step, version 1.5 keeps the same tag-driven control and 4-minute maximum duration while expanding multilingual lyric support to 50+ languages and refining structured-lyric handling. Compared to instrumental-only systems, Ace Step 1.5 natively produces vocals, instrumentation, and synchronized phrasing in a single pass. Reproducibility through a seed parameter helps developers iterate consistently on a chosen direction.
Designers, technical artists, video creators, and product teams can use Ace Step 1.5 for trailers, social content, prototype game audio, e-commerce videos, and ad creatives. Developers can wrap it into pipelines that need on-demand soundtracks tied to scene metadata or campaign briefs. Because Ace Step 1.5 supports both vocals and instrumentals across many languages, it covers a wide range of audio needs from a single interface.
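A pipeline that ties soundtracks to scene metadata mostly needs a way to turn that metadata into the comma-separated tags string. The sketch below assumes a hypothetical metadata schema with `genre`, `mood`, and `instruments` keys; a real project would substitute its own fields.

```python
def scene_to_tags(scene: dict) -> str:
    """Turn scene metadata (hypothetical schema) into a comma-separated tag string."""
    parts = []
    for key in ("genre", "mood"):
        value = scene.get(key)
        if value:
            parts.append(value)
    parts.extend(scene.get("instruments", []))

    # Normalize case and whitespace, and drop duplicates while keeping order.
    seen = set()
    tags = []
    for part in parts:
        tag = part.strip().lower()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)

    if not tags:
        raise ValueError("scene metadata produced no tags; tags is a required field")
    return ", ".join(tags)


scene = {"genre": "Lofi", "mood": "chill", "instruments": ["mellow piano", "Chill"]}
print(scene_to_tags(scene))
```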
Ace Step 1.5 supports flexible duration, adjustable from 5 seconds up to 240 seconds (4 minutes) per generation, with a single required tags field and optional structured lyrics. Other constraints, such as supported audio formats and tag combinations, depend on the current provider configuration and may vary by mode, so check the RunComfy parameter panel for exact limits before building around them.
You can prototype Ace Step 1.5 in the RunComfy AI Playground Web UI by adjusting style tags, lyrics, duration, and seed until the text-to-audio output matches your target. Once the configuration is stable, call the same Ace Step 1.5 model through the RunComfy API with identical parameters to automate generation from your backend or content pipeline. This keeps creative iteration in the browser and production runs in code, without changing the underlying model behavior.
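Moving from the Playground to code amounts to sending the same four fields as a JSON body. The sketch below only builds the request and does not send it. The endpoint URL is a placeholder, and the bearer-token header is an assumption; take the real URL and authentication scheme from the RunComfy API documentation.

```python
import json
import urllib.request

# Placeholder only: replace with the endpoint given in the RunComfy API docs.
ENDPOINT_URL = "https://example.invalid/ace-step-1-5"


def build_request(payload: dict, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the generation fields."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; confirm against the provider documentation.
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


payload = {"tags": "lofi, hiphop, chill, mellow piano", "duration": 60, "seed": 1234}
request = build_request(payload, api_token="YOUR_TOKEN")
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would submit it once the placeholder URL and token are replaced with real values.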
Ace Step 1.5 generations draw USD credits from your RunComfy balance; based on available provider information, the model is billed at $0.0003 per second of generated audio. New users typically receive a free trial balance to experiment with, after which usage follows the generation rules shown on the model page. For the most current rates and any mode-specific differences, refer to the Generation section of the Ace Step 1.5 page on RunComfy.