Ace Step 1.5: Text-to-Music with Vocals, Lyrics & Style Tag Control on Models and API

acestep-ai/ace-step-1.5/text-to-audio

Generate songs up to 4 minutes from style tags and optional lyrics with original vocals and high acoustic fidelity, on RunComfy models and HTTP API.

The rate is $0.0003 per second.

Introduction To Ace Step 1.5

ACE Studio's Ace Step 1.5 turns comma-separated style tags and optional structured lyrics into complete songs of up to 4 minutes, with coherent vocals, high acoustic fidelity, and support for 50+ languages. Generation is billed at $0.0003 per second of audio. By replacing manual scoring sessions, vocalist bookings, and multi-track production with tag-driven generation, the model speeds up music ideation for media teams, game studios, content creators, and advertising producers. Developers can use Ace Step 1.5 on RunComfy in the browser or through an HTTP API, with no need to host or scale the model themselves.
Ideal for: Music Demo Prototyping | Cinematic and Game Scoring | Short-Form Ad Music

ACE Studio / Ace Step 1.5

Ace Step 1.5 is a text-to-music generation model that turns comma-separated style tags and optional structured lyrics into full songs with vocals, instrumentation, and synchronized lyric phrasing. It supports 50+ languages, runs efficiently, and is built for fast iteration, with durations from 5 seconds up to 4 minutes (240 seconds).

Output format: Audio only / duration 5–240 seconds / stereo / provider-defined sample rate.

Parameters

Parameter	Required	Type	Default	Range / Options	Description
tags	Yes	string	—	Free text	Comma-separated list of genre, mood, and instrument tags.
lyrics	No	string	—	Free text or [inst] / [instrumental]	Vocal content; use section markers like [Verse], [Chorus], [Bridge] to structure the song.
duration	No	integer	60	5 – 240	Audio length in seconds.
seed	No	integer	-1	-1 – 2147483647	Random seed for reproducibility; -1 randomizes.
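The ranges in the table can be checked client-side before a job is submitted. The following is a minimal sketch, not RunComfy's own validation: the field names and ranges come from the table above, while the helper name `build_payload` and the choice to omit `lyrics` when it is not supplied are assumptions made for illustration.

```python
from typing import Optional

MAX_SEED = 2147483647  # upper bound of the seed range in the parameter table


def build_payload(tags: str, lyrics: Optional[str] = None,
                  duration: int = 60, seed: int = -1) -> dict:
    """Validate inputs against the documented ranges and return a request body.

    build_payload is a hypothetical helper, not part of any RunComfy SDK.
    """
    if not tags or not tags.strip():
        raise ValueError("tags is required and must not be empty")
    if not 5 <= duration <= 240:
        raise ValueError("duration must be between 5 and 240 seconds")
    if not -1 <= seed <= MAX_SEED:
        raise ValueError("seed must be between -1 and 2147483647")

    payload = {"tags": tags.strip(), "duration": duration, "seed": seed}
    if lyrics is not None:
        # Assumption: the optional field is simply left out when unused.
        payload["lyrics"] = lyrics
    return payload


# A 30-second instrumental using the [inst] marker from the table.
example = build_payload("lofi, hiphop, chill, mellow piano",
                        lyrics="[inst]", duration=30)
```

Failing fast on an out-of-range duration or seed avoids spending a request on a job the service would likely reject.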

Pricing

Ace Step 1.5 on RunComfy uses time-based billing for generated audio.

Billing unit	Rate
Per second of generated audio	$0.0003

Estimated cost examples

Duration	Approx. cost
30 s	~$0.009
60 s (default)	~$0.018
120 s	~$0.036
240 s (4 min)	~$0.072
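The estimates above are the duration multiplied by the per-second rate. A small sketch of that arithmetic follows; it assumes billing is exactly linear in the requested duration with no rounding or minimum charge, which the page does not confirm. `estimate_cost` is an illustrative name, and `Decimal` is used only to avoid floating-point noise in small currency amounts.

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.0003")  # USD per second, from the pricing table


def estimate_cost(duration_seconds: int) -> Decimal:
    """Return the approximate USD cost for a clip of the given length."""
    if not 5 <= duration_seconds <= 240:
        raise ValueError("duration must be between 5 and 240 seconds")
    return RATE_PER_SECOND * duration_seconds


# Reproduce the rows of the cost table.
for seconds in (30, 60, 120, 240):
    print(f"{seconds} s costs about ${estimate_cost(seconds)}")
```

The same function can be summed over a batch of planned durations to budget a project before any audio is generated.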

How to Use

1) Open the Ace Step 1.5 model in RunComfy and reveal the generation panel.

2) Enter style tags such as "lofi, hiphop, chill, mellow piano" to define genre, mood, and instrumentation.

3) Optionally add lyrics; keep [Verse], [Chorus], and [Bridge] sections clearly separated, or use [inst] for an instrumental.

4) Set duration in seconds (5–240); start short to test direction before committing to a full 4-minute render.

5) Lock the seed when you want to compare the impact of tag or lyric changes, or leave it at -1 for variety.

6) Run the generation, preview the result, and download the audio file from your job history.

7) For API use, send the same fields to the Ace Step 1.5 endpoint on RunComfy; no self-hosting is required.

8) Save promising seeds and tag combinations as presets to keep your sonic direction consistent across a project.
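Steps 3 and 5 above can be sketched in code: assembling lyrics from clearly separated sections, and holding the seed fixed while only the tags change. This is an illustration under assumptions, not a documented workflow. The helper names are hypothetical, the blank line between sections is a formatting guess, and the lyric lines are placeholders.

```python
def format_lyrics(sections):
    """Join (marker, text) pairs into lyrics with a blank line between sections."""
    blocks = [f"[{marker}]\n{text.strip()}" for marker, text in sections]
    return "\n\n".join(blocks)


def seed_locked_variants(base, tag_options, seed):
    """Return one payload per tag string, all sharing the same fixed seed."""
    if seed < 0:
        raise ValueError("use a fixed, non-negative seed for comparisons")
    return [{**base, "tags": tags, "seed": seed} for tags in tag_options]


lyrics = format_lyrics([
    ("Verse", "City lights are fading slow"),   # placeholder lyric
    ("Chorus", "Stay a little longer"),         # placeholder lyric
])

# Short 20-second tests that differ only in the last tag.
variants = seed_locked_variants(
    {"lyrics": lyrics, "duration": 20},
    ["lofi, hiphop, chill, mellow piano", "lofi, hiphop, chill, warm guitar"],
    seed=1234,
)
```

Because every variant shares the seed, lyrics, and duration, differences between the resulting clips can more plausibly be attributed to the tag change alone. A list like `variants` can also be stored as a preset for later reuse.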

Related Models

sora-2/text-to-video

Generate realistic videos with synced audio from text using OpenAI Sora 2.

seedance-1.0/pro-fast/text-to-video

High-speed text-to-motion generator for cinematic storytelling use.

wan-2-2/fun-camera

Create smooth motion clips from stills with custom camera moves.

happyhorse-1.0/text-to-video

HappyHorse 1.0 with native 1080p output, cinematic motion, and multi-shot consistency.

hailuo-2-3/standard/image-to-video

Transform images into motion-rich clips with Hailuo 2.3's precise control and realistic visuals.

kling-2-6/motion-control-pro

Cinematic motion model for fluid scene creation and adaptive visual editing.

Frequently Asked Questions

What is Ace Step 1.5 and what does it do in a text-to-audio workflow?

Ace Step 1.5 is a text-to-music model from acestep-ai that turns style tags and optional structured lyrics into full audio tracks with melody, rhythm, and vocals. In a text-to-audio workflow on RunComfy, you describe the genre, mood, and song structure, and Ace Step 1.5 generates a coherent musical piece with synchronized lyric phrasing. It is designed for creators who want fast, prompt-driven music generation without manual composition.

What kinds of generation tasks is Ace Step 1.5 best suited for?

Ace Step 1.5 is best suited for text-to-audio tasks such as background music for videos, short song demos, ambient loops, ad jingles, and reference tracks for game scenes. It handles tag-based styling well, so you can steer genre, instrumentation, and energy with a few descriptors. Lyric and vocal generation also makes Ace Step 1.5 useful for songwriting drafts and creative prototyping.

How does Ace Step 1.5 compare to the original Ace Step and other music models?

Compared to the original Ace Step, version 1.5 keeps the same tag-driven control and 4-minute maximum duration while expanding multilingual lyric support to 50+ languages and refining structured-lyric handling. Compared to instrumental-only systems, Ace Step 1.5 natively produces vocals, instrumentation, and synchronized phrasing in a single pass. Reproducibility through a seed parameter helps developers iterate consistently on a chosen direction.

Which teams and use cases benefit most from Ace Step 1.5 in production?

Designers, technical artists, video creators, and product teams can use Ace Step 1.5 for trailers, social content, prototype game audio, e-commerce videos, and ad creatives. Developers can wrap it into pipelines that need on-demand soundtracks tied to scene metadata or campaign briefs. Because Ace Step 1.5 supports both vocals and instrumentals across many languages, it covers a wide range of audio needs from a single interface.
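As one way to picture a pipeline that ties soundtracks to scene metadata, the sketch below maps a scene record to a tags string. The metadata keys (`genre`, `mood`, `instruments`) and the function name are hypothetical; a real pipeline would use whatever schema its own scene or campaign data already has.

```python
def tags_from_scene(scene: dict) -> str:
    """Combine hypothetical scene metadata fields into a comma-separated tag string."""
    parts = [scene.get("genre"), scene.get("mood"), *scene.get("instruments", [])]
    seen, tags = set(), []
    for part in parts:
        tag = (part or "").strip().lower()
        if tag and tag not in seen:  # skip blanks and duplicates, keep order
            seen.add(tag)
            tags.append(tag)
    if not tags:
        raise ValueError("scene metadata produced no tags")
    return ", ".join(tags)


scene = {
    "genre": "Orchestral",
    "mood": "tense",
    "instruments": ["strings", "timpani", "strings"],
}
scene_tags = tags_from_scene(scene)  # "orchestral, tense, strings, timpani"
```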

What input and output limits should I know before using Ace Step 1.5?

Each Ace Step 1.5 generation runs from 5 to 240 seconds (4 minutes) and takes one required field, tags, plus optional structured lyrics. Other constraints, such as supported audio formats and tag combinations, depend on the current provider configuration and may vary by mode, so check the RunComfy parameter panel for exact limits before building around them.

How do I move from testing Ace Step 1.5 in the model UI to using it in production via the RunComfy API?

You can prototype Ace Step 1.5 in the RunComfy AI Playground Web UI by adjusting style tags, lyrics, duration, and seed until the text-to-audio output matches your target. Once the configuration is stable, call the same Ace Step 1.5 model through the RunComfy API with identical parameters to automate generation from your backend or content pipeline. This keeps creative iteration in the browser and production runs in code, without changing the underlying model behavior.
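The hand-off from browser to code amounts to sending the same fields as a JSON body. The sketch below only constructs the request and does not send it, because this page does not state the endpoint URL, authentication scheme, or response shape. `API_URL`, the Bearer header, and `make_request` are placeholders and assumptions to be replaced with the values from the RunComfy API documentation.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the endpoint from the RunComfy API docs.
API_URL = "https://api.example.com/ace-step-1.5/text-to-audio"


def make_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the generation fields."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


# The same values that were settled on in the browser.
request = make_request(
    {"tags": "lofi, hiphop, chill, mellow piano", "duration": 30, "seed": 1234},
    api_key="YOUR_API_KEY",
)

# Once the real endpoint and credentials are filled in, the request could be sent with:
# with urllib.request.urlopen(request, timeout=60) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to log and to compare against the values used in the UI.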

How is pricing handled when generating audio with Ace Step 1.5 on RunComfy?

Ace Step 1.5 generations consume USD credits from your RunComfy balance; based on available provider information, the model is billed at $0.0003 per second of generated audio. New users typically receive a free trial credit to experiment with, after which usage follows the rules shown in the Generation section of the model page. For the most current rates and any mode-specific differences, refer to that section of the Ace Step 1.5 page on RunComfy.

RunComfy

RunComfy is the premier ComfyUI platform, offering an online ComfyUI environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.
