ACE-Step Audio Inpaint: Segment Audio Editing, Lyric & Style Rewrite on Models and API

acestep-ai/ace-step/audio-inpaint

Re-synthesize a marked slice of an audio track with new style tags or lyrics, stitched back into the original mix, on RunComfy models and HTTP API.

Audio *

Source audio to edit. Provide an HTTPS URL to an MP3, WAV, or FLAC file (up to 60 minutes).

Tags *

Comma-separated genre, mood, and instrument tags that steer the style of the regenerated segment.

Start Time Relative To

Reference point for the start time. 'start' counts from the beginning of the track, 'end' counts backwards from the end.

Start Time (seconds)

Start of the editable segment in seconds, measured from the chosen reference point.

End Time Relative To

Reference point for the end time. 'start' counts from the beginning of the track, 'end' counts backwards from the end.

End Time (seconds)

End of the editable segment in seconds, measured from the chosen reference point.

Lyrics

Optional lyrics for the regenerated segment. Leave blank to let the model write lyrics, or use [inst] / [instrumental] for no vocals.

Seed

Random seed for reproducibility. Use -1 to randomize.

The rate is $0.0002 per second of output audio.

Introduction To ACE-Step Audio Inpaint

ACE Studio's ACE-Step Audio Inpaint re-synthesizes a marked slice of an existing track at $0.0002 per second of output audio, swapping lyrics, instrumentation, or mood while leaving the rest of the mix untouched. By replacing full-song re-renders and manual stem surgery with targeted, prompt-driven passes, the model shortens music repair and remix loops for producers, sound designers, game audio teams, and ad creatives. Developers can use ACE-Step Audio Inpaint on RunComfy in the browser or through an HTTP API, so there is no need to host or scale the model yourself.
Ideal for: Lyric Rewrites And Punch-Ins | Section Restyle And Remix | Voiceover And Sound-Effect Repair

ACE Studio / ACE-Step Audio Inpaint

ACE-Step Audio Inpaint is an audio editing model that re-renders a defined slice of an existing track without disturbing the rest. Pass in a source audio URL, mark the start and end of the slice, and describe the replacement with style tags plus an optional lyric block. The model synthesizes a fresh segment in place and stitches it back into the surrounding mix.

Output format: Audio only / source up to 60 minutes / slice range 0–240 seconds / provider-defined sample rate.

Highlights

Slice-level rewrites: Lock a window with start and end times and re-synthesize only that window — everything outside it stays bit-for-bit identical.
Boundary-aware stitching: The model conditions on the audio just before and after the slice so the patch lines up in tempo, key, and timbre without obvious seams.
Anchor from either end: Time markers can count up from the opening or count back from the tail, which makes outro fixes and tail-end edits straightforward.
Tag-led restyle and re-lyric: Drive new instrumentation, mood, or a fresh vocal line from a short tag list and an optional lyric block.
Reproducible takes: Pin the seed to lock one specific take, or randomize it to sample alternate variations of the same brief.
Parity across UI and API: Identical inputs run in the RunComfy model UI and through the HTTP API.
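
The "anchor from either end" behavior can be pictured as a conversion to absolute timestamps. The sketch below is illustrative only: `resolve_window` is a hypothetical helper, not part of any RunComfy SDK, and it assumes that a value anchored to `end` is subtracted from the track duration, which is how the field descriptions read.

```python
def resolve_window(duration, start_time, end_time,
                   start_relative_to="start", end_relative_to="start"):
    """Convert start/end markers into absolute seconds from the track start.

    Assumption: a value anchored to "end" counts backwards from the end of
    the track, so its absolute position is duration - value.
    """
    def to_absolute(value, relative_to):
        if relative_to == "start":
            return value
        if relative_to == "end":
            return duration - value
        raise ValueError(f"relative_to must be 'start' or 'end', got {relative_to!r}")

    start_abs = to_absolute(start_time, start_relative_to)
    end_abs = to_absolute(end_time, end_relative_to)
    # The window must lie inside the track and have positive length.
    if not 0 <= start_abs < end_abs <= duration:
        raise ValueError(
            f"invalid window: {start_abs}s to {end_abs}s in a {duration}s track"
        )
    return start_abs, end_abs


# Window from 20 s before the end to 5 s before the end of a 200 s track.
print(resolve_window(200, 20, 5, start_relative_to="end", end_relative_to="end"))
```

Resolving the window locally like this is a cheap way to catch a start marker that lands after the end marker before a request is submitted.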

Parameters

Parameter	Required	Type	Default	Range / Options	Description
audio	Yes	string	—	HTTPS URL to MP3 / WAV / FLAC, up to 60 min	Source audio file to edit.
tags	Yes	string	—	Free text	Comma-separated genre, mood, and instrument tags steering the segment style.
start_time_relative_to	No	string	start	start, end	Reference point for start_time.
start_time	No	number	—	0 – 240	Start of the editable segment in seconds.
end_time_relative_to	No	string	start	start, end	Reference point for end_time.
end_time	No	number	30	0 – 240	End of the editable segment in seconds.
lyrics	No	string	—	Free text or [inst] / [instrumental]	Lyrics for the regenerated segment; blank lets the model write its own.
seed	No	integer	-1	-1 – 2147483647	Random seed for reproducibility; -1 randomizes.
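
As a rough illustration of how the table maps onto a request body, the sketch below validates inputs against the listed ranges and assembles a dictionary. `build_inpaint_payload` is a hypothetical helper, the URL is a placeholder, and the assumption that the API accepts a flat JSON object keyed by these parameter names should be checked against the RunComfy API documentation.

```python
MAX_SEED = 2147483647
ANCHORS = ("start", "end")


def build_inpaint_payload(audio, tags, start_time=None, end_time=None,
                          start_time_relative_to=None, end_time_relative_to=None,
                          lyrics=None, seed=None):
    """Validate inputs against the parameter table and return a request body dict.

    Optional fields left as None are omitted so the service can apply its
    own defaults.
    """
    if not audio.startswith("https://"):
        raise ValueError("audio must be an HTTPS URL")
    if not tags.strip():
        raise ValueError("tags must not be empty")
    payload = {"audio": audio, "tags": tags}

    # Both time markers share the 0-240 second range from the table.
    for name, value in (("start_time", start_time), ("end_time", end_time)):
        if value is not None:
            if not 0 <= value <= 240:
                raise ValueError(f"{name} must be between 0 and 240 seconds")
            payload[name] = value

    for name, value in (("start_time_relative_to", start_time_relative_to),
                        ("end_time_relative_to", end_time_relative_to)):
        if value is not None:
            if value not in ANCHORS:
                raise ValueError(f"{name} must be 'start' or 'end'")
            payload[name] = value

    if lyrics is not None:
        payload["lyrics"] = lyrics
    if seed is not None:
        if not -1 <= seed <= MAX_SEED:
            raise ValueError(f"seed must be between -1 and {MAX_SEED}")
        payload["seed"] = seed
    return payload


payload = build_inpaint_payload(
    audio="https://example.com/track.mp3",  # placeholder URL
    tags="lofi, mellow piano, soft drums",
    start_time=45,
    end_time=60,
    lyrics="[inst]",
    seed=1234,
)
print(payload)
```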

Pricing

ACE-Step Audio Inpaint on RunComfy uses time-based billing tied to the duration of the output audio it produces.

Billing unit	Rate
Per second of output audio	$0.0002

Estimated cost examples

Output duration	Approx. cost
30 s	~$0.006
60 s	~$0.012
180 s (3 min)	~$0.036
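
The estimates above are the output duration multiplied by the per-second rate. A minimal sketch of that arithmetic follows; `estimate_cost` is a hypothetical helper, and it assumes the billed duration equals the length of the returned audio.

```python
RATE_PER_SECOND = 0.0002  # USD per second of output audio, from the pricing table


def estimate_cost(output_seconds):
    """Return the approximate cost in USD for a given output duration."""
    if output_seconds < 0:
        raise ValueError("output_seconds must be non-negative")
    # Round to 6 decimals to hide floating-point noise in the product.
    return round(output_seconds * RATE_PER_SECOND, 6)


for seconds in (30, 60, 180):
    print(f"{seconds:>3} s -> ${estimate_cost(seconds):.3f}")
```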

Prompt & Reference Tips

Pick segment boundaries on natural musical beats or phrase ends so transitions feel intentional.
Use multiple complementary tags (e.g., "lofi, mellow piano, soft drums") to lock the segment's genre and instrumentation.
Match the new tags to the surrounding audio's energy when you want a subtle repair; diverge boldly when you want a remix.
Provide structured lyrics with consistent syllable counts so the rewritten vocal phrasing aligns with the surrounding bars.
For instrumentals or to remove vocals from a segment, set lyrics to [inst] or [instrumental].
Use start_time_relative_to: end with a small offset to nudge a fade-out or final hook without recomputing time from the front.
Fix the seed when iterating; only change tags or lyrics so you can attribute differences to your edits.
Keep edits to focused windows (a few bars at a time) for cleaner blending than rewriting very long ranges.
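
Two of the tips above, anchoring to the end of the track and pinning the seed while iterating, can be combined in one small sketch. The request fields mirror the parameter names on this page; `tag_variants`, the placeholder URL, and the specific values are illustrative assumptions.

```python
# Base request: an instrumental patch on the outro, anchored to the end of the track.
base_request = {
    "audio": "https://example.com/track.mp3",  # placeholder URL
    "tags": "lofi, mellow piano, soft drums",
    "start_time_relative_to": "end",
    "start_time": 12,  # window opens 12 s before the end
    "end_time_relative_to": "end",
    "end_time": 2,     # window closes 2 s before the end
    "lyrics": "[inst]",
    "seed": 1234,      # pinned so differences come from the tags alone
}


def tag_variants(request, tag_sets):
    """Return one request per tag set, identical to the base except for tags."""
    variants = []
    for tags in tag_sets:
        variant = dict(request)  # shallow copy is enough for flat values
        variant["tags"] = ", ".join(tags)
        variants.append(variant)
    return variants


variants = tag_variants(base_request, [
    ["lofi", "mellow piano", "soft drums"],
    ["lofi", "warm rhodes", "brushed drums"],
])
for variant in variants:
    print(variant["seed"], variant["tags"])
```

Because every field except `tags` is held constant, any audible difference between the resulting takes can be attributed to the tag change.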

Related Models

happyhorse-1.0/text-to-video

HappyHorse 1.0 with native 1080p output, cinematic motion, and multi-shot consistency.

kling-video-o1/image-to-video

Transform static visuals into cinematic motion with Kling O1's precise scene control and lifelike generation.

seedance-1.0/text-to-video

Generate cinematic videos from text prompts with Seedance 1.0.

ltx-2/retake-video

LTX 2 Retake Video modifies an existing video according to the prompt.

kling-2-5/turbo/image-to-video

Render fluid, stylized scenes with fast, frame-consistent output.

kling-3.0/pro/image-to-video

Premium image-to-video with the highest visual fidelity and motion realism in the Kling V3.0 family.

Frequently Asked Questions

What is ACE-Step Audio Inpaint and what does it do in an audio-to-audio workflow?

ACE-Step Audio Inpaint is an audio editing model from acestep-ai that rewrites a chosen time range inside an existing track while preserving the surrounding audio. In an audio-to-audio workflow on RunComfy, you provide a source URL, mark start and end times, and supply style tags or new lyrics, and ACE-Step Audio Inpaint regenerates only that segment. It is built for repair, restyle, and targeted remix work without re-rendering the full song.

What kinds of generation tasks is ACE-Step Audio Inpaint best suited for?

ACE-Step Audio Inpaint is best suited for tasks like fixing off-beat or noisy bars, rewriting a single verse or chorus, restyling a section's instrumentation, and replacing a lyric line on a finished track. It also works well for short voiceover or sound-effect repairs where the surrounding audio must stay intact. Because it is segment-based, it fits remix, mastering prep, and content-edit pipelines more than full-song generation.

How does ACE-Step Audio Inpaint compare to full text-to-music or stem-based editing approaches?

Compared to full text-to-music models, ACE-Step Audio Inpaint focuses on time-range edits and seamless blending into the existing track instead of producing a song from scratch. Compared to manual stem separation plus regeneration, it bundles segment selection, resynthesis, and crossfading into a single audio-to-audio pass. This typically gives technical artists tighter control over what changes and what stays the same.

Which teams and use cases benefit most from ACE-Step Audio Inpaint in production?

Music producers, sound designers, game audio teams, and ad creatives benefit from ACE-Step Audio Inpaint when they need to repair, rewrite, or restyle specific sections of an existing track. Developers can wrap it into editing tools that let users mark a range and submit new tags or lyrics through an audio-to-audio interface. Content teams can also use it for last-mile fixes on trailers, podcasts, or game cinematics.

What input and output limits should I know before using ACE-Step Audio Inpaint?

Source audio is typically supplied as a public HTTPS URL to MP3, WAV, or FLAC, and based on available provider information may be up to about 60 minutes long. The editable segment in ACE-Step Audio Inpaint is bounded by start_time and end_time in seconds, with each value in the 0–240 range and anchored to either the start or end of the track. Other constraints such as sample rate and exact format support depend on provider settings, so check the RunComfy parameter panel for the live limits.

How do I move from testing ACE-Step Audio Inpaint in the Playground to using it in production via the RunComfy API?

You can prototype ACE-Step Audio Inpaint in the RunComfy AI Playground Web UI by adjusting the audio URL, segment range, tags, lyrics, and seed until the audio-to-audio result matches your target. Once the configuration is stable, call the same ACE-Step Audio Inpaint model through the RunComfy API with identical parameters to automate edits from your backend or content pipeline. This keeps creative iteration in the browser and production runs in code, without changing model behavior.
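
A hedged sketch of the hand-off from Playground settings to code is shown below. It only constructs the HTTP request with the standard library. The endpoint URL is a placeholder, and the bearer-token header and JSON body shape are assumptions rather than documented RunComfy behavior, so confirm all three against the official API reference before sending anything.

```python
import json
import urllib.request


def make_inpaint_request(endpoint_url, api_key, payload):
    """Build (but do not send) a JSON POST request for an inpaint job.

    Assumptions: the endpoint accepts a JSON body and a bearer token.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Same values that were tuned in the Playground, expressed as a dict.
settings = {
    "audio": "https://example.com/track.mp3",  # placeholder URL
    "tags": "lofi, mellow piano, soft drums",
    "start_time": 45,
    "end_time": 60,
    "seed": 1234,
}
request = make_inpaint_request(
    "https://api.example.invalid/inpaint",  # hypothetical endpoint
    "YOUR_API_KEY",
    settings,
)
print(request.get_method(), request.full_url)

# To actually submit, something like the following would be used:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```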

How is pricing handled when running ACE-Step Audio Inpaint on RunComfy?

ACE-Step Audio Inpaint generations draw on the USD credits in your RunComfy balance, and based on available provider information the model is billed at $0.0002 per second of output audio. New users typically receive a free trial credit to experiment with, after which usage follows the Generation rules shown on the model page. For current rates and any mode-specific differences, refer to the Generation section of the ACE-Step Audio Inpaint page on RunComfy.

Can I use ACE-Step Audio Inpaint outputs commercially?

RunComfy provides access to the ACE-Step Audio Inpaint model and the audio-to-audio workflow, but commercial usage rights for the edited audio depend on the license from the original model author and provider (acestep-ai), as well as any rights you hold over the source track you upload. Before releasing edited audio in commercial products, ads, films, or games, review the official ACE-Step license and your source-audio rights. For platform-side questions you can reach out to hi@runcomfy.com.

RunComfy

RunComfy is the premier ComfyUI platform, offering a ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.
