ElevenLabs Music is a text-to-music generation model that turns natural-language prompts and optional structured lyrics into high-fidelity songs, with or without vocals. The model accepts detailed style and section descriptions, then outputs polished audio suitable for media, games, and creative production.
Output format: Audio only / duration ~3 seconds to ~5 minutes / 44.1 kHz / stereo / MP3 or WAV
Use these fields to steer ElevenLabs Music reliably.
| Parameter | Required | Type | Default | Range / Options | Description |
|---|---|---|---|---|---|
| prompt* | Yes (*) | string | — | — | Style description and lyrics with structure markers. |
| music_length_ms | No | integer | 40000 | Typically 3000–300000 | Output duration in milliseconds. |
| force_instrumental | No | boolean | false | true / false | Generate an instrumental track without vocals. |
| output_format | No | string | mp3_standard | See platform options | Audio encoding of the result; other formats as listed on the platform. |
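As a rough illustration of the table above, the following Python sketch assembles the four parameters into a dictionary and checks them before use. The function name `build_music_params` is hypothetical, and treating 3000 and 300000 as hard limits is an assumption: the table only describes that range as typical.

```python
from typing import Any

# Default and range copied from the parameter table; the table calls the
# range "typical", so enforcing it strictly is an assumption of this sketch.
MIN_LENGTH_MS = 3_000
MAX_LENGTH_MS = 300_000
DEFAULT_LENGTH_MS = 40_000


def build_music_params(
    prompt: str,
    music_length_ms: int = DEFAULT_LENGTH_MS,
    force_instrumental: bool = False,
    output_format: str = "mp3_standard",
) -> dict[str, Any]:
    """Return a parameter dict, rejecting an empty prompt or out-of-range length."""
    if not prompt.strip():
        raise ValueError("prompt is required and must not be empty")
    if not MIN_LENGTH_MS <= music_length_ms <= MAX_LENGTH_MS:
        raise ValueError(
            f"music_length_ms must be between {MIN_LENGTH_MS} and {MAX_LENGTH_MS}"
        )
    return {
        "prompt": prompt,
        "music_length_ms": music_length_ms,
        "force_instrumental": force_instrumental,
        "output_format": output_format,
    }
```

Calling `build_music_params("Warm lo-fi beat, mellow keys")` yields the documented defaults, so only the prompt has to be supplied for a first test run.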
ElevenLabs Music on RunComfy uses time-based billing for generated audio.
| Billing unit | Rate |
|---|---|
| Per second of generated audio | $0.0083 |
Estimated cost examples
| Duration | Approx. cost |
|---|---|
| 30 s | ~$0.249 |
| 40 s (default) | ~$0.332 |
| 60 s | ~$0.498 |
| 120 s | ~$0.996 |
| 300 s (5 min) | ~$2.49 |
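The figures in the table follow directly from the per-second rate. A minimal Python sketch of that arithmetic is shown here; the function name `estimate_cost_usd` is illustrative, and the rate is the one listed above, which may change on the platform.

```python
# Rate taken from the billing table above (USD per second of generated audio).
RATE_USD_PER_SECOND = 0.0083


def estimate_cost_usd(music_length_ms: int) -> float:
    """Estimate the cost of a generation from its music_length_ms value."""
    seconds = music_length_ms / 1000
    return seconds * RATE_USD_PER_SECOND


if __name__ == "__main__":
    # Durations from the table, expressed in milliseconds.
    for length_ms in (30_000, 40_000, 60_000, 120_000, 300_000):
        print(f"{length_ms // 1000:>4} s -> ${estimate_cost_usd(length_ms):.3f}")
```

Because the input is `music_length_ms`, the same value you send as a parameter can be passed straight to the estimator.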
1) Select the model on RunComfy and open the generation panel for ElevenLabs Music.
2) Write your prompt with structure markers (e.g., “Intro 8 bars, Verse 16 bars, Chorus 16 bars”) and include lyrics if you want vocals.
3) Set music_length_ms to match your plan (e.g., 40000 for ~40 seconds); start short to iterate faster, then extend.
4) Enable force_instrumental if you need a vocal-free bed; otherwise leave it off for sung lyrics.
5) Choose output_format based on workflow needs (MP3 for speed, WAV for mixing) as available on the platform.
6) Generate and review; if a section needs changes, adjust that portion of the prompt and rerun rather than rewriting the whole song.
7) For API use on RunComfy, send the same parameters; no self-hosting is required and results are downloadable from your job history.
8) Save variants that nail different sections; you can later splice or compare versions to converge on the best take.
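Step 2 above, writing a prompt with structure markers and optional lyrics, can be sketched as a small Python helper. The helper name `build_prompt` and the "Structure:" and "Lyrics:" labels are assumptions made for illustration; the page does not specify an exact prompt layout beyond the "Intro 8 bars, Verse 16 bars, Chorus 16 bars" example.

```python
def build_prompt(
    style: str,
    sections: list[tuple[str, int]],
    lyrics: str = "",
) -> str:
    """Compose a prompt from a style line, bar-count markers, and optional lyrics."""
    # Produces markers in the form used in step 2, e.g. "Intro 8 bars, Verse 16 bars".
    structure = ", ".join(f"{name} {bars} bars" for name, bars in sections)
    parts = [style.strip(), f"Structure: {structure}."]
    if lyrics.strip():
        # Lyrics are only appended when vocals are wanted.
        parts.append("Lyrics:\n" + lyrics.strip())
    return "\n".join(parts)


prompt = build_prompt(
    "Upbeat synth-pop, bright analog leads, 118 BPM",
    [("Intro", 8), ("Verse", 16), ("Chorus", 16)],
)
```

Keeping the sections as a list makes it easy to change one section and rerun, as step 6 suggests, without retyping the rest of the prompt.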
On RunComfy, ElevenLabs Music offers fast iteration with detailed structure control for professional music workflows.
ElevenLabs Music is a text-to-music model that generates full-length songs or instrumental tracks from natural-language prompts. Compared to other systems, it offers detailed section-by-section control, multilingual vocals, and fine-tuning on custom references. Users can create professional-grade tracks through the ElevenLabs Music interface or the RunComfy API.
ElevenLabs Music supports nearly all musical genres — from classical to hip-hop — and can generate vocals in multiple languages such as English, Spanish, German, and Japanese. The text-to-sound architecture is designed to interpret stylistic and emotional cues across languages, enabling global accessibility for content creators.
ElevenLabs Music currently supports tracks from about 3 seconds to about 5 minutes long. Outputs are delivered as 44.1 kHz audio in MP3 or WAV. Each prompt can include structured sections, and prompts of up to a few thousand tokens give the best performance.
Yes, users can fine-tune ElevenLabs Music with non-copyrighted reference material only. The fine-tune feature allows limited custom datasets—typically 5 to 10 short audio references—to train a personalized model. Oversized audio or copyrighted references may be rejected by the ElevenLabs Music servers for compliance and performance reasons.
To move from prototyping in RunComfy Models to production, developers can use the RunComfy API endpoints, which mirror the Models parameters. After validating generation quality in ElevenLabs Music experiments, you can authenticate with API keys, automate jobs, and manage billing with USD credits for scalable deployment.
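A hedged sketch of what such an authenticated request might look like in Python is given here. The endpoint URL, the bearer-token header, and the JSON body shape are all placeholders chosen for illustration, not the documented RunComfy API; consult the platform's API reference for the real values. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Hypothetical placeholder URL; the real RunComfy endpoint is not given on this page.
API_URL = "https://api.example.invalid/v1/elevenlabs-music"


def build_request(api_key: str, params: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the model parameters."""
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            # Bearer authentication is an assumption of this sketch.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "YOUR_API_KEY",
    {"prompt": "Cinematic strings, slow build", "music_length_ms": 40000},
)
```

Separating request construction from sending keeps the parameters easy to inspect and log before any billed generation is triggered.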
Yes, in most cases ElevenLabs Music provides commercial rights for generated tracks. However, users should confirm the exact terms directly on elevenlabs.io before redistribution. The text-to-sound engine ensures outputs are original and cleared for common use cases, but proper licensing review is recommended before large-scale deployment.
ElevenLabs Music uses a next-generation text-to-sound synthesis engine delivering consistent structural logic, improved mixing accuracy, and high-fidelity audio at 44.1 kHz. Earlier models often lacked dynamic range or structural cohesion, while ElevenLabs Music reproduces nuanced transitions and realistic vocals without noticeable artifacts.
ElevenLabs Music’s text-to-sound model offers more granular structure control and prompt adherence than Suno, and more flexible editing than Udio. While Udio excels in vocal timbre realism, ElevenLabs Music provides richer per-section control and transparent licensing, helping developers and artists maintain creative and legal precision.
Yes, the ElevenLabs Music text-to-sound API supports batch or queued generation through RunComfy’s endpoint architecture. Developers can iterate on prompts, adjust section transitions, and collect outputs programmatically in MP3 or WAV, streamlining soundtrack creation for apps, virtual worlds, or dynamic media pipelines.
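One way to picture programmatic batch generation is a small worker pool that submits several prompts and collects the results in order. In this Python sketch, `fake_submit` is a stand-in that performs no network call, and `run_batch` is a hypothetical helper; a real integration would replace the stand-in with a call to the RunComfy API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_batch(
    prompts: list[str],
    submit: Callable[[str], dict],
    max_workers: int = 4,
) -> list[dict]:
    """Submit every prompt through `submit` and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(submit, prompts))


def fake_submit(prompt: str) -> dict:
    """Stand-in for a real API call; returns a made-up job record."""
    return {"prompt": prompt, "status": "queued"}


results = run_batch(
    ["Ambient menu loop", "Boss battle theme", "Victory fanfare"],
    fake_submit,
)
```

Passing the submit function as an argument makes it straightforward to test the batching logic offline before pointing it at a billed endpoint.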
Yes. ElevenLabs Music includes section-aware editing so users can regenerate individual segments—like chorus or bridge—without affecting the rest of the composition. This fine-grained text-to-sound control supports iterative creative workflows for both developers and technical artists seeking efficiency and precision.
RunComfy is the premier ComfyUI platform, offering an online ComfyUI environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.