Kling Video O3 Standard is Kuaishou's O3-family text-to-video model, tuned to turn a single prompt into a clip with cinematic motion, stable subjects, and grounded physics. The Standard tier is priced below the Pro tier while preserving the O3 look. Optional sound generation lets a clip come out of the model with synchronized audio already in place.
It suits teams that need short, on-brief shots quickly, without a camera crew, an edit bay, or a stock-footage license search.
| Parameter | Required | Type | Default | Range / Options | Description |
|---|---|---|---|---|---|
| prompt | Yes | string | none | Free text | Scene description covering subject, motion, camera, and atmosphere. |
| aspect_ratio | No | string | 16:9 | 16:9, 9:16, 1:1 | Frame ratio of the output. |
| duration | No | integer | 5 | 3–15 | Clip length in whole seconds. |
| sound | No | boolean | false | true / false | Generate synchronized sound with the video. |
| shot_type | No | string | customize | customize, intelligent | Editing mode: intelligent lets the model decide the shot scope; customize follows the prompt. |
| multi_prompt | No | array | none | 0–20 items | Additional prompt segments that guide scene transitions; segment durations must sum to the total video duration. |
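The parameter table maps naturally onto a request body. As a minimal client-side sketch, the Python below checks values against the ranges listed in the table before anything is sent. It assumes the table's parameter names are also the JSON keys the API accepts. The build_payload function is a hypothetical helper, and the multi_prompt segment shape ({"prompt": ..., "duration": ...}) is a guess, since the table only says that segment durations must sum to the clip length.

```python
from typing import Any, Dict, List, Optional

# Allowed values as listed in the parameter table.
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
SHOT_TYPES = {"customize", "intelligent"}
MIN_DURATION, MAX_DURATION = 3, 15
MAX_MULTI_PROMPT_ITEMS = 20


def build_payload(
    prompt: str,
    aspect_ratio: str = "16:9",
    duration: int = 5,
    sound: bool = False,
    shot_type: str = "customize",
    multi_prompt: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: return a request body or raise ValueError."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be a whole number of seconds")
    if not MIN_DURATION <= duration <= MAX_DURATION:
        raise ValueError(f"duration must be between {MIN_DURATION} and {MAX_DURATION}")
    if shot_type not in SHOT_TYPES:
        raise ValueError(f"shot_type must be one of {sorted(SHOT_TYPES)}")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "sound": sound,
        "shot_type": shot_type,
    }
    if multi_prompt:
        if len(multi_prompt) > MAX_MULTI_PROMPT_ITEMS:
            raise ValueError("multi_prompt accepts at most 20 segments")
        # Assumed segment shape: {"prompt": str, "duration": int}.
        total = sum(segment["duration"] for segment in multi_prompt)
        if total != duration:
            raise ValueError(f"segment durations sum to {total}s, expected {duration}s")
        payload["multi_prompt"] = multi_prompt
    return payload
```

Rejecting bad values locally avoids paying for, or waiting on, a request that the service would refuse anyway. The server-side limits remain authoritative.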
The Standard tier bills per second of generated video on RunComfy. Enabling sound adds roughly a 33% surcharge on top of the silent rate.
| Output | Rate per second |
|---|---|
| Video without sound | $0.084 |
| Video with sound | $0.112 |
Estimated cost examples
| Duration | Without sound | With sound |
|---|---|---|
| 3 s | ~$0.252 | ~$0.336 |
| 5 s (default) | ~$0.420 | ~$0.560 |
| 10 s | ~$0.840 | ~$1.120 |
| 15 s | ~$1.260 | ~$1.680 |
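Each estimate is the clip duration multiplied by the per-second rate. A small Python sketch reproduces the table and the sound surcharge. The estimate_cost helper is illustrative and is not part of any RunComfy SDK. It uses Decimal so that dollar amounts do not pick up binary floating-point noise. The hard-coded rates are the ones listed in the pricing table and may change.

```python
from decimal import Decimal

# Standard-tier per-second rates in USD, copied from the pricing table (may change).
RATE_SILENT = Decimal("0.084")
RATE_WITH_SOUND = Decimal("0.112")


def estimate_cost(duration_s: int, sound: bool = False) -> Decimal:
    """Illustrative estimate: clip duration in seconds times the per-second rate."""
    if not 3 <= duration_s <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    rate = RATE_WITH_SOUND if sound else RATE_SILENT
    return rate * duration_s


# Relative surcharge for enabling sound: 0.112 / 0.084 - 1 = 1/3, about 33%.
SOUND_SURCHARGE = RATE_WITH_SOUND / RATE_SILENT - 1
```

For example, estimate_cost(15, sound=True) returns Decimal('1.680'), which matches the 15 s row. Summing estimate_cost over a batch of planned clips gives a budget figure before anything is rendered.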
1) Open the Kling Video O3 model on RunComfy and reveal the generation panel.
2) Write a prompt that names the subject, action, environment, camera move, and lighting in shot language.
3) Pick an aspect ratio that matches the delivery surface — 16:9 for YouTube, 9:16 for Reels and Shorts, 1:1 for feeds.
4) Set duration between 3 and 15 seconds; start at 3–5 seconds while iterating on the prompt, then extend for the final render.
5) Toggle sound on only when you actually need synchronized audio, since it raises the per-second cost.
6) Choose shot_type — customize for tight prompt control, intelligent when you want the model to handle pacing.
7) Run the generation, preview the result, and download the clip from your job history.
8) For automation, send the same fields to the RunComfy API endpoint and integrate the output URL into your pipeline.
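Step 8 can be sketched with the Python standard library. The endpoint URL and the bearer-token header below are placeholders, not RunComfy's documented values. Copy the real endpoint, authentication scheme, and response handling from the RunComfy API documentation for this model. The sketch only builds a POST request carrying the same fields used in the playground; it does not send it.

```python
import json
import urllib.request

# Hypothetical placeholder: this page does not give the real endpoint URL or auth scheme.
API_URL = "https://example.invalid/kling-video-o3-standard/text-to-video"


def make_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the generation fields."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed bearer-token auth; confirm against the API docs.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Same fields as in the playground, chosen per the steps above.
payload = {
    "prompt": "Slow push-in on a lighthouse at golden hour, waves breaking below",
    "aspect_ratio": "9:16",  # Reels and Shorts
    "duration": 5,
    "sound": False,
    "shot_type": "customize",
}
request = make_request(payload, api_key="YOUR_API_KEY")
# Sending it would be urllib.request.urlopen(request); the response format is not documented here.
```

Keeping the payload as a plain dict makes it easy to reuse a configuration settled in the playground unchanged in a backend job.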
Kling Video O3 is Kuaishou's O3-family text-to-video model that turns a single prompt into a cinematic clip with stable subjects, natural motion, and grounded physics. In a text-to-video workflow on RunComfy, you describe the scene, camera, and atmosphere, and the model returns a finished video — optionally with synchronized sound. It is built for creators who want fast, prompt-driven generation without manual editing or a camera crew.
Kling Video O3 is best suited for short cinematic shots: social reels, ad creatives, brand stingers, product visualizations, and concept exploration for design and storyboarding. Flexible 3–15 second durations and 16:9, 9:16, and 1:1 aspect ratios make it easy to deliver to TikTok, Reels, YouTube, and feed placements from one model. The optional sound switch also covers cases where a clip needs synchronized audio straight out of the model.
Compared to the O3 Pro tier, Kling Video O3 Standard targets a more accessible per-second price while keeping the O3 look, which helps when you need to iterate or generate at higher volume. Compared to earlier Kling versions such as the V3.0 line, the O3 family is positioned around improved motion realism and subject consistency, based on available provider information. The control surface (prompt, aspect ratio, duration, sound, shot type) stays familiar, so existing prompts should carry over with little rework.
Marketing teams, video editors, social creators, and product designers can use Kling Video O3 to produce on-brief promo clips, animated mood boards, and short-form social videos without booking talent or sourcing stock footage. Developers can wrap the same model into automated pipelines that turn brief metadata into ready-to-publish video assets. For storyboarders and concept artists, the model also speeds up visualizing scenes that would otherwise need a previs pass.
Kling Video O3 takes a required text prompt and supports aspect ratios of 16:9, 9:16, and 1:1, with whole-second durations from 3 to 15 seconds. Sound can be toggled on or off, and shot_type accepts customize or intelligent for editing scope. For any other constraints such as prompt length, multi-prompt segment counts, or render concurrency, check the current RunComfy parameter panel for the exact limits, since they may vary by mode or provider settings.
Yes. You can prototype Kling Video O3 in the RunComfy AI Playground Web UI, dialing in the prompt, aspect ratio, duration, sound, and shot type until the output matches your target. Once the configuration is stable, call the same model via the RunComfy API with identical parameters from your backend, content pipeline, or CMS to automate generation. This keeps creative iteration in the browser while production runs in code, without changing how the model behaves.
Kling Video O3 generations consume USD credits from your RunComfy balance. Based on available provider information, the Standard tier is billed at $0.084 per second without sound and $0.112 per second with sound, roughly a 33% surcharge for enabling audio. A default 5-second silent clip therefore costs about $0.42, while a 15-second clip with sound costs about $1.68. New users typically receive a free trial credit in USD to experiment with; refer to the Generation section of the model page for the most current rates.
Kling Video O3 responds well to concise, prioritized prompts written in shot language — lead with the camera move and shot type, then name the subject, action, environment, lighting, and time of day. Specific cues like "slow push-in", "golden hour rim light", or "handheld with slight sway" help anchor the motion and look more than long adjective chains. For 10–15 second clips, describe one clear motion arc instead of several unrelated actions to keep the result coherent.
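The ordering advice for prompts can be captured in a small helper. compose_prompt is a hypothetical convenience function, not part of RunComfy. The model only ever sees the final string, so the helper just places the camera move and shot type first and keeps the remaining details in the recommended order.

```python
def compose_prompt(
    camera: str,
    subject: str,
    action: str,
    environment: str,
    lighting: str,
    time_of_day: str = "",
) -> str:
    """Hypothetical helper: join shot-language parts with the camera cue first."""
    parts = [camera, f"{subject} {action}".strip(), environment, lighting, time_of_day]
    # Skip empty parts so optional fields do not leave stray commas.
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = compose_prompt(
    "Slow push-in, medium shot",
    "a cyclist",
    "coasting downhill",
    "on a coastal road",
    "golden hour rim light",
)
```

Here prompt becomes "Slow push-in, medium shot, a cyclist coasting downhill, on a coastal road, golden hour rim light": one motion arc, with a concrete camera and lighting cue in place of a chain of adjectives.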