Kling Video O3 4k Text To Video is Kuaishou's flagship 4K-grade entry in the O3 family, tuned for final-render fidelity at the highest resolution tier. Send a single written description of the scene and the model returns a 3-to-15-second 4K clip with physics-aware motion, controlled framing, and an optional matching audio track.
It fits teams that need broadcast-grade 4K footage from natural language, with no shoot day, no compositing pass, and no model hosting.
| Parameter | Required | Type | Default | Range / Options | Description |
|---|---|---|---|---|---|
| prompt | Yes | string | (none) | Free text | Scene description covering subject, action, camera, lighting, and mood. |
| aspect_ratio | No | string | 16:9 | 16:9, 9:16, 1:1 | Output frame ratio. |
| duration | No | integer | 5 | 3 to 15 | Clip length in seconds; billing scales linearly. |
| sound | No | boolean | false | true / false | Generate matching synchronized audio with the video. |
| shot_type | No | string | customize | customize, intelligent | Editing mode; intelligent auto-decides scope, customize follows the prompt. |
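
As an illustration of the table above, the following is a minimal sketch of building and validating a parameter set before sending it anywhere. The parameter names, defaults, and allowed values come from the table; the `TextToVideoParams` class and `build_payload` helper are hypothetical names introduced here, and the exact request format expected by the RunComfy API is not shown because it is not described in this page.

```python
from dataclasses import dataclass, asdict

# Allowed values copied from the parameter table.
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
SHOT_TYPES = {"customize", "intelligent"}


@dataclass
class TextToVideoParams:
    """Hypothetical container mirroring the documented parameters."""
    prompt: str
    aspect_ratio: str = "16:9"
    duration: int = 5
    sound: bool = False
    shot_type: str = "customize"

    def validate(self) -> None:
        # prompt is the only required field and must not be blank.
        if not isinstance(self.prompt, str) or not self.prompt.strip():
            raise ValueError("prompt is required and must be non-empty")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(self.duration, bool) or not isinstance(self.duration, int):
            raise ValueError("duration must be an integer number of seconds")
        if not 3 <= self.duration <= 15:
            raise ValueError("duration must be between 3 and 15 seconds")
        if not isinstance(self.sound, bool):
            raise ValueError("sound must be true or false")
        if self.shot_type not in SHOT_TYPES:
            raise ValueError(f"shot_type must be one of {sorted(SHOT_TYPES)}")


def build_payload(**kwargs) -> dict:
    """Validate the arguments and return a plain dict of parameters."""
    params = TextToVideoParams(**kwargs)
    params.validate()
    return asdict(params)


if __name__ == "__main__":
    print(build_payload(prompt="A red kite drifting over sand dunes at dusk",
                        aspect_ratio="9:16", duration=8, sound=True))
```

Validating locally like this catches out-of-range values before a generation is billed; the RunComfy parameter panel remains the authority on the actual limits.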
Kling Video O3 4k Text To Video is Kuaishou's flagship 4K text-to-video model, tuned for cinematic 3 to 15 second renders from a single prompt. It is a strong fit for hero brand films, premium concept reels, and large-screen spots where physics-aware motion, controlled lighting, and 4K-grade detail matter.
Kling Video O3 4k Text To Video targets the highest resolution and final-render detail in the O3 family. The Pro tier renders at a lower per-second price for HD-grade output, and the Standard tier is cheaper still for drafts and high-volume iteration, based on publicly available information.
Kling Video O3 4k Text To Video has a sound toggle that synthesizes matching ambient audio and effects in the same generation pass. Sound is off by default, and pricing stays the same whether sound is on or off.
Kling Video O3 4k Text To Video reads structured prompts well: subject, action, camera move, lighting, era, and mood all influence the result. Use multi-prompt segments for scene progression and an element list to keep specific characters, props, or styles consistent across the 4K clip.
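
One way to keep prompts structured is to assemble them from named parts. The sketch below is only an illustration: the `compose_prompt` helper, its field order, and the comma-separated layout are assumptions made here, not a format the model requires.

```python
def compose_prompt(subject: str, action: str, camera: str = "",
                   lighting: str = "", mood: str = "") -> str:
    """Join the named prompt parts into one description, skipping blanks.

    Hypothetical helper: the ordering (subject and action first, then
    camera, lighting, mood) is an assumption, not a documented rule.
    """
    parts = [f"{subject.strip()} {action.strip()}".strip(), camera, lighting, mood]
    return ", ".join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    print(compose_prompt(
        subject="A lighthouse keeper",
        action="climbs a spiral staircase",
        camera="slow upward crane shot",
        lighting="warm lantern light",
        mood="quiet and contemplative",
    ))
```

Keeping the parts separate makes it easy to vary one element, such as the camera move, while holding the rest of the scene fixed across iterations.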
Kling Video O3 4k Text To Video supports 16:9, 9:16, and 1:1 aspect ratios for cinema, vertical social, and square placements. Clip duration can be set from 3 to 15 seconds in one-second steps. Check the current RunComfy parameter panel for the exact limits.
Only the prompt field is required for Kling Video O3 4k Text To Video; aspect_ratio, duration, sound, shot_type, multi-prompt, and element_list are optional. Please follow Kuaishou's content usage policies when crafting prompts, and check the RunComfy panel for any provider-side limits that may apply.
You can prototype Kling Video O3 4k Text To Video in the RunComfy model UI, then call the same model from your backend over the RunComfy HTTP API with identical parameters. No GPU hosting or model scaling work is required on your side.
Kling Video O3 4k Text To Video bills a flat $0.42 per second of generated video, regardless of whether sound is on or off. A 5-second clip costs $2.10 and a 15-second clip costs $6.30. Generations are deducted from your RunComfy USD / credit balance, and new users typically receive a free trial amount to test.
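
The per-second pricing can be checked with a few lines of arithmetic. This is a small sketch that assumes the $0.42 rate and the 3 to 15 second range quoted on this page; `estimate_cost` is a hypothetical helper, and actual charges are whatever RunComfy bills at generation time.

```python
from decimal import Decimal

# Rate quoted on this page; treat it as an assumption that may change.
RATE_PER_SECOND = Decimal("0.42")


def estimate_cost(duration_seconds: int) -> Decimal:
    """Return the estimated USD cost for a clip of the given length."""
    if not 3 <= duration_seconds <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    # The sound toggle does not change the price, so it is not an input.
    return RATE_PER_SECOND * duration_seconds


if __name__ == "__main__":
    for seconds in (3, 5, 15):
        print(f"{seconds:>2} s -> ${estimate_cost(seconds)}")
```

`Decimal` is used instead of `float` so the totals match the quoted figures exactly rather than carrying binary rounding error.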