Kling O1: Precision Video-to-Video Editing & Scene Consistency on the Playground and API | RunComfy

kling/kling-video-o1/video-to-video/edit

Transform existing footage into cinematic new scenes with Kling O1, a multimodal video-to-video model enabling seamless editing, consistent characters, and fast creative production for filmmakers and digital artists.

Use @Element1, @Element2 to reference elements and @Image1, @Image2 to reference images, in order.
Reference video: a video URL used to guide motion and scene framing. Supported formats: .mp4, .mov. Duration: 3–10 seconds. Resolution: 720p–2160p. File size: ≤200 MB.
Element 1: the frontal image of the element (main view). Max file size: 10.0 MB; min width: 300 px; min height: 300 px; min aspect ratio: 0.40; max aspect ratio: 2.50; timeout: 20.0 s.
Image 1: additional reference images from different angles. 1–4 images supported; at least one image is required.
Elements: characters/objects to include, referenced in the prompt as @Element1, @Element2, etc. Maximum 7 total across elements, reference images, and the start image.
Reference images: images for style or appearance guidance, referenced in the prompt as @Image1, @Image2, etc. Up to 4 images total. When using a video, elements and reference images combined must not exceed 4.
Keep audio: whether to keep the original audio track from the input video.
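For API use, these inputs map onto a JSON request body. The sketch below is a hypothetical payload: the field names (prompt, video_url, elements, reference_images, keep_audio) and their nesting are assumptions made for illustration, so take the authoritative schema from the RunComfy API reference for kling/kling-video-o1/video-to-video/edit.

```python
# Hypothetical request payload for the video-to-video edit endpoint.
# Field names (prompt, video_url, elements, reference_images, keep_audio)
# are illustrative assumptions; consult the RunComfy API reference for the
# exact schema of kling/kling-video-o1/video-to-video/edit.
payload = {
    "prompt": "Place @Element1 in the scene from @Image1, keeping the original camera motion.",
    "video_url": "https://example.com/input-clip.mp4",  # 3-10 s, 720p-2160p, <=200 MB, .mp4/.mov
    "elements": [
        {
            "frontal_image": "https://example.com/character-front.png",  # main view, >=300x300 px
            "reference_images": [                                        # 1-4 extra angles
                "https://example.com/character-side.png",
            ],
        }
    ],
    "reference_images": [        # style/appearance guidance, up to 4 total
        "https://example.com/style.jpg",
    ],
    "keep_audio": True,          # keep the original audio track from the input video
}
```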

Introduction to Kling O1 Video Generator

Developed by Kuaishou Technology, Kling O1 is a unified multimodal video foundation model that lets creators and teams transform existing footage through precise video-to-video generation and seamless editing. Designed for filmmakers, marketers, and digital artists, Kling O1 keeps characters, scenes, and style consistent across shots while dramatically accelerating creative workflows. For developers, Kling O1 on RunComfy can be used both in the browser and via an HTTP API, so you don’t need to host or scale the model yourself.
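As a rough illustration of the HTTP path, the snippet below submits a job with Python's requests library. The endpoint URL, auth header, and response fields are assumptions made for this sketch; substitute the values documented in the RunComfy API reference and the API key from your own account.

```python
import os
import requests

# Minimal sketch of submitting a Kling O1 video-to-video job over HTTP.
# The endpoint path, auth header, and response shape are assumptions; check
# the RunComfy API documentation for the real URL and payload schema.
API_KEY = os.environ["RUNCOMFY_API_KEY"]  # issued from your RunComfy account
ENDPOINT = "https://api.runcomfy.net/v1/kling/kling-video-o1/video-to-video/edit"  # hypothetical URL

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "prompt": "Relight the scene as golden hour and keep @Element1 consistent.",
        "video_url": "https://example.com/input-clip.mp4",
    },
    timeout=60,
)
response.raise_for_status()
job = response.json()
print(job)  # typically contains a task/job ID you can poll for the finished video
```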

Examples Generated Using Kling O1

A gallery of example videos generated with Kling O1 is embedded on the playground page.


Frequently Asked Questions

Can I use Kling O1 video-to-video outputs for commercial projects?

Kling O1 allows video-to-video generation and editing under a license that typically follows an Open RAIL-style non-commercial or limited-commercial framework. Using Kling O1 through RunComfy does not override or bypass the original license. Always verify the official Kling O1 license before applying generated outputs in paid or brand-affiliated projects.

What type of license governs Kling O1 video-to-video use on RunComfy?

Kling O1 is distributed under the original license specified by Kuaishou Technology (currently aligned with an Open RAIL-like model). When using Kling O1 video-to-video capabilities via RunComfy, users must still comply with the model’s terms. RunComfy’s hosting only provides managed access and does not transfer or extend commercial rights.

How does RunComfy handle performance and latency for Kling O1 video-to-video generation?

RunComfy’s managed infrastructure distributes Kling O1 video-to-video requests across multiple cloud GPUs, ensuring low latency and stable throughput for concurrent users. Local runs of Kling O1 may require A100-class GPUs and are not recommended for high-volume workloads. The platform maintains dynamic scaling to balance efficiency and responsiveness.

Are there technical limits when using Kling O1 video-to-video features?

Yes. Kling O1 supports output resolutions up to 1080p and video durations of roughly 3–10 seconds per generation cycle. Up to 10 reference images or short clips can be used to maintain video-to-video consistency. Prompt length is capped by the RunComfy API, currently around 1,000 characters per request.
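If you submit jobs programmatically, it can help to validate inputs against these limits before calling the API. The sketch below simply encodes the numbers quoted in this answer (they are not pulled from an official schema), so adjust the constants if the published limits change.

```python
# Client-side sanity checks mirroring the limits described above
# (3-10 s clips, ~1000-character prompts, up to 10 reference inputs).
# These values come from this FAQ entry, not from an official schema.
MAX_PROMPT_CHARS = 1000
MAX_REFERENCE_INPUTS = 10
MIN_DURATION_S, MAX_DURATION_S = 3, 10

def validate_request(prompt: str, duration_s: float, reference_inputs: list[str]) -> None:
    """Raise ValueError if an input falls outside the documented Kling O1 limits."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"Prompt is {len(prompt)} characters; limit is ~{MAX_PROMPT_CHARS}.")
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        raise ValueError(f"Clip duration {duration_s}s is outside {MIN_DURATION_S}-{MAX_DURATION_S}s.")
    if len(reference_inputs) > MAX_REFERENCE_INPUTS:
        raise ValueError(f"{len(reference_inputs)} reference inputs exceed the {MAX_REFERENCE_INPUTS}-item cap.")
```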

How can I transition my Kling O1 video-to-video experiments from the Playground to API production?

To migrate Kling O1 video-to-video workflows, first finalize prototype results in the RunComfy Playground. Afterward, obtain an API key and replicate your configuration via the RunComfy REST or Python interface. The API offers the same output fidelity as the web interface but allows integration into scripts, CMS pipelines, or app backends.
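In practice, the migration usually amounts to replaying the same parameters through an HTTP client and polling for the result. The loop below is a sketch that assumes job submission returns an ID and that a status endpoint exists; the real route names and response fields come from the RunComfy API documentation.

```python
import time
import requests

# Hypothetical polling loop: after submitting a job, poll until it finishes.
# The route ("/jobs/{id}") and status values are assumptions for this sketch;
# replace them with the endpoints documented by RunComfy.
def wait_for_result(api_base: str, api_key: str, job_id: str, poll_s: float = 5.0) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        r = requests.get(f"{api_base}/jobs/{job_id}", headers=headers, timeout=30)
        r.raise_for_status()
        job = r.json()
        if job.get("status") in ("succeeded", "failed"):
            return job          # contains the output video URL on success
        time.sleep(poll_s)      # avoid hammering the API between checks
```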

What distinguishes Kling O1 video-to-video from earlier models?

Kling O1 unifies generation and editing within one multimodal engine, improving consistency of characters and scenes. Compared with prior models, Kling O1 video-to-video excels at handling scene continuity, start/end-frame control, and reference-based identity preservation. This reduces content drift often seen in earlier text-to-video systems.

Can I run Kling O1 video-to-video locally instead of through RunComfy?

Technically yes, but running Kling O1 video-to-video locally demands high-end GPUs (A100/RTX 4090 or higher) and substantial VRAM. RunComfy’s managed environment handles GPU provisioning, batching, and automatic checkpoint updates, making it more efficient and reliable for most users.

Does RunComfy’s execution of Kling O1 video-to-video consume cloud credits or local resources?

All Kling O1 video-to-video generations on RunComfy are processed on cloud GPUs. Each render consumes platform credits (denominated in USD), and new accounts receive a trial balance that can be replenished through the billing menu. No local hardware resources are consumed when using the web or API services.

What should I do if Kling O1 video-to-video generation fails or times out?

If Kling O1 video-to-video generation fails, check prompt complexity, reduce the number of reference inputs, and confirm network stability. Heavy load or quota limits can occasionally cause task timeouts. For persistent issues, contact hi@runcomfy.com with your task ID and configuration so support can escalate.
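For transient failures such as timeouts under heavy load, a simple retry with exponential backoff is often enough before escalating to support. The wrapper below is a generic sketch; submit_job is a placeholder for whatever function you use to call the API.

```python
import time

# Generic retry-with-backoff wrapper for transient Kling O1 job failures
# (timeouts, temporary capacity limits). submit_job is a placeholder for
# your own API call; persistent failures should go to hi@runcomfy.com with
# the task ID attached.
def submit_with_retries(submit_job, max_attempts: int = 4, base_delay_s: float = 10.0):
    for attempt in range(1, max_attempts + 1):
        try:
            return submit_job()
        except Exception as exc:  # narrow this to your client's timeout/5xx errors
            if attempt == max_attempts:
                raise
            delay = base_delay_s * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
```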