Unified AI model for refined scene editing, style match, and smooth video refits
Kling O1 Reference Image to Video: Image-to-Video with Motion Fidelity on playground and API | RunComfy
Generate cinematic videos from images or text using reference footage, preserving motion style, camera angles, and scene continuity for unified, high-fidelity visual storytelling.
Introduction to Kling O1 Reference to Video
As part of the Kling O1 unified AI model, this image-to-video system lets you generate fresh cinematic sequences guided by your chosen reference video. It retains motion style, camera perspective, and subject continuity while letting you mix text, images, and footage within one workflow. Built with a multimodal visual language core, Kling O1 Reference to Video merges generation and editing, ensuring more consistent frames, realistic motion, and intelligent visual reasoning for authentic storytelling across industries.
Kling O1 Reference to Video image-to-video helps you craft new shots that extend your visuals, preserve style, or refine branding. Ideal for filmmakers, advertisers, and creators, it produces seamless scenes that match your references, offering unified control, fidelity, and cinematic precision.
Key capabilities:
- Structure-true motion transfer: Kling O1 Reference to Video retains pose, depth, and spatial relationships while applying reference movement.
- Camera and pacing fidelity: Preserves dolly, pan, tilt, and cadence for authentic cinematic flow.
- Multi-reference conditioning: Blends multiple images to stabilize identity and style.
- Scene continuity control: Reduces temporal artifacts and maintains lighting plausibility through the clip.
- Framing flexibility: Supports 16:9, 9:16, and 1:1 aspect ratios for platform-specific delivery.
- Fixed durations: Clips run for either 5 s or 10 s, which keeps planning and review cycles tight.
Prompting guide for Kling O1 Reference to Video
Begin with a crisp still image that defines subject and composition. Supply additional reference images via image_urls, and describe motion, camera path, and timing in the prompt, pointing at each input with explicit tokens such as @Image1 or @Element1. Kling O1 Reference to Video interprets references in order, so map identities clearly and state what must remain unchanged. Set duration to 5 or 10 (seconds) and set aspect_ratio to match the target delivery format. Concrete motion verbs and clear spatial anchors help avoid ambiguity.
Examples:
- Single image to motion: "Use @Image1 as subject; emulate gentle handheld sway; keep framing medium shot; duration 5; aspect_ratio 16:9."
- Camera reprise from reference: "Track @Image1 with a slow push-in; retain eye contact; subtle parallax; duration 10; aspect_ratio 9:16." Kling O1 Reference to Video couples camera speed with stable pose.
- Identity plus style mix: "@Image1 subject, @Image2 color grade; follow lateral pan; preserve outfit and hair; duration 5; aspect_ratio 1:1." Kling O1 Reference to Video balances look transfer with geometry.
- Background-driven motion: "Keep subject from @Image1 static; animate only background with city-light bokeh drift; duration 5; aspect_ratio 16:9."
- Elements JSON: define @Element1 and @Element2 in elements, then "Frame @Element1 center; @Element2 passes left-to-right; maintain fixed horizon; duration 10." Kling O1 Reference to Video maps element order to prompt references.
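The prompt examples above can be turned into a request body along the following lines. This is a minimal sketch, not the documented API: the field names prompt, image_urls, duration, aspect_ratio, and elements come from this guide, but the exact schema (for example, whether duration is a number or a string, and what an element entry looks like) is an assumption, and build_request and the example.com URLs are hypothetical.

```python
import json

# Values listed in this guide; everything else about the schema is assumed.
ALLOWED_DURATIONS = {5, 10}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}


def build_request(prompt, image_urls, duration=5, aspect_ratio="16:9", elements=None):
    """Assemble a request body as a plain dict (hypothetical schema)."""
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")
    payload = {
        "prompt": prompt,
        # Order matters: @Image1 is assumed to refer to the first URL, and so on.
        "image_urls": list(image_urls),
        "duration": duration,
        "aspect_ratio": aspect_ratio,
    }
    if elements:
        payload["elements"] = list(elements)
    return payload


payload = build_request(
    prompt="@Image1 subject, @Image2 color grade; follow lateral pan; preserve outfit and hair.",
    image_urls=["https://example.com/subject.png", "https://example.com/grade.png"],
    duration=5,
    aspect_ratio="1:1",
)
print(json.dumps(payload, indent=2))
```

Keeping duration and aspect_ratio as separate fields, rather than only inside the prompt text, lets invalid values be rejected before a request is sent.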
Pro tips:
- Reference in order and keep naming consistent: @Image1, @Image2, @Element1, @Element2.
- Constrain scope: say what to preserve (subject, pose, wardrobe) and what to change (camera, background, speed).
- Use spatial language: left, right, foreground, background, upper-third, eye level.
- Keep descriptors decisive: prefer a few strong terms over many competing adjectives.
- Respect limits: Use at most 7 inputs in total, counting elements, reference images, and the start image, to keep conditioning stable.
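The ordering and limit tips above can be checked before submitting a job. The sketch below is illustrative only: check_references is a hypothetical helper, and it assumes that tokens are 1-based (@Image1 is the first reference image) and that the start image counts as one input toward the limit of 7.

```python
import re

MAX_REFERENCES = 7  # limit stated in this guide
TOKEN_RE = re.compile(r"@(Image|Element)(\d+)")


def check_references(prompt, n_images, n_elements, has_start_image=False):
    """Return a list of problems; an empty list means the prompt looks consistent."""
    problems = []
    total = n_images + n_elements + (1 if has_start_image else 0)
    if total > MAX_REFERENCES:
        problems.append(f"{total} inputs exceed the limit of {MAX_REFERENCES}")
    available = {"Image": n_images, "Element": n_elements}
    for kind, index in TOKEN_RE.findall(prompt):
        # Assumed convention: tokens are numbered from 1 in input order.
        if not 1 <= int(index) <= available[kind]:
            problems.append(f"@{kind}{index} has no matching input")
    return problems


print(check_references("@Image1 subject, @Image2 color grade", n_images=2, n_elements=0))
print(check_references("Frame @Element1 center; @Element2 passes", n_images=0, n_elements=1))
```

The first call reports no problems; the second flags @Element2 because only one element was supplied.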
Note: For video-to-video editing, try Kling O1 Video Edit in the RunComfy playground.
Frequently Asked Questions
What exactly is Kling O1 Reference to Video in the context of image-to-video generation?
Kling O1 Reference to Video is a specialized mode of the Kling O1 multimodal AI model that enables creators to generate new cinematic shots based on a short reference video. It uses advanced image-to-video processing to preserve motion, continuity, and camera style while extending or transforming scenes.
How does Kling O1 Reference to Video handle image-to-video input for new shot generation?
Kling O1 Reference to Video allows users to upload a 3–10 second reference clip along with optional images and text prompts. Through its unified image-to-video pipeline, it produces consistent new sequences that match the visual and motion patterns of the input material.
What are the main capabilities of Kling O1 Reference to Video?
Kling O1 Reference to Video supports multimodal editing, enabling users to insert subjects, apply style changes, or generate next-shot continuity. Its image-to-video capabilities include preserving camera movement, keeping audio if desired, and maintaining character consistency within generated footage.
How much does it cost to use Kling O1 Reference to Video on RunComfy?
Access to Kling O1 Reference to Video requires using credits on RunComfy’s AI playground. New users typically receive free credits to try the image-to-video model, after which usage depends on the platform’s credit policy listed under the Generation section.
Who can benefit the most from using Kling O1 Reference to Video and its image-to-video features?
Kling O1 Reference to Video is ideal for filmmakers, advertisers, social media creators, and design studios seeking to produce consistent or extended shots from reference material. The model’s image-to-video generation is particularly useful for maintaining quality continuity in sequences or campaigns.
What makes Kling O1 Reference to Video different from earlier Kling versions or competitors?
Unlike prior versions that separated text-to-video and image-to-video tasks, Kling O1 Reference to Video uses a unified multimodal model that supports seamless editing and generation in one workflow. It offers better motion accuracy, subject fidelity, and camera-style preservation than most competing systems.
What formats and resolutions does Kling O1 Reference to Video support for image-to-video processing?
Kling O1 Reference to Video accepts .mp4 or .mov files as input, at resolutions from 720 to 2160 pixels and with a maximum file size of 200 MB. This range lets image-to-video tasks keep high resolution while still rendering efficiently for cinematic output.
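A pre-upload check for these limits might look like the sketch below. It is not part of any RunComfy tooling: check_clip is a hypothetical helper, the 720 to 2160 pixel range is assumed to apply to the shorter side of the frame, and 200 MB is assumed to mean 200 × 1024 × 1024 bytes. Width, height, and file size are passed in by the caller, so no video library is needed.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp4", ".mov"}
MAX_BYTES = 200 * 1024 * 1024      # assumed interpretation of "200 MB"
MIN_PIXELS, MAX_PIXELS = 720, 2160  # assumed to apply to the shorter side


def check_clip(filename, size_bytes, width, height):
    """Return a list of problems; an empty list means the clip fits the stated limits."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported file type")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 200 MB")
    short_side = min(width, height)
    if not MIN_PIXELS <= short_side <= MAX_PIXELS:
        problems.append(f"short side {short_side}px outside {MIN_PIXELS} to {MAX_PIXELS}")
    return problems


print(check_clip("shot.MOV", 50 * 1024 * 1024, 1920, 1080))
print(check_clip("shot.avi", 300 * 1024 * 1024, 640, 480))
```

The first call returns an empty list; the second reports all three problems.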
Can the generated clips from Kling O1 Reference to Video include original audio?
Yes, Kling O1 Reference to Video offers an option to retain the original audio from the reference video. This feature enhances the realism of its image-to-video results and is popular for creative continuity in storytelling and advertising projects.
Are there any limitations or caveats when using Kling O1 Reference to Video?
Kling O1 Reference to Video works best with short, high-quality clips of 3–10 seconds. While it excels in image-to-video sequence generation, complex scenes with many unsynced visual elements may require multiple reference inputs for optimal consistency.
Where and how can I access Kling O1 Reference to Video?
Kling O1 Reference to Video is available on RunComfy’s AI playground website, which supports both desktop and mobile browsers. Users can start image-to-video projects after logging in and allocating platform credits.
