Kling O1 Reference Image to Video: Image-to-Video with Motion Fidelity

kling/kling-video-o1/image-to-video/reference

Generate cinematic videos from images or text using reference footage, preserving motion style, camera angles, and scene continuity for unified, high-fidelity visual storytelling.

Prompt *

Refer to elements as @Element1, @Element2 and to images as @Image1, @Image2, in order. Spell these tokens exactly as shown.

Reference Images *

Additional reference images for style/appearance. Reference in prompt as @Image1, @Image2, etc. Maximum 7 total across elements + reference images + start image.

Elements *

Element #1

Frontal Image Url

The frontal image of the element (main view). Max file size: 10.0MB, Min width: 300px, Min height: 300px, Min aspect ratio: 0.40, Max aspect ratio: 2.50, Timeout: 20.0s

Reference Image Urls

Additional reference images from different angles. 1-4 images supported. At least one image is required.

Provide characters/objects to include. Reference in prompt as @Element1, @Element2, etc. Maximum 7 total across elements + reference images + start image.
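
The input limits above can be checked before uploading anything. The following is a minimal sketch, not the platform's own validation: the function names are hypothetical, the aspect ratio is assumed to mean width divided by height, "10.0MB" is assumed to mean 10 × 1024 × 1024 bytes, and each element is assumed to count once toward the limit of 7.

```python
MAX_TOTAL_INPUTS = 7                 # elements + reference images + start image
MAX_FILE_BYTES = 10 * 1024 * 1024    # assumption: "10.0MB" read as MiB
MIN_SIDE_PX = 300
MIN_ASPECT, MAX_ASPECT = 0.40, 2.50  # assumption: width / height


def check_frontal_image(width, height, size_bytes):
    """Return a list of constraint violations for one frontal image."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 10.0MB")
    if width < MIN_SIDE_PX:
        problems.append("width is below 300px")
    if height < MIN_SIDE_PX:
        problems.append("height is below 300px")
    aspect = width / height
    if not MIN_ASPECT <= aspect <= MAX_ASPECT:
        problems.append(f"aspect ratio {aspect:.2f} is outside 0.40-2.50")
    return problems


def check_element_reference_count(num_reference_images):
    """Each element needs between 1 and 4 additional reference images."""
    return 1 <= num_reference_images <= 4


def check_input_budget(num_elements, num_reference_images, has_start_image=False):
    """True if the combined inputs stay within the limit of 7.

    Assumption: every element counts as one input, whatever its image count.
    """
    total = num_elements + num_reference_images + (1 if has_start_image else 0)
    return total <= MAX_TOTAL_INPUTS


print(check_frontal_image(1024, 768, 2_000_000))
print(check_input_budget(num_elements=2, num_reference_images=4, has_start_image=True))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with an image in a single pass.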

Duration (seconds)

Video duration in seconds. Only 5 and 10 are supported.

Aspect Ratio (W:H)

The aspect ratio of the generated video frame.


The rate is $0.112 per second.
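
As a rough sketch of what that rate implies for the two supported durations, assuming billing is based on the length of the generated clip (the page does not state this explicitly), the cost can be estimated as follows. The function name is illustrative only.

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.112")  # USD, as listed on this page
SUPPORTED_DURATIONS = (5, 10)       # seconds


def estimate_cost(duration_seconds):
    """Estimate the USD cost of one generation at the listed rate."""
    if duration_seconds not in SUPPORTED_DURATIONS:
        raise ValueError("Only 5 and 10 second durations are supported")
    return RATE_PER_SECOND * duration_seconds


for seconds in SUPPORTED_DURATIONS:
    print(f"{seconds} s clip: ${estimate_cost(seconds):.2f}")
```

Under that assumption a 5-second clip comes to $0.56 and a 10-second clip to $1.12.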

Introduction to Kling O1 Reference to Video

As part of the Kling O1 unified AI model, this image-to-video system lets you generate fresh cinematic sequences guided by your chosen reference video. It retains motion style, camera perspective, and subject continuity while letting you mix text, images, and footage within one workflow. Built with a multimodal visual language core, Kling O1 Reference to Video merges generation and editing, ensuring more consistent frames, realistic motion, and intelligent visual reasoning for authentic storytelling across industries.
Use Kling O1 Reference to Video to craft new shots that extend your visuals, preserve style, or refine branding. It suits filmmakers, advertisers, and creators who need seamless scenes that match their references, with unified control, fidelity, and cinematic precision.

Key capabilities:

Structure-true motion transfer: Kling O1 Reference to Video retains pose, depth, and spatial relationships while applying reference movement.
Camera and pacing fidelity: Preserves dolly, pan, tilt, and cadence for authentic cinematic flow.
Multi-reference conditioning: Blends multiple images to stabilize identity and style.
Scene continuity control: Reduces temporal artifacts and maintains lighting plausibility through the clip.
Framing flexibility: Supports 16:9, 9:16, and 1:1 for platform-specific delivery.
Fixed durations: Clips run for either 5 s or 10 s, which keeps planning and review cycles tight.

Prompting guide for Kling O1 Reference to Video

Begin with a crisp still image that defines subject and composition. Supply additional reference images via image_urls and describe motion, camera path, and timing in the prompt using explicit tokens like @Image1 or @Element1. Kling O1 Reference to Video interprets references in order, so map identities clearly and state what must remain unchanged. Choose duration 5 or 10 and set aspect_ratio to match target delivery. Kling O1 Reference to Video benefits from concrete verbs for motion and clear spatial anchors to avoid ambiguity.

Examples:

Single image to motion: "Use @Image1 as subject; emulate gentle handheld sway; keep framing medium shot; duration 5; aspect_ratio 16:9."
Camera reprise from reference: "Track @Image1 with a slow push-in; retain eye contact; subtle parallax; duration 10; aspect_ratio 9:16." Kling O1 Reference to Video couples camera speed with stable pose.
Identity plus style mix: "@Image1 subject, @Image2 color grade; follow lateral pan; preserve outfit and hair; duration 5; 1:1." Kling O1 Reference to Video balances look transfer with geometry.
Background-driven motion: "Keep subject from @Image1 static; animate only background with citylight bokeh drift; duration 5; 16:9."
Elements JSON: define @Element1 and @Element2 in elements, then "Frame @Element1 center; @Element2 passes left-to-right; maintain fixed horizon; duration 10." Kling O1 Reference to Video maps element order to prompt references.
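
To make the "Elements JSON" example concrete, here is a hedged sketch of how such a request body might be assembled. The field names `prompt`, `image_urls`, `elements`, `duration`, and `aspect_ratio` appear on this page; the nested keys `frontal_image_url` and `reference_image_urls` are guesses derived from the form labels, the integer type for `duration` is an assumption, and the example.com URLs are placeholders. No endpoint is called, because the page does not document one.

```python
import json

SUPPORTED_DURATIONS = (5, 10)
SUPPORTED_ASPECT_RATIOS = ("16:9", "9:16", "1:1")


def build_request(prompt, image_urls, elements, duration=5, aspect_ratio="16:9"):
    """Assemble a request body from the fields described on this page."""
    if duration not in SUPPORTED_DURATIONS:
        raise ValueError(f"duration must be one of {SUPPORTED_DURATIONS}")
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {SUPPORTED_ASPECT_RATIOS}")
    return {
        "prompt": prompt,
        "image_urls": list(image_urls),
        "elements": [
            {
                # Hypothetical key names, derived from the form labels.
                "frontal_image_url": element["frontal_image_url"],
                "reference_image_urls": list(element["reference_image_urls"]),
            }
            for element in elements
        ],
        "duration": duration,
        "aspect_ratio": aspect_ratio,
    }


# List order defines the tokens: first entry is @Element1, second is @Element2.
payload = build_request(
    prompt="Frame @Element1 center; @Element2 passes left-to-right; maintain fixed horizon.",
    image_urls=[],
    elements=[
        {
            "frontal_image_url": "https://example.com/hero-front.png",
            "reference_image_urls": ["https://example.com/hero-side.png"],
        },
        {
            "frontal_image_url": "https://example.com/car-front.png",
            "reference_image_urls": ["https://example.com/car-rear.png"],
        },
    ],
    duration=10,
)
print(json.dumps(payload, indent=2))
```

Passing `duration` and `aspect_ratio` as separate fields rather than inside the prompt text is a design choice of this sketch; the example prompts above also state them inline.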

Pro tips:

Reference in order and keep naming consistent: @Image1, @Image2, @Element1, @Element2.
Constrain scope: say what to preserve (subject, pose, wardrobe) and what to change (camera, background, speed).
Use spatial language: left, right, foreground, background, upper-third, eye level.
Keep descriptors decisive: prefer a few strong terms over many competing adjectives.
Respect limits: maximum 7 total across elements, reference images, and start image for stable conditioning.
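
The first tip, consistent in-order naming, lends itself to a quick automated check before submitting. This is a minimal sketch with a hypothetical helper: it flags tokens whose index has no matching input and tokens that are not spelled exactly as @ImageN or @ElementN. It reflects only the naming rules stated on this page, not any validation the service itself performs.

```python
import re

# Exact spelling required by the prompt field: @Image1, @Element2, ...
TOKEN_RE = re.compile(r"@(Image|Element)(\d+)")
# Looser pattern used only to catch near-misses such as "@image1" or "@Image 1".
LOOSE_RE = re.compile(r"@(image|element)\s*\d+", flags=re.IGNORECASE)


def check_prompt_tokens(prompt, num_images, num_elements):
    """Return a list of problems with @Image / @Element tokens in a prompt."""
    problems = []
    for kind, index_text in TOKEN_RE.findall(prompt):
        index = int(index_text)
        available = num_images if kind == "Image" else num_elements
        if index < 1 or index > available:
            problems.append(
                f"@{kind}{index} has no matching input ({available} supplied)"
            )
    for match in LOOSE_RE.finditer(prompt):
        if not TOKEN_RE.fullmatch(match.group(0)):
            problems.append(f"'{match.group(0)}' is not spelled exactly")
    return problems


print(check_prompt_tokens("@Image1 subject, @Image2 color grade", 2, 0))
print(check_prompt_tokens("Track @Image3 and @element1", 2, 1))
```

The first call reports nothing; the second reports one out-of-range token and one misspelled token.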

Note: Try the model in the RunComfy playground for video-to-video: Kling O1 Video Edit.

Related Models

seedance-2.0/pro

Create 2K cinematic clips with precise lip-sync and camera control

wan-2-1/fusionx/image-to-video

Cinema-grade AI videos with precise dual-prompt control

sam-3/video-to-video

Empowers precise tracking and seamless object edits across video scenes.

ai-avatar/v2/standard

Convert photos into expressive talking avatars with precise motion and HD detail

infinite-talk/fast/multi

Transform speech into lifelike video avatars with expressive, synced motion.

wan-2-2/lora/text-to-video

Use WAN 2.2 LoRA for realistic video creation from text.

Frequently Asked Questions

What exactly is Kling O1 Reference to Video in the context of image-to-video generation?

Kling O1 Reference to Video is a specialized mode of the Kling O1 multimodal AI model that enables creators to generate new cinematic shots based on a short reference video. It uses advanced image-to-video processing to preserve motion, continuity, and camera style while extending or transforming scenes.

How does Kling O1 Reference to Video handle image-to-video input for new shot generation?

Kling O1 Reference to Video allows users to upload a 3–10 second reference clip along with optional images and text prompts. Through its unified image-to-video pipeline, it produces consistent new sequences that match the visual and motion patterns of the input material.

What are the main capabilities of Kling O1 Reference to Video?

Kling O1 Reference to Video supports multimodal editing, enabling users to insert subjects, apply style changes, or generate next-shot continuity. Its image-to-video capabilities include preserving camera movement, keeping audio if desired, and maintaining character consistency within generated footage.

How much does it cost to use Kling O1 Reference to Video on RunComfy?

Access to Kling O1 Reference to Video requires using credits on RunComfy’s AI playground. New users typically receive free credits to try the image-to-video model, after which usage depends on the platform’s credit policy listed under the Generation section.

Who can benefit the most from using Kling O1 Reference to Video and its image-to-video features?

Kling O1 Reference to Video is ideal for filmmakers, advertisers, social media creators, and design studios seeking to produce consistent or extended shots from reference material. The model’s image-to-video generation is particularly useful for maintaining quality continuity in sequences or campaigns.

What makes Kling O1 Reference to Video different from earlier Kling versions or competitors?

Unlike prior versions that separated text-to-video and image-to-video tasks, Kling O1 Reference to Video uses a unified multimodal model that supports seamless editing and generation in one workflow. It offers better motion accuracy, subject fidelity, and camera-style preservation than most competing systems.

What formats and resolutions does Kling O1 Reference to Video support for image-to-video processing?

Kling O1 Reference to Video accepts .mp4 or .mov files as input, ranging from 720 to 2160 pixels, with a maximum size of 200MB. This flexibility ensures that image-to-video tasks maintain high resolution and efficient rendering for cinematic output.

Can the generated clips from Kling O1 Reference to Video include original audio?

Yes, Kling O1 Reference to Video offers an option to retain the original audio from the reference video. This feature enhances the realism of its image-to-video results and is popular for creative continuity in storytelling and advertising projects.

Are there any limitations or caveats when using Kling O1 Reference to Video?

Kling O1 Reference to Video works best with short, high-quality clips of 3–10 seconds. While it excels in image-to-video sequence generation, complex scenes with many unsynced visual elements may require multiple reference inputs for optimal consistency.

Where and how can I access Kling O1 Reference to Video?

Kling O1 Reference to Video is available on RunComfy’s AI playground website, which supports both desktop and mobile browsers. Users can start image-to-video projects after logging in and allocating their platform credits accordingly.

RunComfy

RunComfy is the premier ComfyUI platform, offering an online ComfyUI environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.
