Wan 2.6: High-Fidelity Reference-to-Video Lip-Sync & Motion Transfer | RunComfy
Create fluid, expressive animations with multi-shot storytelling features.
Wan 2.6 transforms reference videos and prompts into 1080p, 24fps clips with realistic lip-sync, multi-shot storytelling, and precise motion transfer for production-ready video creation.
Introduction to Wan 2.6 Video
Alibaba's Wan 2.6 converts reference videos and prompts into 1080p, 24fps clips up to 10s, with selectable 16:9, 9:16, or 1:1 aspect ratios and precise, native audio-visual lip-sync. It replaces manual frame editing, shot-by-shot storyboarding, and separate dubbing with multi-shot auto-story expansion and reference-accurate voice and motion transfer, eliminating complex masking and re-timing. It is built for marketing agencies, e-commerce teams, filmmakers, educators, and corporate communications. For developers, Wan 2.6 on RunComfy can be used both in the browser and via an HTTP API, so you don't need to host or scale the model yourself.
Ideal for: Multi-Shot Narrative Prototyping | Brand-True Product Videos with Lip-Sync | Reference-Accurate Character and Motion Transfer
Examples of Wan 2.6
Wan 2.6 Video-to-Video on X: Content Drops and Insights
Wan AI / wan-2.6 Reference-to-Video
Wan 2.6 Reference-to-Video on RunComfy is a production-grade AI video-to-video engine that takes reference videos and a descriptive prompt and generates cinematic videos with consistent motion, subject identity, and audio-visual alignment. It is designed for re-styling existing footage, motion-guided storytelling, and reference-accurate video generation.
Output format: MP4, MOV, or WebM at up to 1080p and 24 fps.
Highlights
- Reference-Driven Video Generation — Uses reference videos to preserve motion patterns, subject identity, and temporal coherence.
- Audio-Visual Consistency — Produces videos with stable visual motion and synchronized audio output when applicable.
- Multi-Shot Narrative Friendly — Supports prompt structures that describe multiple shots or scene transitions.
- High-Resolution Output — Generates up to 1080p video suitable for professional and commercial use.
- Flexible Creative Control — Combine reference inputs with textual direction to change style, mood, or environment.
- RunComfy Integration — Available in the browser-based Models interface and via the API for scalable workflows.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| prompt* | string | Yes | Text prompt describing the desired scene, motion, and style (≤2000 characters). |
| audio_url | video_uri | No | Reference video file used to guide motion and content (file size < 15 MB). |
| duration | integer | No | Output video length in seconds (choices: 5, 10). |
| img_url* | image_uri | Yes | First-frame reference image (supported formats: jpg, jpeg, png, bmp, webp; 360–2000 px). |
| resolution | string | No | Output resolution (480P, 720P, or 1080P). |
| negative_prompt | string | No | Optional text to suppress unwanted styles or artifacts. |
> Required parameters are marked with *.
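For API use, a request body that mirrors these parameters might look like the minimal Python sketch below. The endpoint path, auth header, and response shape are assumptions for illustration; the parameter names come from the table above, but consult the RunComfy API documentation for the exact contract.

```python
import os
import requests

# Hypothetical endpoint; check the RunComfy API docs for the real path.
API_URL = "https://api.runcomfy.com/v1/models/wan-2.6-reference-to-video"

payload = {
    "prompt": "Follow the reference motion, but move the scene to a rain-soaked neon street at night; slow dolly-in, cinematic lighting.",
    "img_url": "https://example.com/first-frame.png",  # jpg/jpeg/png/bmp/webp, 360-2000 px
    "audio_url": "https://example.com/reference.mp4",  # reference video, < 15 MB
    "duration": 5,                                     # 5 or 10 seconds
    "resolution": "1080P",                             # 480P, 720P, or 1080P
    "negative_prompt": "flicker, warped faces, text artifacts",
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {os.environ['RUNCOMFY_API_KEY']}"},
    timeout=60,
)
response.raise_for_status()
print(response.json())  # typically returns a job/request ID to poll
```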
Pricing
Pricing on RunComfy is usage-based and depends on resolution and video length.
| Resolution | Cost |
|---|---|
| 720p+ | $0.05 per second |
| 1080p+ | $0.08 per second |
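Because pricing is a flat per-second rate, the cost of a clip is simply the rate times the clip length. A small illustrative helper using the rates above:

```python
# Per-second rates from the pricing table above (USD).
RATES = {"720P": 0.05, "1080P": 0.08}

def clip_cost(resolution: str, seconds: int) -> float:
    """Estimated cost in USD for one generated clip."""
    return RATES[resolution] * seconds

# A 10-second 1080p clip: 10 * $0.08 = $0.80
print(f"${clip_cost('1080P', 10):.2f}")
```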
How to Use
1) Prepare Reference Inputs
- Select a short reference video that represents the desired motion or subject behavior.
- Choose a clear first-frame image for img_url.
2) Upload to RunComfy
- Upload the reference video and image in the RunComfy Models interface or via the API.
3) Write the Prompt
- Describe what should follow the reference and what should change (style, environment, camera, mood).
4) Optional: Add a Negative Prompt
- Exclude artifacts, unwanted styles, or visual noise if needed.
5) Select Output Settings
- Choose the desired resolution and duration.
6) Generate the Video
- Run the model and preview the output directly in the Models interface.
7) Iterate or Download
- Refine prompts or references and regenerate, or download the final video (an end-to-end API sketch follows these steps).
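Putting the steps together, an end-to-end run through the API could look like the sketch below: submit a job, poll until it completes, then download the MP4. The /jobs path, status values, and result_url field are illustrative assumptions, not documented names.

```python
import os
import time
import requests

BASE = "https://api.runcomfy.com/v1"  # hypothetical base URL
HEADERS = {"Authorization": f"Bearer {os.environ['RUNCOMFY_API_KEY']}"}

def generate_video(payload: dict, poll_interval: float = 5.0) -> bytes:
    # 1) Submit the generation job (endpoint path is an assumption).
    job = requests.post(f"{BASE}/models/wan-2.6-reference-to-video",
                        json=payload, headers=HEADERS, timeout=60)
    job.raise_for_status()
    job_id = job.json()["id"]  # assumed response field

    # 2) Poll until the job succeeds or fails (assumed status values).
    while True:
        status = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=30)
        status.raise_for_status()
        body = status.json()
        if body["status"] == "succeeded":
            break
        if body["status"] == "failed":
            raise RuntimeError(body.get("error", "generation failed"))
        time.sleep(poll_interval)

    # 3) Download the finished clip (result_url is an assumed field).
    video = requests.get(body["result_url"], timeout=120)
    video.raise_for_status()
    return video.content

if __name__ == "__main__":
    mp4 = generate_video({
        "prompt": "Follow the reference motion; golden-hour rooftop, handheld camera.",
        "img_url": "https://example.com/first-frame.png",
        "duration": 5,
        "resolution": "720P",
    })
    open("wan26_output.mp4", "wb").write(mp4)
```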
Prompt & Reference Tips
- Align Reference and Goal — The closer the reference matches the target motion, the more stable the result.
- Describe Changes Explicitly — Use phrases like “follow the reference motion but change the environment to…”.
- Control Style via Prompt — Cinematic terms, lighting, and camera movement help shape the output.
- Keep Prompts Focused — Avoid conflicting instructions that may confuse motion or subject tracking.
- Use Negative Prompts Carefully — Only block clearly unwanted outcomes to avoid over-constraining the model.
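To make the "Describe Changes Explicitly" tip concrete, here is one illustrative way to phrase a reference-guided, multi-shot prompt and a matching negative prompt (the wording is an example, not a required format):

```python
prompt = (
    "Follow the reference motion exactly. Shot 1: the subject walks toward the "
    "camera through a sunlit market. Shot 2: cut to a close-up as they smile; "
    "keep the same outfit and face. Change the environment to early-morning fog, "
    "soft backlight, shallow depth of field."
)
negative_prompt = "flickering, extra limbs, watermark, distorted hands"
```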
More Models to Try
- Wan 2.6 Text-to-Video — Generate videos directly from text without reference footage.
- Wan 2.6 Image-to-Video — Animate a still image into a short video sequence.
Official Resources
- Wan AI official site: https://wan-ai.co/wan-2-6
How Wan 2.6 compares to other models
- Vs Wan 2.5: Compared to Wan 2.5, Wan 2.6 delivers native audio-visual synchronization with precise lip-sync, stronger reference-video adherence, more stable motion, and longer coherent multi-shot clips. Ideal Use Case: Choose Wan 2.6 when speech alignment and multi-scene continuity are critical.
- Vs Wan 2.2 (open-source family): Compared to Wan 2.2, Wan 2.6 delivers higher resolution (1080p vs up to 720p), built-in audio sync/lip-sync, and improved temporal consistency for production-ready results. Ideal Use Case: Use Wan 2.6 for commercial projects needing polished audio-visual outputs without additional tooling.
- Vs Seedance 1.0 Pro: Compared to Seedance 1.0 Pro, Wan 2.6 delivers native audio-video alignment in a single pass, reducing reliance on external audio editing workflows. Ideal Use Case: Select Wan 2.6 when you need immediate lip-synced dialogue or tightly timed visuals with music.
- Vs Kling Video 2.6: Compared to Kling 2.6, Wan 2.6 delivers stronger reference video generation, narrative continuity, and multiple aspect ratio workflows, while matching on 1080p output and native A/V sync. Ideal Use Case: Pick Wan 2.6 for reference-driven storytelling and consistent brand visuals across formats.
Related Models
- Refined AI visuals, real-time control, and pro FX for creators.
- Master complex motion, physics, and cinematic effects.
- Create rapid, high-quality video drafts with precise style and speed.
- Create camera-controlled, audio-synced clips with smooth multilingual scene flow for design pros.
- Transform static visuals into cinematic motion with Kling O1's precise scene control and lifelike generation.
Frequently Asked Questions
What are the technical limitations of Wan 2.6 regarding resolution, duration, and aspect ratios?
Wan 2.6 currently outputs up to 1080p resolution at 24fps and supports multiple aspect ratios, including 16:9, 9:16, and 1:1. Individual segments typically run 5 or 10 seconds, and longer durations are achieved by chaining multi-shot narrative segments.
How do I transition from testing Wan 2.6 in the RunComfy Models playground to using the API in production?
To move from trial to production, prototype your Wan 2.6 prompts in the RunComfy Models playground, then reuse the same parameters with the RunComfy API, authenticating with the API key from your dashboard. The Wan 2.6 endpoints mirror playground behavior, so production results match your web UI tests. Integration examples and rate-limit information are available in the official API documentation.
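In practice, "the same parameters" means reusing your validated playground payload unchanged and adding only authentication. A hedged fragment (endpoint path assumed) might look like:

```python
import os
import requests

# API key from your RunComfy dashboard, kept out of source control.
headers = {"Authorization": f"Bearer {os.environ['RUNCOMFY_API_KEY']}"}

# The exact parameter set you validated in the playground.
payload = {"prompt": "...", "img_url": "...", "resolution": "720P", "duration": 5}

# Endpoint path is an assumption; see the API docs for the documented route.
resp = requests.post("https://api.runcomfy.com/v1/models/wan-2.6-reference-to-video",
                     json=payload, headers=headers, timeout=60)
resp.raise_for_status()
```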
How does Wan 2.6 improve upon Wan 2.5 in terms of generation quality?
Compared to Wan 2.5, Wan 2.6 offers enhanced motion stability, stronger character consistency, better lip-sync accuracy, and native audio generation where applicable. The overall pipeline with Wan 2.6 is more stable with improved temporal coherence and finer detail handling, resulting in smoother and more realistic visual and multimodal outputs.
What makes Wan 2.6 stand out from competitors like Kling Video 2.6 or Seedance 1.0 Pro?
Wan 2.6 differentiates itself with integrated audio-visual synchronization, precise lip-sync, and reference-guided motion transfer across tasks. Unlike some competitors that may require separate audio production steps, Wan 2.6 produces synchronized audio and visuals in a single pass, making it a more integrated tool for diverse production environments.
Does Wan 2.6 support multilingual text prompts and audio generation?
Yes, Wan 2.6 supports multilingual inputs and outputs, automatically generating localized audio and lip-synced speech across supported languages. This capability extends across its generation modes, enabling globalized storytelling from a single interface.
Can Wan 2.6 generate audio automatically during content creation?
Wan 2.6 includes native audio generation for voiceovers, background music, and sound effects aligned with the generated visuals. The model's multimodal architecture ensures precise audio-video synchronization without manual sound design.
What kind of content is Wan 2.6 best suited for?
Wan 2.6 excels in short-form visual storytelling such as ads, social media clips, explainers, product spots, and training content. Its flexible architecture allows creators to build coherent multi-sequence narratives with consistent characters, motion, and audio across scenes, ideal for marketing, e-learning, and interactive use cases.
