Wan2.2 Animate: full‑motion reference‑to‑video animation in ComfyUI
Wan2.2 Animate turns a single reference image into a lifelike performance that follows a driving video’s full‑body motion and facial expressions. This ComfyUI Wan2.2 Animate workflow fuses pose transfer, face mocap, background control, and LoRA add‑ons so characters move naturally while identity stays intact.
Designed for avatars, performance re‑creations, music videos, and story beats, Wan2.2 Animate produces clean, temporally stable clips with optional audio passthrough, quality upscaling, and interpolation. It ships as a guided graph with sensible defaults, so you can focus on creative choices rather than plumbing.
Key models in the ComfyUI Wan2.2 Animate workflow
- Wan 2.2 Animate 14B (I2V) fp8 scaled. The core video model that interprets pose, face, image, and text guidance to synthesize the motion track with identity preservation. Model set
- Wan 2.1 VAE bf16. The matching VAE used to encode/decode latents for the Wan family, ensuring color fidelity and sharpness. VAE
- UMT5‑XXL text encoder. Provides robust multilingual text conditioning for positive and negative prompts. Encoder
- CLIP ViT‑H/14 vision encoder. Extracts visual embeddings from the reference image to preserve identity and style. Paper
- Optional Wan LoRAs. Lightweight adapters for lighting and I2V behavior control, such as Lightx2v I2V 14B and Relight. Lightx2v • Relight
- Segment Anything 2 (SAM 2). High‑quality image/video segmentation used to isolate the subject or background. Paper
- DWPose. Accurate 2D pose estimation used for face/pose‑aware crops and masks. Repo
- RIFE. Fast video frame interpolation to boost playback smoothness. Paper
How to use the ComfyUI Wan2.2 Animate workflow
Overall flow. The graph ingests a driving video and a single reference image, prepares a clean subject/background and a face‑aware crop, then feeds pose, face, image, and text embeds into Wan2.2 Animate for sampling and decode. A final stage upscales details and optionally interpolates frames before export.
Models
- This group loads the Wan2.2 Animate base, matching VAE, text/vision encoders, and any selected LoRAs. `WanVideoModelLoader` (#22) and `WanVideoSetLoRAs` (#48) wire the model and adapters, while `WanVideoVAELoader` (#38) and `CLIPLoader` (#175) provide the VAE and text backbones.
- If you plan to adjust LoRAs (e.g., relight or I2V style), keep only one or two active at a time to avoid conflicts, then preview with the provided collage nodes.
Size
- Set your target `width` and `height` in the size group and confirm the `frame_count` matches the frames you plan to load from the driving video. `VHS_LoadVideo` (#63) reports the count; keep the sampler’s `num_frames` consistent to avoid tail truncation (see the sizing sketch after this list).
- The `PixelPerfectResolution` (#152) helper reads the driving clip to suggest stable generation sizing.
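A minimal sketch (plain Python, outside the graph) of the two bookkeeping rules above. The helper names and the 832‑px long edge are illustrative, not values from the workflow:

```python
# Sizing rules the workflow relies on: dimensions snapped to multiples of 16
# while preserving the driving clip's aspect ratio, plus a frame-count clamp
# so the sampler's num_frames never exceeds what VHS_LoadVideo reports.

def snap_to_multiple(value: int, multiple: int = 16) -> int:
    """Round a dimension to the nearest multiple (minimum one multiple)."""
    return max(multiple, round(value / multiple) * multiple)

def suggest_size(src_w: int, src_h: int, target_long_edge: int = 832):
    """Scale the source resolution to a target long edge, keeping aspect
    ratio and snapping both sides to multiples of 16."""
    scale = target_long_edge / max(src_w, src_h)
    return snap_to_multiple(int(src_w * scale)), snap_to_multiple(int(src_h * scale))

def safe_num_frames(requested: int, loaded: int) -> int:
    """Clamp the generation length to the frames actually loaded,
    avoiding tail truncation or black frames at the end of the clip."""
    return min(requested, loaded)

if __name__ == "__main__":
    print(suggest_size(1920, 1080))   # -> (832, 464), both divisible by 16
    print(safe_num_frames(121, 97))   # -> 97
```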
Background Masking
- Load your driving video in `VHS_LoadVideo` (#63); audio is extracted automatically for later passthrough. Use `PointsEditor` (#107) to place a few positive points on the subject and run `Sam2Segmentation` (#104) to generate a clean mask. `GrowMask` (#100) and `BlockifyMask` (#108) stabilize and expand edges, and `DrawMaskOnImage` (#99) gives a quick sanity check. This mask lets Wan2.2 Animate focus on the performer while respecting the original background (a grow/blockify sketch follows below).
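For intuition, here is a rough OpenCV approximation of what the grow and blockify steps do to a SAM 2 mask. This is not the nodes’ actual implementation, and the kernel and block sizes are assumptions:

```python
# Dilate the subject mask so edges breathe, then quantize it to coarse
# blocks so the boundary doesn't flicker frame to frame.
import cv2
import numpy as np

def grow_mask(mask: np.ndarray, pixels: int = 8) -> np.ndarray:
    """Expand a binary (0/255) mask outward by roughly `pixels`."""
    kernel = np.ones((2 * pixels + 1, 2 * pixels + 1), np.uint8)
    return cv2.dilate(mask, kernel, iterations=1)

def blockify_mask(mask: np.ndarray, block: int = 16) -> np.ndarray:
    """Snap the mask to a coarse grid: any block touching the subject
    becomes fully opaque, which suppresses edge jitter between frames."""
    h, w = mask.shape
    out = mask.copy()
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = mask[y:y + block, x:x + block]
            out[y:y + block, x:x + block] = 255 if tile.any() else 0
    return out

# Example: a toy 64x64 mask with a small subject region.
toy = np.zeros((64, 64), np.uint8)
toy[20:40, 24:36] = 255
stable = blockify_mask(grow_mask(toy), block=16)
```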
Reference Image
- Drop in a single, well‑lit portrait or full‑body still. `ImageResizeKJv2` (#64) matches it to your working resolution, and the output is stored for the animation stage.
- For best identity retention, pick a reference image with a clear face and minimal occlusions.
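A hedged Pillow sketch of the aspect‑safe cover‑and‑crop idea behind matching the reference to your working resolution; the exact node options may differ:

```python
# Scale to cover the target size (no letterboxing), then center-crop
# the overflow so the reference fills the frame at the working resolution.
from PIL import Image

def resize_reference(img: Image.Image, width: int, height: int) -> Image.Image:
    scale = max(width / img.width, height / img.height)  # cover, don't letterbox
    resized = img.resize((round(img.width * scale), round(img.height * scale)),
                         Image.Resampling.LANCZOS)
    left = (resized.width - width) // 2
    top = (resized.height - height) // 2
    return resized.crop((left, top, left + width, top + height))

# ref = resize_reference(Image.open("reference.png"), 832, 464)
```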
Face Images
- The pipeline builds a face‑aware crop to drive micro‑expressions. `DWPreprocessor` (#177) finds pose keypoints, `FaceMaskFromPoseKeypoints` (#120) isolates the face region, and `ImageCropByMaskAndResize` (#96) produces aligned face crops. A small preview exporter is included for quick QA (`VHS_VideoCombine` (#112)).
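As a mental model for the crop stage, the sketch below derives a padded face box from keypoints. The keypoint format and the margin value are assumptions, not the node’s actual parameters:

```python
# Bounding box around detected face keypoints, padded so the jawline and
# cheeks are fully covered (muted expressions are often a too-tight crop).
import numpy as np

def face_bbox(face_points: np.ndarray, img_w: int, img_h: int,
              margin: float = 0.35) -> tuple[int, int, int, int]:
    x0, y0 = face_points.min(axis=0)
    x1, y1 = face_points.max(axis=0)
    pad_x = (x1 - x0) * margin
    pad_y = (y1 - y0) * margin
    return (max(0, int(x0 - pad_x)), max(0, int(y0 - pad_y)),
            min(img_w, int(x1 + pad_x)), min(img_h, int(y1 + pad_y)))

pts = np.array([[220, 140], [260, 150], [240, 190]], float)
print(face_bbox(pts, img_w=832, img_h=464))  # -> (206, 122, 274, 207)
```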
Sampling & Decode
- The reference image is embedded via `WanVideoClipVisionEncode` (#70), prompts are encoded with `CLIPTextEncode` (#172, #182, #183), and everything is fused by `WanVideoAnimateEmbeds` (#62). `WanVideoSampler` (#27) runs the core Wan2.2 Animate diffusion. You can work in “context window” mode for very long clips or use the original long‑gen path; the included note explains when to match the context window to the frame count for stability (a windowing sketch follows below). The sampler’s output is decoded by `WanVideoDecode` (#28) and saved with optional audio passthrough (`VHS_VideoCombine` (#30)).
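The context‑window idea can be made concrete with a toy scheduler. The window/overlap arithmetic below is an assumption about sliding‑window samplers generally, not the wrapper’s exact logic:

```python
# Cover num_frames with overlapping windows; short clips get one pass.
def plan_context_windows(num_frames: int, window: int = 81, overlap: int = 16):
    """Yield (start, end) frame windows covering num_frames."""
    if num_frames <= window:
        yield (0, num_frames)  # short clip: no windowing needed
        return
    step = window - overlap
    start = 0
    while start + window < num_frames:
        yield (start, start + window)
        start += step
    yield (num_frames - window, num_frames)  # final window flush to the end

print(list(plan_context_windows(161)))  # -> [(0, 81), (65, 146), (80, 161)]
```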
Result collage
- `ImageConcatMulti` (#77, #66) and `GetImageSizeAndCount` (#42) assemble a side‑by‑side panel of reference, face, pose, and output. Use it to spot‑check identity and motion alignment before final export.
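A one‑function numpy equivalent of the collage, if you want to rebuild the panel outside ComfyUI:

```python
import numpy as np

def collage(panels: list[np.ndarray]) -> np.ndarray:
    """Concatenate same-height (and same-channel) panels left to right."""
    assert len({p.shape[0] for p in panels}) == 1, "resize to a common height first"
    return np.concatenate(panels, axis=1)  # side-by-side along width
```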
Upscale and Interpolate
- `UltimateSDUpscaleNoUpscale` (#180) refines edges and textures with the provided UNet (`UNETLoader` (#181)) and VAE (`VAELoader` (#184)); positive/negative prompts can gently steer detail. `RIFEInterpolation` (#188) optionally doubles motion smoothness, and `VHS_VideoCombine` (#189) writes the final Wan2.2 Animate clip. A sketch of the tile‑overlap bookkeeping follows below.
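For intuition on the tile seams mentioned in the tips further down, here is the overlap bookkeeping tiled upscalers typically use. The 512/64 numbers are illustrative, not the node’s defaults:

```python
# Start offsets for overlapping tiles along one axis; larger overlap hides
# seams at the cost of more compute.
def tile_origins(size: int, tile: int = 512, overlap: int = 64):
    step = tile - overlap
    xs = list(range(0, max(size - tile, 0) + 1, step))
    if xs[-1] + tile < size:
        xs.append(size - tile)  # final tile flush with the edge
    return xs

print(tile_origins(1280))  # -> [0, 448, 768]; tiles cover 0..1280 with overlap
```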
Key nodes in the ComfyUI Wan2.2 Animate workflow
`VHS_LoadVideo` (#63)
- Role. Loads the driving video, outputs frames, extracts audio, and reports the frame count for downstream consistency.
- Tip. Keep the reported frame total aligned with the sampler’s generation length to prevent early cutoff or black frames.
`Sam2Segmentation` (#104) + `PointsEditor` (#107)
- Role. Interactive subject masking that helps Wan2.2 Animate focus on the performer and avoid background entanglement.
- Tip. A few well‑placed positive points plus a modest `GrowMask` expansion tends to stabilize even complex backgrounds without haloing. See SAM 2 for video‑aware segmentation guidance. Paper
`DWPreprocessor` (#177) + `FaceMaskFromPoseKeypoints` (#120)
- Role. Derive robust face masks and aligned crops from detected keypoints to improve lip, eye, and jaw fidelity.
- Tip. If expressions look muted, verify the face mask covers the full jawline and cheeks; re‑run the crop after adjusting points. Repo
`WanVideoModelLoader` (#22) and `WanVideoSetLoRAs` (#48)
- Role. Load the Wan2.2 Animate base model and attach the selected LoRA adapters before sampling.
- Tip. Keep only one or two LoRAs active at a time to avoid conflicts, and preview their influence with the collage nodes.
`WanVideoAnimateEmbeds` (#62) and `WanVideoSampler` (#27)
- Role. Fuse image, face, pose, and text conditioning into video latents and sample the sequence with Wan2.2 Animate.
- Tip. For very long clips, switch to context‑window mode and keep its length synchronized with the intended frame count to preserve temporal coherence. Wrapper repo
`UltimateSDUpscaleNoUpscale` (#180)
- Role. Lightweight detail pass after decode with tiling support to keep memory steady.
- Tip. If you see tile seams, modestly increase overlap and keep prompt steering very soft to avoid off‑model textures. KJNodes
`RIFEInterpolation` (#188)
- Role. Smooths motion by inserting in‑between frames without re‑rendering the clip.
- Tip. Apply interpolation after upscaling so optical flow sees the final detail profile. Paper
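A quick back‑of‑envelope for keeping real‑time duration constant after interpolation; the 2N−1 frame count assumes one in‑between per original frame pair:

```python
# Frame count and playback rate after factor-x interpolation: if the frame
# count doubles, the save node's frame_rate should double too.
def after_interpolation(frames: int, fps: float, factor: int = 2):
    return frames * factor - (factor - 1), fps * factor

print(after_interpolation(97, 16))  # -> (193, 32)
```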
Optional extras
- For the cleanest identity, choose a sharp, front‑facing reference and keep accessories consistent with the driving video.
- If background flicker appears, refine the SAM 2 mask and re‑run; masking is often the fastest fix for scene leakage.
- Keep width and height aligned with your target platform and the input’s aspect ratio; square‑pixel dimensions in multiples of 16 work well in Wan2.2 Animate.
- Audio from the driving video can be passed through at export; if you prefer silence, disable audio in the save node.
- Start with one LoRA; if you add relight and I2V together, test each separately first to understand their influence.
Links you may find useful:
- Wan2.2 Animate model and assets by Kijai: WanAnimate models, Wan 2.1 VAE, UMT5 encoder, Lightx2v
- ComfyUI wrappers and nodes used: ComfyUI‑WanVideoWrapper, ComfyUI‑KJNodes
Acknowledgements
This workflow implements and builds upon the following works and resources. We gratefully acknowledge the Wan2.2 team and @ArtOfficialLabs (Wan2.2 Animate Demo) for their contributions and maintenance. For authoritative details, please refer to the original documentation and repositories linked below.
Resources
- Wan2.2/Wan2.2 Animate Demo
- Docs / Release Notes: Wan2.2 Animate Demo @ArtOfficialLabs
Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.


