Wan2.2 Animate: full‑motion reference‑to‑video animation in ComfyUI
Wan2.2 Animate turns a single reference image into a lifelike performance that follows a driving video’s full‑body motion and facial expressions. This ComfyUI Wan2.2 Animate workflow fuses pose transfer, face mocap, background control, and LoRA add‑ons so characters move naturally while identity stays intact.
Designed for avatars, performance re‑creations, music videos, and story beats, Wan2.2 Animate produces clean, temporally stable clips with optional audio passthrough, quality upscaling, and interpolation. It ships as a guided graph with sensible defaults, so you can focus on creative choices rather than plumbing.
Key models in the ComfyUI Wan2.2 Animate workflow
- Wan 2.2 Animate 14B (I2V) fp8 scaled. The core video model that interprets pose, face, image, and text guidance to synthesize the motion track with identity preservation. Model set
- Wan 2.1 VAE bf16. The matching VAE used to encode/decode latents for the Wan family, ensuring color fidelity and sharpness. VAE
- UMT5‑XXL text encoder. Provides robust multilingual text conditioning for positive and negative prompts. Encoder
- CLIP ViT‑H/14 vision encoder. Extracts visual embeddings from the reference image to preserve identity and style. Paper
- Optional Wan LoRAs. Lightweight adapters for lighting and I2V behavior control, such as Lightx2v I2V 14B and Relight. Lightx2v • Relight
- Segment Anything 2 (SAM 2). High‑quality image/video segmentation used to isolate the subject or background. Paper
- DWPose. Accurate 2D pose estimation used for face/pose‑aware crops and masks. Repo
- RIFE. Fast video frame interpolation to boost playback smoothness. Paper
How to use the ComfyUI Wan2.2 Animate workflow
Overall flow. The graph ingests a driving video and a single reference image, prepares a clean subject/background and a face‑aware crop, then feeds pose, face, image, and text embeds into Wan2.2 Animate for sampling and decode. A final stage upscales details and optionally interpolates frames before export.
Models
- This group loads the Wan2.2 Animate base, matching VAE, text/vision encoders, and any selected LoRAs. `WanVideoModelLoader` (#22) and `WanVideoSetLoRAs` (#48) wire the model and adapters, while `WanVideoVAELoader` (#38) and `CLIPLoader` (#175) provide the VAE and text backbones.
- If you plan to adjust LoRAs (e.g., relight or I2V style), keep only one or two active at a time to avoid conflicts, then preview with the provided collage nodes.
Size
- Set your target `width` and `height` in the size group and confirm the `frame_count` matches the frames you plan to load from the driving video. `VHS_LoadVideo` (#63) reports the count; keep the sampler’s `num_frames` consistent to avoid tail truncation (see the sizing sketch after this list).
- The `PixelPerfectResolution` (#152) helper reads the driving clip to suggest stable generation sizing.
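A minimal sketch (plain Python, outside the graph) of the two bookkeeping rules above. The helper names and the 832‑px long edge are illustrative, not values from the workflow:

```python
# Sizing rules the workflow relies on: dimensions snapped to multiples of 16
# while preserving the driving clip's aspect ratio, plus a frame-count clamp
# so the sampler's num_frames never exceeds what VHS_LoadVideo reports.

def snap_to_multiple(value: int, multiple: int = 16) -> int:
    """Round a dimension to the nearest multiple (minimum one multiple)."""
    return max(multiple, round(value / multiple) * multiple)

def suggest_size(src_w: int, src_h: int, target_long_edge: int = 832):
    """Scale the source resolution to a target long edge, keeping aspect
    ratio and snapping both sides to multiples of 16."""
    scale = target_long_edge / max(src_w, src_h)
    return snap_to_multiple(int(src_w * scale)), snap_to_multiple(int(src_h * scale))

def safe_num_frames(requested: int, loaded: int) -> int:
    """Clamp the generation length to the frames actually loaded,
    avoiding tail truncation or black frames at the end of the clip."""
    return min(requested, loaded)

if __name__ == "__main__":
    print(suggest_size(1920, 1080))   # -> (832, 464), both divisible by 16
    print(safe_num_frames(121, 97))   # -> 97
```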
Background Masking
- Load your driving video in `VHS_LoadVideo` (#63); audio is extracted automatically for later passthrough. Use `PointsEditor` (#107) to place a few positive points on the subject and run `Sam2Segmentation` (#104) to generate a clean mask. `GrowMask` (#100) and `BlockifyMask` (#108) stabilize and expand edges, and `DrawMaskOnImage` (#99) gives a quick sanity check. This mask lets Wan2.2 Animate focus on the performer while respecting the original background (a grow/blockify sketch follows below).
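For intuition, here is a rough OpenCV approximation of what the grow and blockify steps do to a SAM 2 mask. This is not the nodes’ actual implementation, and the kernel and block sizes are assumptions:

```python
# Dilate the subject mask so edges breathe, then quantize it to coarse
# blocks so the boundary doesn't flicker frame to frame.
import cv2
import numpy as np

def grow_mask(mask: np.ndarray, pixels: int = 8) -> np.ndarray:
    """Expand a binary (0/255) mask outward by roughly `pixels`."""
    kernel = np.ones((2 * pixels + 1, 2 * pixels + 1), np.uint8)
    return cv2.dilate(mask, kernel, iterations=1)

def blockify_mask(mask: np.ndarray, block: int = 16) -> np.ndarray:
    """Snap the mask to a coarse grid: any block touching the subject
    becomes fully opaque, which suppresses edge jitter between frames."""
    h, w = mask.shape
    out = mask.copy()
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = mask[y:y + block, x:x + block]
            out[y:y + block, x:x + block] = 255 if tile.any() else 0
    return out

# Example: a toy 64x64 mask with a small subject region.
toy = np.zeros((64, 64), np.uint8)
toy[20:40, 24:36] = 255
stable = blockify_mask(grow_mask(toy), block=16)
```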
Reference Image
- Drop in a single, well‑lit portrait or full‑body still. `ImageResizeKJv2` (#64) matches it to your working resolution, and the output is stored for the animation stage.
- For best identity retention, pick a reference image with a clear face and minimal occlusions.
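A hedged Pillow sketch of the aspect‑safe cover‑and‑crop idea behind matching the reference to your working resolution; the exact node options may differ:

```python
# Scale to cover the target size (no letterboxing), then center-crop
# the overflow so the reference fills the frame at the working resolution.
from PIL import Image

def resize_reference(img: Image.Image, width: int, height: int) -> Image.Image:
    scale = max(width / img.width, height / img.height)  # cover, don't letterbox
    resized = img.resize((round(img.width * scale), round(img.height * scale)),
                         Image.Resampling.LANCZOS)
    left = (resized.width - width) // 2
    top = (resized.height - height) // 2
    return resized.crop((left, top, left + width, top + height))

# ref = resize_reference(Image.open("reference.png"), 832, 464)
```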
Face Images
- The pipeline builds a face‑aware crop to drive micro‑expressions. `DWPreprocessor` (#177) finds pose keypoints, `FaceMaskFromPoseKeypoints` (#120) isolates the face region, and `ImageCropByMaskAndResize` (#96) produces aligned face crops. A small preview exporter is included for quick QA (`VHS_VideoCombine` (#112)).
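As a mental model for the crop stage, the sketch below derives a padded face box from keypoints. The keypoint format and the margin value are assumptions, not the node’s actual parameters:

```python
# Bounding box around detected face keypoints, padded so the jawline and
# cheeks are fully covered (muted expressions are often a too-tight crop).
import numpy as np

def face_bbox(face_points: np.ndarray, img_w: int, img_h: int,
              margin: float = 0.35) -> tuple[int, int, int, int]:
    x0, y0 = face_points.min(axis=0)
    x1, y1 = face_points.max(axis=0)
    pad_x = (x1 - x0) * margin
    pad_y = (y1 - y0) * margin
    return (max(0, int(x0 - pad_x)), max(0, int(y0 - pad_y)),
            min(img_w, int(x1 + pad_x)), min(img_h, int(y1 + pad_y)))

pts = np.array([[220, 140], [260, 150], [240, 190]], float)
print(face_bbox(pts, img_w=832, img_h=464))  # -> (206, 122, 274, 207)
```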
Sampling & Decode
- The reference image is embedded via `WanVideoClipVisionEncode` (#70), prompts are encoded with `CLIPTextEncode` (#172, #182, #183), and everything is fused by `WanVideoAnimateEmbeds` (#62). `WanVideoSampler` (#27) runs the core Wan2.2 Animate diffusion. You can work in “context window” mode for very long clips or use the original long‑gen path; the included note explains when to match the context window to the frame count for stability (a windowing sketch follows below). The sampler’s output is decoded by `WanVideoDecode` (#28) and saved with optional audio passthrough (`VHS_VideoCombine` (#30)).
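The context‑window idea can be made concrete with a toy scheduler. The window/overlap arithmetic below is an assumption about sliding‑window samplers generally, not the wrapper’s exact logic:

```python
# Cover num_frames with overlapping windows; short clips get one pass.
def plan_context_windows(num_frames: int, window: int = 81, overlap: int = 16):
    """Yield (start, end) frame windows covering num_frames."""
    if num_frames <= window:
        yield (0, num_frames)  # short clip: no windowing needed
        return
    step = window - overlap
    start = 0
    while start + window < num_frames:
        yield (start, start + window)
        start += step
    yield (num_frames - window, num_frames)  # final window flush to the end

print(list(plan_context_windows(161)))  # -> [(0, 81), (65, 146), (80, 161)]
```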
Result collage
- `ImageConcatMulti` (#77, #66) and `GetImageSizeAndCount` (#42) assemble a side‑by‑side panel of reference, face, pose, and output. Use it to spot‑check identity and motion alignment before final export.
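A one‑function numpy equivalent of the collage, if you want to rebuild the panel outside ComfyUI:

```python
import numpy as np

def collage(panels: list[np.ndarray]) -> np.ndarray:
    """Concatenate same-height (and same-channel) panels left to right."""
    assert len({p.shape[0] for p in panels}) == 1, "resize to a common height first"
    return np.concatenate(panels, axis=1)  # side-by-side along width
```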
Upscale and Interpolate
- `UltimateSDUpscaleNoUpscale` (#180) refines edges and textures with the provided UNet (`UNETLoader` (#181)) and VAE (`VAELoader` (#184)); positive/negative prompts can gently steer detail. `RIFEInterpolation` (#188) optionally doubles motion smoothness, and `VHS_VideoCombine` (#189) writes the final Wan2.2 Animate clip. A sketch of the tile‑overlap bookkeeping follows below.
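For intuition on the tile seams mentioned in the tips further down, here is the overlap bookkeeping tiled upscalers typically use. The 512/64 numbers are illustrative, not the node’s defaults:

```python
# Start offsets for overlapping tiles along one axis; larger overlap hides
# seams at the cost of more compute.
def tile_origins(size: int, tile: int = 512, overlap: int = 64):
    step = tile - overlap
    xs = list(range(0, max(size - tile, 0) + 1, step))
    if xs[-1] + tile < size:
        xs.append(size - tile)  # final tile flush with the edge
    return xs

print(tile_origins(1280))  # -> [0, 448, 768]; tiles cover 0..1280 with overlap
```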
Key nodes in the ComfyUI Wan2.2 Animate workflow
`VHS_LoadVideo` (#63)
- Role. Loads the driving video, outputs frames, extracts audio, and reports the frame count for downstream consistency.
- Tip. Keep the reported frame total aligned with the sampler’s generation length to prevent early cutoff or black frames.
`Sam2Segmentation` (#104) + `PointsEditor` (#107)
- Role. Interactive subject masking that helps Wan2.2 Animate focus on the performer and avoid background entanglement.
- Tip. A few well‑placed positive points plus a modest `GrowMask` expansion tends to stabilize even complex backgrounds without haloing. See SAM 2 for video‑aware segmentation guidance. Paper
`DWPreprocessor` (#177) + `FaceMaskFromPoseKeypoints` (#120)
- Role. Derive robust face masks and aligned crops from detected keypoints to improve lip, eye, and jaw fidelity.
- Tip. If expressions look muted, verify the face mask covers the full jawline and cheeks; re‑run the crop after adjusting points. Repo
`WanVideoModelLoader` (#22) and `WanVideoSetLoRAs` (#48)
- Role. Load the Wan2.2 Animate base model and attach the selected LoRA adapters before sampling.
- Tip. Keep only one or two LoRAs active at a time to avoid conflicts, and preview their influence with the collage nodes.
`WanVideoAnimateEmbeds` (#62) and `WanVideoSampler` (#27)
- Role. Fuse image, face, pose, and text conditioning into video latents and sample the sequence with Wan2.2 Animate.
- Tip. For very long clips, switch to context‑window mode and keep its length synchronized with the intended frame count to preserve temporal coherence. Wrapper repo
`UltimateSDUpscaleNoUpscale` (#180)
- Role. Lightweight detail pass after decode with tiling support to keep memory steady.
- Tip. If you see tile seams, modestly increase overlap and keep prompt steering very soft to avoid off‑model textures. KJNodes
`RIFEInterpolation` (#188)
- Role. Smooths motion by inserting in‑between frames without re‑rendering the clip.
- Tip. Apply interpolation after upscaling so optical flow sees the final detail profile. Paper
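A quick back‑of‑envelope for keeping real‑time duration constant after interpolation; the 2N−1 frame count assumes one in‑between per original frame pair:

```python
# Frame count and playback rate after factor-x interpolation: if the frame
# count doubles, the save node's frame_rate should double too.
def after_interpolation(frames: int, fps: float, factor: int = 2):
    return frames * factor - (factor - 1), fps * factor

print(after_interpolation(97, 16))  # -> (193, 32)
```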
Optional extras
- For the cleanest identity, choose a sharp, front‑facing reference and keep accessories consistent with the driving video.
- If background flicker appears, refine the SAM 2 mask and re‑run; masking is often the fastest fix for scene leakage.
- Keep width and height aligned with your target platform and the input’s aspect ratio; square‑pixel dimensions in multiples of 16 work well in Wan2.2 Animate.
- Audio from the driving video can be passed through at export; if you prefer silence, disable audio in the save node.
- Start with one LoRA; if you add relight and I2V together, test each separately first to understand their influence.
Links you may find useful:
- Wan2.2 Animate model and assets by Kijai: WanAnimate models, Wan 2.1 VAE, UMT5 encoder, Lightx2v
- ComfyUI wrappers and nodes used: ComfyUI‑WanVideoWrapper, ComfyUI‑KJNodes
Acknowledgements
This workflow implements and builds upon the following works and resources. We gratefully acknowledge the Wan2.2 team and @ArtOfficialLabs (Wan2.2 Animate Demo) for their contributions and maintenance. For authoritative details, please refer to the original documentation and repositories linked below.
Resources
- Wan2.2/Wan2.2 Animate Demo
- Docs / Release Notes: Wan2.2 Animate Demo @ArtOfficialLabs
Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.


