This ComfyUI Wan 2.2 VACE workflow turns a single reference image into a motion-matched video that follows the pose, rhythm, and camera movement of a source clip. It uses Wan 2.2 VACE to preserve identity while translating complex body motion into smooth, realistic animation.
Designed for dance generation, motion transfer, and creative character animation, the workflow automates style prompting from the reference image, extracts motion signals from the source video, and runs a two-stage Wan 2.2 sampler that balances motion coherence and fine detail.
The workflow has five grouped stages: Inputs, PROMPT, Models, SAMPLING, and Output. You provide one reference image and one short motion video. The graph then computes motion guidance, encodes VACE identity features, runs a two-pass Wan 2.2 sampler, and saves both the final animation and an optional side-by-side preview.
Load a motion source clip in VHS_LoadVideo (#141). You can trim with simple controls and cap frames for memory. The frames are resized for consistency, then DepthAnythingV2Preprocessor (#135) computes a dense depth sequence that captures pose, layout, and camera movement. Load your identity image with LoadImage (#113); it is auto-resized and previewed so you can verify framing before sampling.
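As a rough sketch of how the trim and cap controls compose, here is a pure-Python model of VHS-style frame selection (skip, then stride, then cap). The helper name and ordering are illustrative assumptions, not the node's actual implementation:

```python
def select_frames(total_frames, skip_first_frames=0, select_every_nth=1, frame_load_cap=0):
    """Illustrative sketch: pick frame indices by skipping a prefix,
    striding, then capping the count (0 = no cap), mirroring how
    VHS_LoadVideo-style trim controls are typically applied."""
    indices = list(range(skip_first_frames, total_frames, select_every_nth))
    if frame_load_cap > 0:
        indices = indices[:frame_load_cap]
    return indices

# A 300-frame clip, skipping the first 30 frames, taking every 2nd frame,
# capped at 81 frames for memory:
idx = select_frames(300, skip_first_frames=30, select_every_nth=2, frame_load_cap=81)
print(len(idx), idx[0], idx[-1])  # 81 30 190
```

Capping after striding means the cap bounds the frames actually loaded, which is what matters for memory.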
Florence2Run (#137) analyzes the reference image and returns a detailed caption. Style Prompt (#138) concatenates that caption with a short style phrase, then WanVideoTextEncode (#16) encodes the final positive and negative prompts using UMT5-XXL. You can freely edit the style phrase or replace the positive prompt entirely if you want stronger creative direction. This prompt embedding conditions both sampler stages so the generated video stays faithful to your reference.
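The auto-prompting step amounts to simple string concatenation. A minimal sketch, assuming a comma-joined order of caption first, style phrase second (the node's exact separator and ordering may differ):

```python
def build_positive_prompt(florence_caption: str, style_phrase: str) -> str:
    """Illustrative sketch of the Style Prompt step: join the Florence-2
    caption with a user style phrase, skipping empty parts."""
    parts = [p.strip() for p in (florence_caption, style_phrase) if p and p.strip()]
    return ", ".join(parts)

prompt = build_positive_prompt(
    "a woman in a red dress standing in a studio",
    "cinematic lighting, smooth dance motion",
)
print(prompt)
# a woman in a red dress standing in a studio, cinematic lighting, smooth dance motion
```

Overriding the result is just replacing the returned string, which is exactly what editing the positive prompt in the graph does.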
WanVideoVAELoader (#38) loads the Wan VAE used across encode/decode. Two WanVideoModelLoader nodes prepare Wan 2.2 14B models: one high-noise and one low-noise, each augmented with a VACE module selected in WanVideoExtraModelSelect (#99, #107). An optional refinement LoRA is attached through WanVideoLoraSelect (#56, #97), letting you nudge sharpness or style without changing the base models. The configuration is designed so you can swap the VACE weights, the LoRA, or the noise variant without touching the rest of the graph.
WanVideoVACEEncode (#100) fuses three signals into VACE embeddings: the motion sequence (depth frames), your reference image, and the target video geometry. The first WanVideoSampler (#27) runs the high-noise model up to a split step to establish motion, perspective, and global style. The second WanVideoSampler (#90) resumes from that latent and finishes with the low-noise model to recover textures, edges, and small details while keeping motion locked to the source. A short CFG schedule and the step split control how much each stage influences the result.
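The two-pass handoff hinges on one number: the step where stage one stops and stage two resumes. A minimal sketch of that bookkeeping, assuming a fractional split over a shared total step count (parameter names are illustrative):

```python
def split_steps(total_steps: int, split_frac: float) -> dict:
    """Illustrative sketch: derive the end_step for the high-noise pass and
    the matching start_step for the low-noise pass from one split fraction,
    clamped so each stage gets at least one step."""
    split = max(1, min(total_steps - 1, round(total_steps * split_frac)))
    return {"stage1_end_step": split, "stage2_start_step": split}

print(split_steps(30, 0.5))  # {'stage1_end_step': 15, 'stage2_start_step': 15}
print(split_steps(30, 0.6))  # later split: more of the trajectory spent on motion shaping
```

Deriving both values from a single fraction guarantees the second sampler resumes exactly where the first stopped, which is the invariant the graph relies on.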
WanVideoDecode (#28) converts the final latent back to frames. You get two saved videos: a clean render and a side-by-side concat that places the generated frames next to the reference for quick QA. A separate “Depth Map Preview” shows the inferred depth sequence so you can diagnose motion guidance at a glance. Frame rate and filename settings are available in the VHS_VideoCombine outputs (#139, #60, #144).
WanVideoVACEEncode (#100): Creates the VACE identity-and-geometry embeddings used by both samplers. Supply your motion frames and the reference image; the node handles width, height, and frame count. If you change the duration or aspect ratio, keep this node in sync so the embeddings match your target video layout.
WanVideoSampler (#27): First-stage sampler using the high-noise Wan 2.2 model. Tune steps, a short cfg schedule, and the end_step split to decide how much of the trajectory is allocated to motion shaping. Larger motion or camera changes benefit from a slightly later split.
WanVideoSampler (#90): Second-stage sampler using the low-noise Wan 2.2 model. Set start_step to the same split value so it continues seamlessly from stage one. If you see texture oversharpening or drift, reduce the later cfg values or lower the LoRA strength.
DepthAnythingV2Preprocessor (#135): Extracts a stable depth sequence from the source video. Using depth as motion guidance helps Wan 2.2 VACE retain scene layout, hand pose, and occlusion. For fast iteration, you can resize input frames smaller; for final renders, feed higher-resolution frames for better structural fidelity.
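For the fast-iteration path, a common pattern is to downscale while preserving aspect ratio and snapping both sides to a fixed multiple. A sketch under those assumptions (the multiple of 16 and the 480px short side are illustrative defaults, not values mandated by the workflow):

```python
def preview_resolution(src_w, src_h, target_short=480, multiple=16):
    """Illustrative sketch: scale so the short side is ~target_short,
    keep aspect ratio, and round both sides to a multiple for
    model-friendly dimensions."""
    scale = target_short / min(src_w, src_h)
    w = max(multiple, round(src_w * scale / multiple) * multiple)
    h = max(multiple, round(src_h * scale / multiple) * multiple)
    return w, h

print(preview_resolution(1920, 1080))  # (848, 480)
```

Iterating at a resolution like this keeps depth extraction and sampling fast; once motion looks right, re-run at the source resolution for the final render.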
WanVideoTextEncode (#16): Encodes the positive and negative prompts with UMT5-XXL. The prompt is auto-built from Florence2Run, but you can override it for art direction. Keep prompts concise; with VACE identity guidance, fewer keywords often yield cleaner, less constrained motion transfer.
This workflow implements and builds upon the following works and resources. We gratefully acknowledge the ComfyUI community creators of the Wan 2.2 VACE workflow for their contributions and maintenance. For authoritative details, please refer to the original documentation and repositories linked below.
Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.