This ComfyUI Wan 2.2 VACE workflow turns a single reference image into a motion-matched video that follows the pose, rhythm, and camera movement of a source clip. It uses Wan 2.2 VACE to preserve identity while translating complex body motion into smooth, realistic animation.
Designed for dance generation, motion transfer, and creative character animation, the workflow automates style prompting from the reference image, extracts motion signals from the source video, and runs a two-stage Wan 2.2 sampler that balances motion coherence and fine detail.
The workflow has five grouped stages: Inputs, PROMPT, Models, SAMPLING, and Output. You provide one reference image and one short motion video. The graph then computes motion guidance, encodes VACE identity features, runs a two-pass Wan 2.2 sampler, and saves both the final animation and an optional side-by-side preview.
Load a motion source clip in VHS_LoadVideo (#141). You can trim with simple controls and cap frames for memory. The frames are resized for consistency, then DepthAnythingV2Preprocessor (#135) computes a dense depth sequence that captures pose, layout, and camera movement. Load your identity image with LoadImage (#113); it is auto-resized and previewed so you can verify framing before sampling.
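As a rough sketch of how the trim and cap controls compose, here is a pure-Python model of VHS-style frame selection (skip, then stride, then cap). The helper name and ordering are illustrative assumptions, not the node's actual implementation:

```python
def select_frames(total_frames, skip_first_frames=0, select_every_nth=1, frame_load_cap=0):
    """Illustrative sketch: pick frame indices by skipping a prefix,
    striding, then capping the count (0 = no cap), mirroring how
    VHS_LoadVideo-style trim controls are typically applied."""
    indices = list(range(skip_first_frames, total_frames, select_every_nth))
    if frame_load_cap > 0:
        indices = indices[:frame_load_cap]
    return indices

# A 300-frame clip, skipping the first 30 frames, taking every 2nd frame,
# capped at 81 frames for memory:
idx = select_frames(300, skip_first_frames=30, select_every_nth=2, frame_load_cap=81)
print(len(idx), idx[0], idx[-1])  # 81 30 190
```

Capping after striding means the cap bounds the frames actually loaded, which is what matters for memory.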
Florence2Run (#137) analyzes the reference image and returns a detailed caption. Style Prompt (#138) concatenates that caption with a short style phrase, then WanVideoTextEncode (#16) encodes the final positive and negative prompts using UMT5-XXL. You can freely edit the style phrase or replace the positive prompt entirely if you want stronger creative direction. This prompt embedding conditions both sampler stages so the generated video stays faithful to your reference.
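The auto-prompting step amounts to simple string concatenation. A minimal sketch, assuming a comma-joined order of caption first, style phrase second (the node's exact separator and ordering may differ):

```python
def build_positive_prompt(florence_caption: str, style_phrase: str) -> str:
    """Illustrative sketch of the Style Prompt step: join the Florence-2
    caption with a user style phrase, skipping empty parts."""
    parts = [p.strip() for p in (florence_caption, style_phrase) if p and p.strip()]
    return ", ".join(parts)

prompt = build_positive_prompt(
    "a woman in a red dress standing in a studio",
    "cinematic lighting, smooth dance motion",
)
print(prompt)
# a woman in a red dress standing in a studio, cinematic lighting, smooth dance motion
```

Overriding the result is just replacing the returned string, which is exactly what editing the positive prompt in the graph does.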
WanVideoVAELoader (#38) loads the Wan VAE used across encode/decode. Two WanVideoModelLoader nodes prepare Wan 2.2 14B models: one high-noise and one low-noise, each augmented with a VACE module selected in WanVideoExtraModelSelect (#99, #107). An optional refinement LoRA is attached through WanVideoLoraSelect (#56, #97), letting you nudge sharpness or style without changing the base models. The configuration is designed so you can swap the VACE weights, the LoRA, or the noise variant without touching the rest of the graph.
WanVideoVACEEncode (#100) fuses three signals into VACE embeddings: the motion sequence (depth frames), your reference image, and the target video geometry. The first WanVideoSampler (#27) runs the high-noise model up to a split step to establish motion, perspective, and global style. The second WanVideoSampler (#90) resumes from that latent and finishes with the low-noise model to recover textures, edges, and small details while keeping motion locked to the source. A short CFG schedule and the step split control how much each stage influences the result.
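The two-pass handoff hinges on one number: the step where stage one stops and stage two resumes. A minimal sketch of that bookkeeping, assuming a fractional split over a shared total step count (parameter names are illustrative):

```python
def split_steps(total_steps: int, split_frac: float) -> dict:
    """Illustrative sketch: derive the end_step for the high-noise pass and
    the matching start_step for the low-noise pass from one split fraction,
    clamped so each stage gets at least one step."""
    split = max(1, min(total_steps - 1, round(total_steps * split_frac)))
    return {"stage1_end_step": split, "stage2_start_step": split}

print(split_steps(30, 0.5))  # {'stage1_end_step': 15, 'stage2_start_step': 15}
print(split_steps(30, 0.6))  # later split: more of the trajectory spent on motion shaping
```

Deriving both values from a single fraction guarantees the second sampler resumes exactly where the first stopped, which is the invariant the graph relies on.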
WanVideoDecode (#28) converts the final latent back to frames. You get two saved videos: a clean render and a side-by-side concat that places the generated frames next to the reference for quick QA. A separate “Depth Map Preview” shows the inferred depth sequence so you can diagnose motion guidance at a glance. Frame rate and filename settings are available in the VHS_VideoCombine outputs (#139, #60, #144).
WanVideoVACEEncode (#100): Creates the VACE identity-and-geometry embeddings used by both samplers. Supply your motion frames and the reference image; the node handles width, height, and frame count. If you change the duration or aspect ratio, keep this node in sync so the embeddings match your target video layout.
WanVideoSampler (#27): First-stage sampler using the high-noise Wan 2.2 model. Tune steps, a short cfg schedule, and the end_step split to decide how much of the trajectory is allocated to motion shaping. Larger motion or camera changes benefit from a slightly later split.
WanVideoSampler (#90): Second-stage sampler using the low-noise Wan 2.2 model. Set start_step to the same split value so it continues seamlessly from stage one. If you see texture oversharpening or drift, reduce the later cfg values or lower the LoRA strength.
DepthAnythingV2Preprocessor (#135): Extracts a stable depth sequence from the source video. Using depth as motion guidance helps Wan 2.2 VACE retain scene layout, hand pose, and occlusion. For fast iteration, you can resize input frames smaller; for final renders, feed higher-resolution frames for better structural fidelity.
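For the fast-iteration path, a common pattern is to downscale while preserving aspect ratio and snapping both sides to a fixed multiple. A sketch under those assumptions (the multiple of 16 and the 480px short side are illustrative defaults, not values mandated by the workflow):

```python
def preview_resolution(src_w, src_h, target_short=480, multiple=16):
    """Illustrative sketch: scale so the short side is ~target_short,
    keep aspect ratio, and round both sides to a multiple for
    model-friendly dimensions."""
    scale = target_short / min(src_w, src_h)
    w = max(multiple, round(src_w * scale / multiple) * multiple)
    h = max(multiple, round(src_h * scale / multiple) * multiple)
    return w, h

print(preview_resolution(1920, 1080))  # (848, 480)
```

Iterating at a resolution like this keeps depth extraction and sampling fast; once motion looks right, re-run at the source resolution for the final render.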
WanVideoTextEncode (#16): Encodes the positive and negative prompts with UMT5-XXL. The prompt is auto-built from Florence2Run, but you can override it for art direction. Keep prompts concise; with VACE identity guidance, fewer keywords often yield cleaner, less constrained motion transfer.
This workflow implements and builds upon the following works and resources. We gratefully acknowledge the ComfyUI community creators of the Wan 2.2 VACE workflow for their contributions and maintenance. For authoritative details, please refer to the original documentation and repositories linked below.
Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.