This ComfyUI workflow turns a single reference image into a coherent video, driven by the motion of a separate pose source. It is built around SteadyDancer’s image-to-video paradigm so the very first frame preserves the identity and appearance of your input image while the rest of the sequence follows the target motion. The graph reconciles pose and appearance through SteadyDancer-specific embeds and a pose pipeline, producing smooth, realistic full‑body movement with strong temporal coherence.
SteadyDancer is ideal for human animation, dance generation, and bringing characters or portraits to life. Provide one still image plus a motion clip, and the ComfyUI pipeline handles pose extraction, embedding, sampling, and decoding to deliver a ready‑to‑share video.
The workflow has two independent inputs that meet at sampling: a reference image for identity and a driving video for motion. Models load once up front, pose is extracted from the driving clip, and SteadyDancer embeds blend pose and appearance before generation and decoding.
This group loads the core weights used throughout the graph. WanVideoModelLoader (#22) selects the Wan 2.1 I2V SteadyDancer checkpoint and handles attention and precision settings. WanVideoVAELoader (#38) provides the video VAE, and CLIPVisionLoader (#59) prepares the CLIP ViT‑H vision backbone. A LoRA selection node and BlockSwap options are present for advanced users who want to change memory behavior or attach add‑on weights.
Import the motion source using VHS_LoadVideo (#75). The node reads frames and audio, letting you set a target frame rate or cap the number of frames. The clip can be any human motion such as a dance or sports move. The video stream then flows to aspect‑ratio scaling and pose extraction.
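Before choosing load settings, it can help to inspect the motion clip outside ComfyUI to learn its frame rate, length, and resolution. A minimal sketch using OpenCV; the file name is a placeholder for your own driving clip.

```python
# Inspect the driving clip before wiring it into VHS_LoadVideo, so you can
# pick sensible values for the frame rate and frame cap.
import cv2

cap = cv2.VideoCapture("driving_dance.mp4")  # placeholder path
fps = cap.get(cv2.CAP_PROP_FPS)
frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
cap.release()

print(f"{frames} frames at {fps:.2f} fps ({frames / fps:.1f} s), {width}x{height}")
```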
A simple constant controls how many frames are loaded from the driving video. This limits both pose extraction and the length of the generated SteadyDancer output. Increase it for longer sequences, or reduce it to iterate faster.
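As a rough planning aid, the sketch below converts a frame cap into output duration. It assumes a 16 fps generation rate, which is common for Wan 2.1 setups; substitute the rate your workflow actually uses.

```python
def output_duration(frame_cap: int, fps: float = 16.0) -> float:
    """Seconds of video produced for a given number of loaded frames."""
    return frame_cap / fps

for cap in (49, 81, 121):
    print(f"{cap} frames -> {output_duration(cap):.2f} s")
```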
LayerUtility: ImageScaleByAspectRatio V2 (#146) scales frames while preserving aspect ratio so they fit the model’s stride and memory budget. Set a long‑side limit appropriate for your GPU and the desired detail level. The scaled frames are used by the downstream detection nodes and as a reference for output size.
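The sketch below mirrors the intent of this node: shrink to a long-side limit while keeping the aspect ratio and snapping both dimensions to a stride the model accepts. The 832-pixel limit and 16-pixel stride are illustrative defaults, not values taken from the graph.

```python
def fit_long_side(w: int, h: int, long_side: int = 832, stride: int = 16):
    """Scale (w, h) so the longer edge is near long_side, rounded to the stride."""
    scale = long_side / max(w, h)
    new_w = max(stride, round(w * scale / stride) * stride)
    new_h = max(stride, round(h * scale / stride) * stride)
    return new_w, new_h

print(fit_long_side(1080, 1920))  # portrait phone footage -> (464, 832)
```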
Person detection and pose estimation run on the scaled frames. PoseAndFaceDetection (#89) uses YOLOv10 and ViTPose‑H to find people and keypoints robustly. DrawViTPose (#88) renders a clean stick‑figure representation of the motion, and ImageResizeKJv2 (#77) sizes the resulting pose images to match the generation canvas. WanVideoEncode (#72) converts the pose images into latents so SteadyDancer can modulate motion without fighting the appearance signal.
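For a sense of scale, video VAEs in the Wan family typically compress the time axis by a factor of about 4, so the pose latents span far fewer steps than the pixel frames. Treat the factor below as an assumption and check it against your WanVideoEncode output.

```python
def latent_frames(pixel_frames: int, temporal_factor: int = 4) -> int:
    """Approximate latent steps produced from a given number of pixel frames."""
    return (pixel_frames - 1) // temporal_factor + 1

print(latent_frames(81))  # -> 21 latent steps for an 81-frame pose clip
```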
Load the identity image that you want SteadyDancer to animate. The image should clearly show the subject you intend to move. Use a pose and camera angle that broadly matches the driving video for the most faithful transfer. The frame is forwarded to the reference image group for embedding.
The still image is resized with ImageResizeKJv2 (#68) and registered as the start frame via Set_IMAGE (#96). WanVideoClipVisionEncode (#65) extracts CLIP ViT‑H embeddings that preserve identity, clothing, and coarse layout. WanVideoImageToVideoEncode (#63) packs width, height, and frame count with the start frame to prepare SteadyDancer’s I2V conditioning.
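If you prepare the reference still outside the graph, a cover-style resize plus center crop keeps the subject filling the canvas in the same spirit as ImageResizeKJv2. A minimal sketch with Pillow; the file name and 480x832 canvas are placeholders.

```python
from PIL import Image

def fit_reference(path: str, width: int, height: int) -> Image.Image:
    """Resize to cover the target canvas, then center-crop to the exact size."""
    img = Image.open(path).convert("RGB")
    scale = max(width / img.width, height / img.height)
    img = img.resize((round(img.width * scale), round(img.height * scale)), Image.LANCZOS)
    left, top = (img.width - width) // 2, (img.height - height) // 2
    return img.crop((left, top, left + width, top + height))

ref = fit_reference("reference_portrait.png", 480, 832)  # placeholder inputs
```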
This is where appearance and motion meet to generate video. WanVideoAddSteadyDancerEmbeds (#71) receives image conditioning from WanVideoImageToVideoEncode and augments it with pose latents plus a CLIP‑vision reference, enabling SteadyDancer’s condition reconciliation. Context windows and overlap are set in WanVideoContextOptions (#87) for temporal consistency. Optionally, WanVideoTextEncodeCached (#92) adds umT5 text guidance for style nudges. WanVideoSamplerSettings (#119) and WanVideoSamplerFromSettings (#129) run the actual denoising steps on the Wan 2.1 model, after which WanVideoDecode (#28) converts latents back to RGB frames. Final videos are saved with VHS_VideoCombine (#141, #83).
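To reason about how the context options tile a long sequence, the sketch below lists the windows produced by a given window length and overlap. The numbers are placeholders; read the real values from WanVideoContextOptions (#87) in your copy of the graph.

```python
def context_windows(total_frames: int, window: int = 81, overlap: int = 16):
    """Return (start, end) spans covering the sequence with the given overlap."""
    starts, step, s = [], window - overlap, 0
    while s + window < total_frames:
        starts.append(s)
        s += step
    starts.append(max(total_frames - window, 0))  # final window flush to the end
    return [(s, min(s + window, total_frames)) for s in starts]

print(context_windows(161))  # e.g. [(0, 81), (65, 146), (80, 161)]
```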
WanVideoAddSteadyDancerEmbeds (#71): This node is the SteadyDancer heart of the graph. It fuses the image conditioning with pose latents and CLIP‑vision cues so the first frame locks identity while motion unfolds naturally. Adjust pose_strength_spatial to control how tightly limbs follow the detected skeleton and pose_strength_temporal to regulate motion smoothness over time. Use start_percent and end_percent to limit where pose control applies within the sequence for more natural intros and outros.
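As an illustration only, the snippet below collects the knobs named above in one place and shows how the percent range maps onto frame indices. The values are starting points, not tested defaults from the workflow.

```python
embed_settings = {
    "pose_strength_spatial": 1.0,   # how tightly limbs track the detected skeleton
    "pose_strength_temporal": 1.0,  # how strongly motion smoothness is enforced
    "start_percent": 0.0,           # begin applying pose control at the first frame
    "end_percent": 0.9,             # release control near the end for a softer outro
}

def percent_to_frames(start_pct: float, end_pct: float, total_frames: int):
    """Translate the start/end percentages into the frame span they cover."""
    return int(start_pct * total_frames), int(end_pct * total_frames)

print(percent_to_frames(embed_settings["start_percent"],
                        embed_settings["end_percent"], 81))  # -> (0, 72)
```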
PoseAndFaceDetection (#89): Runs YOLOv10 detection and ViTPose‑H keypoint estimation on the driving video. If poses miss small limbs or faces, increase input resolution upstream or choose footage with fewer occlusions and cleaner lighting. When multiple people are present, keep the target subject largest in frame so the detector and pose head remain stable.
VHS_LoadVideo (#75): Controls what portion of the motion source you use. Increase the frame cap for longer outputs or lower it to prototype rapidly. The force_rate input aligns pose spacing with the generation rate and can help reduce stutter when the original clip’s FPS is unusual.
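To keep the resampled clip covering the motion you care about, it helps to estimate how many frames survive a force_rate change. A quick sketch with illustrative numbers:

```python
def frames_to_load(source_fps: float, source_frames: int, force_rate: float) -> int:
    """Frames the loader yields after resampling the clip to force_rate."""
    duration = source_frames / source_fps
    return int(duration * force_rate)

# A 300-frame clip shot at 29.97 fps, resampled to 16 fps, yields ~160 frames.
print(frames_to_load(29.97, 300, 16.0))
```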
LayerUtility: ImageScaleByAspectRatio V2 (#146): Keeps frames within a chosen long‑side limit while maintaining aspect ratio and bucketing to a divisible size. Match the scale here to the generation canvas so SteadyDancer does not need to upsample or crop aggressively. If you see soft results or edge artifacts, bring the long side closer to the model’s native training scale for a cleaner decode.
WanVideoSamplerSettings (#119): Defines the denoising plan for the Wan 2.1 sampler. The scheduler and steps set overall quality versus speed, while cfg balances adherence to the image and prompt conditioning against diversity. seed locks reproducibility, and denoise_strength can be lowered when you want to hew even closer to the reference image’s appearance.
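If you want to sweep seeds without touching the UI, ComfyUI exposes a /prompt endpoint that accepts an API-format export of the workflow. The sketch below assumes a local server on port 8188 and that node 119 exposes a seed input in your export; check the JSON before relying on the key name, and treat the file name as a placeholder.

```python
import json, copy, urllib.request

with open("steadydancer_workflow_api.json") as f:  # hypothetical export name
    base = json.load(f)

for seed in (7, 42, 2024):
    wf = copy.deepcopy(base)
    wf["119"]["inputs"]["seed"] = seed  # assumed key; verify in your export
    payload = json.dumps({"prompt": wf}).encode()
    req = urllib.request.Request("http://127.0.0.1:8188/prompt", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        print("queued seed", seed, "status", resp.status)
```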
WanVideoModelLoader (#22): Loads the Wan 2.1 I2V SteadyDancer checkpoint and handles precision, attention implementation, and device placement. Leave these as configured for stability. Advanced users can attach an I2V LoRA to alter motion behavior or lighten computational cost when experimenting.
If the transferred motion looks choppy or the poses seem too sparse, adjust the pose strengths in WanVideoAddSteadyDancerEmbeds or raise the video FPS to densify poses. This SteadyDancer workflow gives you a practical, end‑to‑end path from one still image to a faithful, pose‑driven video with identity preserved from the very first frame.
This workflow implements and builds upon the following works and resources. We gratefully acknowledge MCG-NJU, the authors of SteadyDancer, for their contributions and maintenance. For authoritative details, please refer to the original documentation and repositories linked below.
Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.
RunComfy is the premier ComfyUI platform, offering a ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Playground, enabling artists to harness the latest AI tools to create incredible art.