This workflow turns one reference image into a short video where the same face and style persist across frames. Powered by the Wan 2.1 family and a purpose‑built Stand In LoRA, it is designed for storytellers, animators, and avatar creators who need stable identity with minimal setup. The Wan2.1 Stand In pipeline handles background cleanup, cropping, masking, and embedding so you can focus on your prompt and motion.
Use the Wan2.1 Stand In workflow when you want reliable identity continuity from a single photo, fast iteration, and export‑ready MP4s plus an optional side‑by‑side comparison output.
At a glance: you load a clean, front‑facing reference image; the workflow prepares a face‑focused mask and composite, encodes it to a latent, merges that identity into the Wan 2.1 image embeds, then samples video frames and exports MP4. Two outputs are saved: the main render and a side‑by‑side comparison.
Start with a well‑lit, forward‑facing image on a plain background. The pipeline loads your image in LoadImage (#58), standardizes size with ImageResizeKJv2 (#142), and creates a face‑centric mask using MediaPipe-FaceMeshPreprocessor (#144) and BinaryPreprocessor (#151). The background is removed in TransparentBGSession+ (#127) and ImageRemoveBackground+ (#128), and the subject is composited over a clean canvas with ImageCompositeMasked (#108) to minimize color bleeding. Finally, ImagePadKJ (#129) and ImageResizeKJv2 (#68) align the aspect ratio for generation; the prepared frame is encoded to a latent via WanVideoEncode (#104).
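If you want to preview roughly what this preparation stage produces before committing to a render, the following minimal Python sketch approximates it outside ComfyUI using rembg and Pillow. These libraries, the target size, and the gray canvas color are illustrative assumptions; the workflow's own nodes perform the real steps.

```python
# Approximate the reference prep: resize, strip the background,
# and composite the subject onto a neutral canvas. Assumption:
# rembg + Pillow stand in for the graph's TransparentBGSession+,
# ImageRemoveBackground+, and ImageCompositeMasked nodes.
from PIL import Image
from rembg import remove  # pip install rembg

ref = Image.open("reference.png").convert("RGB")
ref = ref.resize((832, 832), Image.LANCZOS)  # standardize size

cutout = remove(ref)  # returns an RGBA image with an alpha matte

# Neutral canvas to minimize color bleeding into the generation.
canvas = Image.new("RGB", cutout.size, (128, 128, 128))
canvas.paste(cutout, mask=cutout.getchannel("A"))
canvas.save("reference_prepped.png")
```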
If you want motion control from an existing clip, load it with VHS_LoadVideo (#161) and optionally a secondary guide or alpha video with VHS_LoadVideo (#168). The frames pass through DWPreprocessor (#163) for pose cues and ImageResizeKJv2 (#169) for shape matching; ImageToMask (#171) and ImageCompositeMasked (#174) let you blend control imagery precisely. WanVideoVACEEncode (#160) turns these into VACE embeddings. This path is optional; leave it untouched when you want text‑driven motion from Wan 2.1 alone.
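For intuition about what the control branch consumes, here is a hedged sketch that loads a guide clip, resizes frames for shape matching, and derives a per‑frame binary mask from an alpha video using OpenCV and NumPy. File names, the target size, and the threshold are placeholders; inside the graph this work is done by VHS_LoadVideo, ImageResizeKJv2, and ImageToMask.

```python
# Load guide/alpha clips and prepare them the way the control
# branch expects: resized frames plus a binary mask per frame.
import cv2
import numpy as np

def load_frames(path: str, size=(832, 480)) -> np.ndarray:
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.resize(frame, size, interpolation=cv2.INTER_AREA))
    cap.release()
    return np.stack(frames)

guide = load_frames("guide.mp4")   # motion/pose source clip
alpha = load_frames("alpha.mp4")   # optional mask video
mask = alpha.mean(axis=-1) > 127   # threshold to a boolean mask
print(guide.shape, mask.shape)
```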
WanVideoModelLoader (#22) loads the Wan 2.1 14B base plus the Stand In LoRA so identity is baked in from the start. VRAM‑friendly speed features are available through WanVideoBlockSwap (#39) and applied with WanVideoSetBlockSwap (#70). You can attach an extra adapter such as LightX2V via WanVideoSetLoRAs (#79). Prompts are encoded with WanVideoTextEncodeCached (#159), using UMT5‑XXL under the hood for multilingual control. Keep prompts concise and descriptive; emphasize the subject's clothing, angle, and lighting to complement the Stand In identity.
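Once the graph is exported in API format, you can swap prompts and queue runs from a script through ComfyUI's HTTP endpoint. The sketch below assumes a default local server on port 8188 and that node 159's text input is named positive_prompt; check your exported workflow_api.json for the exact key your version uses.

```python
# Queue the workflow with a new prompt via ComfyUI's /prompt endpoint.
# Assumption: the input key "positive_prompt" on node "159" -- verify
# against your exported API JSON.
import json
import urllib.request

with open("workflow_api.json") as f:
    graph = json.load(f)

graph["159"]["inputs"]["positive_prompt"] = (
    "portrait of the subject in a denim jacket, soft window light, "
    "slow head turn, shallow depth of field"
)

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": graph}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```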
WanVideoEmptyEmbeds (#177) establishes the target shape for image embeddings, and WanVideoAddStandInLatent (#102) injects your encoded reference latent to carry identity through time. The combined image and text embeddings feed into WanVideoSampler (#27), which generates a latent video sequence using the configured scheduler and steps. After sampling, frames are decoded with WanVideoDecode (#28) and written to an MP4 in VHS_VideoCombine (#180).
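If you ever need to re‑encode decoded frames yourself, say after post‑processing outside ComfyUI, a few lines of imageio reproduce what the combine step does. The fps, codec, and dummy frames below are placeholders, not the workflow's actual export settings.

```python
# Write frames to MP4, mirroring the role of VHS_VideoCombine.
# Requires: pip install imageio imageio-ffmpeg
import imageio.v2 as imageio
import numpy as np

# Placeholder frames; substitute your decoded HxWx3 uint8 arrays.
frames = [np.zeros((480, 832, 3), dtype=np.uint8) for _ in range(16)]

writer = imageio.get_writer("output.mp4", fps=16, codec="libx264")
for frame in frames:
    writer.append_data(frame)
writer.close()
```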
For instant QA, ImageConcatMulti (#122) stacks the generated frames beside the resized reference so you can judge likeness frame by frame. VHS_VideoCombine (#74) saves that as a separate "Compare" MP4. The Wan2.1 Stand In workflow therefore produces a clean final video plus a side‑by‑side check without extra effort.
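The same check is easy to reproduce on exported frames: horizontally concatenate the reference next to each frame, as in this small NumPy sketch (the equal‑height assumption mirrors what the graph's resize step guarantees).

```python
# Stack the reference beside a generated frame for a likeness check,
# mirroring ImageConcatMulti (#122). Both inputs are HxWx3 uint8
# arrays of equal height.
import numpy as np

def side_by_side(reference: np.ndarray, frame: np.ndarray) -> np.ndarray:
    assert reference.shape[0] == frame.shape[0], "heights must match"
    return np.concatenate([reference, frame], axis=1)
```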
WanVideoModelLoader (#22). Loads Wan 2.1 14B and applies the Stand In LoRA at model initialization. Keep the Stand In adapter connected here rather than later in the graph so identity is enforced throughout the denoising path. Pair with WanVideoVAELoader (#38) for the matching Wan‑VAE.

WanVideoAddStandInLatent (#102). Fuses your encoded reference image latent into the image embeddings. If identity drifts, increase its influence; if motion seems overly constrained, reduce it slightly.

WanVideoSampler (#27). The main generator. Tuning steps, scheduler choice, and guidance strategy here has the largest impact on detail, motion richness, and temporal stability. When pushing resolution or length, consider adjusting sampler settings before changing anything upstream.

WanVideoSetBlockSwap (#70) with WanVideoBlockSwap (#39). Trades speed for GPU memory by swapping model blocks between devices. If you see out‑of‑memory errors, increase offloading; if you have headroom, reduce offloading for faster iteration.

ImageRemoveBackground+ (#128) and ImageCompositeMasked (#108). These ensure the subject is cleanly isolated and placed on a neutral canvas, which reduces color contamination and improves the Stand In identity lock across frames.

VHS_VideoCombine (#180). Controls encoding, frame rate, and file naming for the main MP4 output. Use it to set your preferred FPS and quality target for delivery. These knobs can also be changed programmatically; see the JSON‑editing sketch after this list.
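As a convenience, the tuning advice above can be applied by editing the exported API JSON before re‑queueing. The input key names below (strength, steps, blocks_to_swap) are assumptions for illustration; confirm them against the fields in your own workflow_api.json.

```python
# Hypothetical tuning pass over the exported graph: firmer identity,
# more sampling steps, heavier block offloading. Key names are
# assumptions -- verify against your workflow_api.json.
import json

with open("workflow_api.json") as f:
    graph = json.load(f)

graph["102"]["inputs"]["strength"] = 1.2       # stronger identity lock
graph["27"]["inputs"]["steps"] = 30            # more detail, slower
graph["39"]["inputs"]["blocks_to_swap"] = 20   # more offload, less VRAM

with open("workflow_api_tuned.json", "w") as f:
    json.dump(graph, f, indent=2)
```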
Resources

This workflow implements and builds upon the work and resources of ArtOfficial Labs. We gratefully acknowledge ArtOfficial Labs and the Wan 2.1 authors for their contributions to and maintenance of the Wan2.1 Demo. For authoritative details, please refer to the original documentation and repositories linked below.
Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.
RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Playground, enabling artists to harness the latest AI tools to create incredible art.