This workflow delivers end‑to‑end Video Character Replacement (MoCha): swap a performer in a real video with a new character while preserving motion, lighting, camera perspective, and scene continuity. Built around the Wan 2.1 MoCha 14B preview, it aligns a reference identity to the source performance, then synthesizes a coherent, edited clip and an optional side‑by‑side comparison. It is designed for filmmakers, VFX artists, and AI creators who need precise, high‑quality character swaps with minimal manual cleanup.
The pipeline combines robust first‑frame masking with Segment Anything 2 (SAM 2), MoCha’s motion‑aware image embeddings, WanVideo sampling/decoding, and an optional portrait assist that improves face fidelity. You provide a source video and one or two reference images; the workflow produces a finished replacement video plus an A/B compare, making iterative evaluation of Video Character Replacement (MoCha) fast and practical.
Wan 2.1 MoCha 14B preview. Core video generator for character replacement; drives temporally coherent synthesis from MoCha image embeddings and text prompts. Model weights distributed in the WanVideo Comfy format by Kijai, including fp8 scaled variants for efficiency. Hugging Face: Kijai/WanVideo_comfy, Kijai/WanVideo_comfy_fp8_scaled
MoCha (Orange‑3DV‑Team). Identity/motion conditioning method and reference implementation that inspired the embedding stage used here; helpful for understanding reference selection and pose alignment for Video Character Replacement (MoCha). GitHub, Hugging Face
Segment Anything 2 (SAM 2). High‑quality, point‑guided segmentation to isolate the actor in the first frame; clean masks are crucial for stable, artifact‑free swaps. GitHub: facebookresearch/segment-anything-2
Qwen‑Image‑Edit 2509 + Lightning LoRA. Optional single‑image assist that generates a clean, close‑up portrait to use as a second reference, improving facial identity preservation in difficult shots. Hugging Face: Comfy‑Org/Qwen‑Image‑Edit_ComfyUI, lightx2v/Qwen‑Image‑Lightning
Wan 2.1 VAE. Video VAE used by the Wan sampler/decoder stages for efficient latent processing. Hugging Face: Kijai/WanVideo_comfy
Overall logic
Input Video
First Frame Mask
ref1
ref2 (Optional)
Step 1 - Load models
Step 2 - Upload image for editing
Step 4 - Prompt
Scene 2 - Sampling
MoCha
The MochaEmbeds stage encodes the source video, first‑frame mask, and your reference image(s) into MoCha image embeddings. Embeddings capture identity, texture, and local appearance cues while respecting the original motion path. If ref2 exists, it is used to strengthen facial detail; otherwise, ref1 alone carries the identity.
Wan Model
Wan Sampling
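The ref1/ref2 fallback described for the embedding stage can be sketched as a small helper. This is illustrative only; the names (`ref1`, `ref2`, `select_references`) are hypothetical and not the workflow's actual node API.

```python
# Minimal sketch of the reference-selection logic: ref2 is optional and,
# when present, is appended to strengthen facial detail; otherwise ref1
# alone carries the identity.
def select_references(ref1, ref2=None):
    """Return the reference list fed to the MoCha embedding stage."""
    if ref1 is None:
        raise ValueError("ref1 is required: it carries the character identity")
    refs = [ref1]
    if ref2 is not None:
        refs.append(ref2)  # second reference reinforces facial identity
    return refs

print(select_references("ref1.png"))              # ['ref1.png']
print(select_references("ref1.png", "face.png"))  # ['ref1.png', 'face.png']
```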
MochaEmbeds (#302). Encodes the source clip, first‑frame mask, and ref images into MoCha image embeddings that steer identity and appearance. Favor a ref1 pose that matches the first frame, and include ref2 for a clean face if you see drift. If edges shimmer, grow the mask slightly before embedding to avoid background leakage.
Sam2Segmentation (#326). Converts your positive/negative clicks into a first‑frame mask. Prioritize clean edges around hair and shoulders; add a few negative points to exclude nearby props. Expanding the mask a small amount after segmentation helps stability when the actor moves.
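The "expand the mask a small amount" advice amounts to a binary dilation of the first-frame mask. In ComfyUI this is done with a mask-grow node; the sketch below shows the underlying operation in plain NumPy, with illustrative sizes.

```python
import numpy as np

def grow_mask(mask: np.ndarray, pixels: int = 2) -> np.ndarray:
    """Dilate a binary HxW mask by `pixels`, using an 8-connected neighborhood."""
    out = mask.astype(bool).copy()
    for _ in range(pixels):
        padded = np.pad(out, 1, mode="constant")
        grown = np.zeros_like(out)
        # A pixel turns on if any of its 8 neighbours (or itself) is on.
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                grown |= padded[1 + dy : 1 + dy + out.shape[0],
                                1 + dx : 1 + dx + out.shape[1]]
        out = grown
    return out

mask = np.zeros((7, 7), dtype=bool)
mask[3, 3] = True
print(grow_mask(mask, 1).sum())  # 9 — a 3x3 block around the seed pixel
```

A dilation of one or two pixels is usually enough to keep hair and shoulder edges stable without pulling in background.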
WanVideoSampler (#314). Drives the heavy lifting of Video Character Replacement (MoCha) by denoising latents into frames. More steps improve detail and temporal stability; fewer steps speed iteration. Keep the scheduler consistent across runs when you’re comparing changes to references or masks.
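Keeping the scheduler and seed fixed while varying only step count makes A/B comparisons meaningful. A sketch of that discipline, with hypothetical field names (not WanVideoSampler's actual parameters):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SamplerConfig:
    steps: int = 30         # more steps: detail and temporal stability
    scheduler: str = "unipc"
    seed: int = 1234
    cfg: float = 6.0

baseline = SamplerConfig()
# For quick iteration, lower only the step count; keep scheduler and seed
# fixed so differences come from your reference/mask changes, not sampling.
draft = replace(baseline, steps=12)

assert draft.scheduler == baseline.scheduler and draft.seed == baseline.seed
print(draft)
```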
WanVideoSetBlockSwap (#344). When VRAM is tight, enable deeper block swapping to fit the Wan 2.1 MoCha 14B path on smaller GPUs. Expect some speed loss; in return you can keep resolution and sequence length.
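The VRAM-versus-speed trade-off of block swapping can be seen with back-of-the-envelope arithmetic. All numbers below are illustrative assumptions, not measured Wan 2.1 figures.

```python
def vram_after_swap(total_blocks, block_gb, swapped_blocks, overhead_gb):
    """Approximate GPU memory with `swapped_blocks` transformer blocks
    offloaded to CPU RAM; the rest stay resident on the GPU."""
    resident = total_blocks - swapped_blocks
    return resident * block_gb + overhead_gb

# e.g. 40 blocks at ~0.35 GB each plus ~6 GB of activations/VAE overhead
full = vram_after_swap(40, 0.35, 0, 6.0)      # 20.0 GB — needs a large GPU
swapped = vram_after_swap(40, 0.35, 20, 6.0)  # 13.0 GB — fits smaller cards
print(full, swapped)
```

Each swapped block adds a CPU-GPU transfer per forward pass, which is where the speed loss comes from; resolution and sequence length are untouched.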
VHS_VideoCombine (#355). Writes the final MP4 and embeds workflow metadata. Use the same frame rate as the source (already wired through) and yuv420p output for broad player compatibility.
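The equivalent export settings on the command line look like the sketch below: source frame rate preserved and `yuv420p` pixel format for broad player compatibility. VHS_VideoCombine handles this inside ComfyUI; this just mirrors the choices with ffmpeg.

```python
def ffmpeg_cmd(frame_pattern, fps, out_path):
    """Build an ffmpeg command that encodes numbered frames to H.264 MP4."""
    return [
        "ffmpeg",
        "-framerate", str(fps),   # match the source clip's frame rate
        "-i", frame_pattern,      # e.g. "frames/%05d.png"
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",    # widest player compatibility
        out_path,
    ]

print(" ".join(ffmpeg_cmd("frames/%05d.png", 24, "mocha_swap.mp4")))
```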
Tips for clean swaps
Useful references
This workflow implements and builds upon the following works and resources. We gratefully acknowledge Benji’s AI Playground, creator of the “Video Character Replacement (MoCha)” workflow, for their contributions and maintenance. For authoritative details, please refer to the original documentation and repositories linked below.
Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.
RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Playground, enabling artists to harness the latest AI tools to create incredible art.