Generate ENTIRE AI WORLDS (Vace Wan 2.1) is a production‑ready ComfyUI workflow by Mickmumpitz for transforming live‑action footage into new environments while keeping the original camera motion. It swaps backgrounds, preserves perspective and scale, and composites a masked actor into fully regenerated worlds driven by text and reference imagery.
Built on the Wan 2.1 VACE stack, this workflow is ideal for filmmakers, VFX artists, and creators who need fast previz or polished shots. You can direct the scene with prompts, start from an optional reference image, and choose between a high‑speed FP8 pipeline or a low‑VRAM GGUF pipeline. The result is seamless worldbuilding that lets you truly Generate ENTIRE AI WORLDS (Vace Wan 2.1) from everyday plates.
This workflow follows a two‑pass VACE strategy: first, it encodes scene motion from control images to lock camera movement; second, it encodes the actor insert and blends it into the regenerated environment. You can run the FP8 path for maximum speed or the GGUF path for low VRAM. The sections below map to the on‑graph groups so you can operate the entire Generate ENTIRE AI WORLDS (Vace Wan 2.1) pipeline with confidence.
The input area lets you pick the working resolution and basic clip controls. Use the resolution switch to choose a preset (720p, 576p, or 480p), which feeds Set_width (#370) and Set_height (#369) so every stage stays in sync. You can cap the number of frames to keep turnarounds fast and set a small skip if you want to offset the in‑point. For stability and memory, keep sequences within the recommended range; the graph labels call out that 81 frames is a sensible ceiling for most GPUs. These choices apply globally to control images, VACE encodes, and final renders.
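A minimal sketch of the input-area logic, assuming hypothetical preset dimensions and a helper function of my own naming; the 4n+1 frame snapping reflects how Wan-style video models are commonly fed and is an assumption, not a setting read from the graph:

```python
# Illustrative only: preset -> width/height, frame cap, and skip offset.
# Preset resolutions and the 4n+1 rule are assumptions, not the workflow's
# actual node parameters.

PRESETS = {
    "720p": (1280, 720),
    "576p": (1024, 576),
    "480p": (832, 480),
}

def clip_settings(preset: str, num_frames: int, skip: int = 0, max_frames: int = 81):
    """Return (width, height, frames, skip) kept within the recommended range."""
    width, height = PRESETS[preset]
    frames = min(num_frames, max_frames)        # 81 frames is the suggested ceiling
    frames = ((frames - 1) // 4) * 4 + 1        # snap to 4n+1 (assumed model constraint)
    return width, height, frames, skip

print(clip_settings("720p", 120))  # -> (1280, 720, 81, 0)
```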
Note: The input video can also be generated through another workflow, MASK_AND_TRACK. You can download its workflow file here: workflow.json. After downloading, drag the file into a new workflow tab and run it to obtain the input video.
A background plate and an optional reference image guide the visual style. Load a background still, and the graph resizes it to match your working size. If you want a style anchor instead of a hard backplate, enable the reference_image through the selector; this image guides color, composition, and tone without dictating geometry. The reference route is helpful when you want the model to Generate ENTIRE AI WORLDS (Vace Wan 2.1) that echo a specific look, while the text prompt handles the rest. Switch it off when you prefer text‑only control.
Use this section to decide how generation begins. With a ready actor still, Image Remove Background Rembg (mtb) (#1433) pulls a clean mask and ImageCompositeMasked (#1441) places the actor on your chosen background to form a start frame. The Start Frame switch (ImpactSwitch, #1760) offers three modes: composite actor plus background, background only, or no start frame. Start frames help anchor identity and layout; background‑only lets the character “enter” over time; no start frame asks the model to establish both subject and world from text and reference. A live preview block shows what that start looks like before you commit downstream.
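If you want to prepare or sanity-check a start frame outside ComfyUI, a rough approximation of the Rembg-plus-composite step can be done with the rembg and Pillow packages; this is a sketch of the idea, not the nodes' actual code, and the file names are placeholders:

```python
# A minimal start-frame composite, assuming `rembg` and Pillow are installed.
from rembg import remove
from PIL import Image

actor = Image.open("actor_still.png").convert("RGBA")
background = Image.open("background_plate.png").convert("RGBA")

actor_cutout = remove(actor)                          # RGBA cutout with alpha matte
background = background.resize(actor_cutout.size)     # match sizes before compositing
start_frame = Image.alpha_composite(background, actor_cutout)
start_frame.convert("RGB").save("start_frame.png")
```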
Control images lock the camera’s motion so perspective and parallax feel real. Feed a camera‑track video into the group; the graph can derive OpenPose and Canny layers, then blend them to create a strong structure signal. The Control Image Nodes switch (ImpactSwitch, #1032) lets you pick Track only, Track+Pose, Canny+Pose, or an externally prepared control video. Review the stack with the preview combine to ensure silhouettes and edges read clearly. For long sequences, you can save and later re‑load this control video to avoid recomputing; that’s especially useful when you iterate prompts or masks while continuing to Generate ENTIRE AI WORLDS (Vace Wan 2.1).
If you have already exported a “control images” video, drop it here to bypass preprocessing. Select the corresponding option in the control image switch so the rest of the pipeline uses your cached structure. This keeps camera tracking consistent across runs and dramatically reduces iteration time on long takes.
The FP8 branch loads the full Wan 2.1 model stack. WanVideoModelLoader (#4) brings in the T2V 14B backbone and the VACE module, plus an optional LightX LoRA for fast, coherent sampling. WanVideoVAELoader (#26) supplies the VAE, and WanVideoBlockSwap (#5) exposes a VRAM‑saving strategy that keeps transformer blocks off the GPU and swaps them in only while they are needed. This branch is the fastest way to Generate ENTIRE AI WORLDS (Vace Wan 2.1) when you have the VRAM headroom.
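Conceptually, block swapping trades transfer time for memory. The sketch below illustrates the idea with a plain PyTorch module list; it is not WanVideoBlockSwap's actual implementation, and the function name and parameter are assumptions for illustration:

```python
# Conceptual block swapping: keep the first N blocks on CPU between uses,
# moving each to the GPU only for its own forward pass.
import torch

def forward_with_block_swap(blocks, x, blocks_to_swap: int):
    for i, block in enumerate(blocks):
        swapped = i < blocks_to_swap
        if swapped:
            block.to("cuda")          # bring the block in just before it runs
        x = block(x)
        if swapped:
            block.to("cpu")           # release its VRAM immediately afterwards
            torch.cuda.empty_cache()
    return x
```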
Prompts are encoded by WanVideoTextEncodeSingle for positive and negative text, then refined through WanVideoApplyNAG to keep phrasing consistent. The first pass, WanVideo VACE Encode (CN‑CameraTrack) (#948), reads the control images to produce motion‑aware embeddings. The second pass, WanVideo VACE Encode (InsertPerson) (#1425), injects the actor using a clean alpha and a mask that you can gently grow or shrink to avoid halos. WanVideoSampler (#2) then renders the sequence, WanVideoDecode (#1) turns latents into frames, and a simple switch chooses between the original frame rate or a FILM‑interpolated stream before the final video combine.
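Once the FP8 graph is dialed in, batches can be queued headlessly through ComfyUI's standard HTTP API. The sketch below assumes a local server on the default port and a workflow exported in API format; the file name and the commented node override are placeholders, not this graph's real IDs:

```python
# Queue the exported workflow against a local ComfyUI server.
import json
import urllib.request

with open("generate_ai_worlds_api.json") as f:   # "Save (API Format)" export
    workflow = json.load(f)

# Example of overriding an input before queueing (hypothetical node id/field):
# workflow["2"]["inputs"]["seed"] = 42

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))   # response includes the queued prompt_id
```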
The GGUF branch is designed for low‑VRAM workflows. UnetLoaderGGUF (#1677) loads a quantized Wan 2.1 VACE UNet, CLIPLoader (#1680) provides the text encoder, and a LoRA can be applied with LoraLoader (#2420). A standard ComfyUI VAELoader (#1676) handles decode. This route trades speed for footprint while preserving the same two‑pass VACE logic so you can still Generate ENTIRE AI WORLDS (Vace Wan 2.1) on modest hardware.
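To see why quantization matters here, some back-of-the-envelope arithmetic on weight size for a roughly 14B-parameter backbone; the bits-per-weight figures for the GGUF quant types are approximations, and these totals exclude activations, the text encoder, and the VAE:

```python
# Rough weight-only size estimates at different precisions.
params = 14e9
for name, bits in [("FP16", 16), ("FP8", 8), ("GGUF Q8_0", 8.5), ("GGUF Q4_K_M", 4.8)]:
    gib = params * bits / 8 / 1024**3
    print(f"{name:>11}: ~{gib:.1f} GiB")
# FP16 weights alone outgrow most consumer GPUs, which is why the FP8 and
# quantized GGUF branches exist.
```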
In the quantized path, WanVaceToVideo (#1724) turns VACE embeddings, text conditioning, and your reference into a guided latent. WanVideoNAG and WanVideoEnhanceAVideoKJ help maintain identity and local detail, after which KSampler (#1726) generates the final latent sequence. VAEDecode (#1742) produces frames, an optional FILM step adds temporal smoothness, and the video combine writes the result to disk. Use this path when VRAM is tight or when you need long, steady shots.
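The FILM step changes timing in a predictable way: the clip keeps its duration but gains intermediate frames. A small sketch of that arithmetic, assuming the interpolator roughly multiplies the frame count (exact counts depend on how the node treats the final frame):

```python
# Approximate effect of FILM interpolation on frame count and frame rate.
def interpolated_timing(frames: int, src_fps: float, multiplier: int = 2):
    out_frames = frames * multiplier            # approximate
    return out_frames, src_fps * multiplier     # same duration, smoother motion

print(interpolated_timing(81, 16))   # -> (162, 32): still a ~5 second clip
```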
There are two prompt panels. The FP8 side uses the Wan T5 text encoder, while the GGUF side uses a CLIP conditioning path; both receive positive and negative text. Keep positive prompts cinematic and specific to the world you want, and reserve negative prompts for compression artifacts, over‑saturation, and unwanted foreground clutter. You can mix prompts with a soft reference image to steer color and lighting while still letting the model Generate ENTIRE AI WORLDS (Vace Wan 2.1) that match your intent.
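As a reference point, here is an illustrative positive/negative pair in the spirit described above; the wording is an example of the style, not the workflow's bundled prompts:

```python
# Example prompt pair: specific, cinematic positive; artifact-focused negative.
positive_prompt = (
    "cinematic wide shot, ancient desert city at golden hour, towering sandstone "
    "arches, drifting dust, volumetric light, film grain, natural camera motion"
)
negative_prompt = (
    "compression artifacts, oversaturated colors, foreground clutter, text, "
    "watermark, flicker"
)
```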
Two VACE encodes drive the result: WanVideo VACE Encode (CN‑CameraTrack) (#948) encodes the control images for camera motion, and WanVideo VACE Encode (InsertPerson) (#1425) injects the actor; use DilateErodeMask (#2391) to pull the matte in slightly, since this pass ties the insert to scene motion so scale and parallax remain natural. On the GGUF side, WanVaceToVideo (#1724 and #1729) plays the equivalent role. WanVideoSampler (#2) and KSampler (#1726) are the FP8 and GGUF samplers; if fine detail softens, lean on the Enhance A Video block to regain micro‑texture without drifting motion. FILM VFI (#2019 and #1757) provides the optional frame interpolation, and if you see edge halos around the actor, adjust DilateErodeMask in the insert path until halos disappear. With these steps, you can confidently run the workflow end‑to‑end and Generate ENTIRE AI WORLDS (Vace Wan 2.1) that hold up under real camera motion.
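For intuition about what growing or shrinking the matte does, a small OpenCV sketch; the kernel size is illustrative and not the DilateErodeMask node's default, and the file names are placeholders:

```python
# Grow or shrink an 8-bit actor mask to tune edge coverage.
import cv2
import numpy as np

mask = cv2.imread("actor_mask.png", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((5, 5), np.uint8)

grown = cv2.dilate(mask, kernel)    # grow: covers more of the edge, risks spill
shrunk = cv2.erode(mask, kernel)    # shrink: pulls the matte in, removes halos

cv2.imwrite("actor_mask_shrunk.png", shrunk)
```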
This workflow implements and builds upon the following works and resources. We gratefully acknowledge Mickmumpitz, creator of the original workflow and tutorial, and thank them for their contributions and maintenance. For authoritative details, please refer to the original documentation and repositories linked below.
Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.
RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Playground, enabling artists to harness the latest AI tools to create incredible art.