This workflow applies Wan 2.1 Ditto to restyle any input video while preserving scene structure and motion. It is designed for editors and creators who want cinematic, artistic, or experimental looks with strong temporal consistency. You load a clip, describe the target look, and Wan 2.1 Ditto produces a clean stylized render plus an optional side‑by‑side comparison for quick review.
The graph pairs the Wan 2.1 text‑to‑video backbone with Ditto’s style transfer at the model level, so changes happen coherently across frames rather than as frame‑by‑frame filters. Common use cases include anime conversions, pixel art, claymation, watercolor, steampunk, or sim‑to‑real edits. If you already generate content with Wan, this Wan 2.1 Ditto workflow slots directly into your pipeline for dependable, flicker‑free video styling.
The workflow runs in four stages: load models, prepare the input video, encode text and visuals, then sample and export. Groups operate in sequence to produce both a stylized render and an optional side‑by‑side comparison.
This group prepares everything Wan 2.1 Ditto needs. The base backbone is loaded with WanVideoModelLoader (#130) and paired with the WanVideoVAELoader (#60) and LoadWanVideoT5TextEncoder (#80). The Ditto component is selected with WanVideoVACEModelSelect (#128), which points the backbone to the dedicated Ditto stylization weights. If you need a stronger transformation, you can attach a LoRA with WanVideoLoraSelect (#122). WanVideoBlockSwap (#68) is available for memory management so larger models can run smoothly on limited VRAM.
Load your source clip with VHS_LoadVideo (#101). The frames are then resized for consistent geometry using LayerUtility: ImageScaleByAspectRatio V2 (#76), which preserves aspect ratio while targeting a long-side resolution controlled by a simple integer input, JWInteger (#89). GetImageSizeAndCount (#65) reads the prepared frames and forwards width, height, and frame count to downstream nodes so Wan 2.1 Ditto samples the correct spatial size and duration. A small prompt helper, CR Text (#104), is included if you prefer to author the prompt in its own field. The group titled “Maximum Variation Limit” reminds you to keep the long-side pixel target in a practical range for consistent results and stable memory use.
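The long-side resize the preparation group performs can be sketched in plain Python. The function name and the multiple-of-16 snapping below are assumptions for illustration, not the node's exact implementation; many video diffusion pipelines require dimensions divisible by 8 or 16, so snapping is a reasonable stand-in.

```python
def scale_to_long_side(width: int, height: int, long_side: int,
                       multiple: int = 16) -> tuple[int, int]:
    """Scale so the longer edge matches `long_side`, preserving aspect
    ratio and snapping both edges to a multiple (assumed to be 16 here,
    a common requirement for diffusion latents)."""
    scale = long_side / max(width, height)
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h

# A 1920x1080 source with an 832-pixel long-side preview target:
print(scale_to_long_side(1920, 1080, 832))  # (832, 464)
```

Driving `long_side` from a single integer, as the graph does with JWInteger, makes it easy to render small previews first and then raise the target for finals.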
Conditioning happens in two parallel lanes. WanVideoTextEncode (#111) turns your prompt into text embeddings that define the intent and style. WanVideoVACEEncode (#126) encodes the prepared video into visual embeddings that preserve structure and motion for editing. An optional guidance module, WanVideoSLG (#129), controls how the model balances style and content through the denoising trajectory. WanVideoSampler (#119) then fuses the Wan 2.1 backbone with Ditto, the text embeddings, and the visual embeddings to generate stylized latents. Finally, WanVideoDecode (#87) reconstructs frames from the latents to produce a stylized sequence with the temporal consistency Wan 2.1 Ditto is known for.
The primary export uses VHS_VideoCombine (#95) to save the Wan 2.1 Ditto render at your selected frame rate. For quick review, the graph joins original and stylized frames using ImageConcatMulti (#94), sizes the comparison with ImageScaleToTotalPixels (#133), and writes a side-by-side movie via VHS_VideoCombine (#100). You will typically get two videos in the output folder: a clean stylized render and a comparison clip that helps stakeholders approve or iterate faster.
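The comparison strip amounts to concatenating matching frames along the width axis. A minimal NumPy sketch, assuming frames arrive as `[T, H, W, C]` arrays (the shape ComfyUI-style image batches commonly use):

```python
import numpy as np

def side_by_side(original: np.ndarray, stylized: np.ndarray) -> np.ndarray:
    """Join two equally shaped frame batches [T, H, W, C] along the
    width axis to build an A/B comparison strip."""
    if original.shape != stylized.shape:
        raise ValueError("frame batches must match in shape")
    return np.concatenate([original, stylized], axis=2)

frames_a = np.zeros((8, 64, 96, 3), dtype=np.uint8)      # stand-in originals
frames_b = np.full((8, 64, 96, 3), 255, dtype=np.uint8)  # stand-in stylized
print(side_by_side(frames_a, frames_b).shape)  # (8, 64, 192, 3)
```

This is why the graph rescales the joined frames with ImageScaleToTotalPixels before writing: the concatenated strip is twice as wide as either input.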
You can begin with short, clear prompts and iterate. Examples that work well with Wan 2.1 Ditto include “restyle as hand-drawn anime,” “watercolor painting with soft edges,” “claymation stop-motion look,” “retro pixel art,” and, for sim-to-real edits, “convert the animation into a photorealistic live-action scene.”
WanVideoVACEModelSelect (#128)
Choose which Ditto weights to use for stylization. The default global Ditto model is a balanced choice for most footage. If your goal is anime-to-real conversion, select the sim-to-real Ditto variant referenced in the node note. Switching Ditto variants changes the character of the restyle without touching other settings.
WanVideoVACEEncode (#126)
Builds the visual conditioning from your input frames. The key controls are width, height, and num_frames, which should match the prepared video for best results. Use strength to adjust how assertively Ditto’s style influences the edit, and vace_start_percent and vace_end_percent to limit when conditioning applies across the diffusion trajectory. Enable tiled_vae at very large resolutions to reduce memory pressure.
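One way to picture vace_start_percent and vace_end_percent is as a gate over normalized sampling progress. The toy function below illustrates that idea; it is an assumed formulation for explanation, not the node's actual code.

```python
def vace_active(step: int, total_steps: int,
                start_percent: float = 0.0, end_percent: float = 1.0) -> bool:
    """Return True when visual conditioning should apply, treating the
    percents as bounds on normalized sampling progress (conceptual
    sketch, not the node's implementation)."""
    progress = step / max(total_steps - 1, 1)
    return start_percent <= progress <= end_percent

# Apply visual conditioning only during the first half of 20 steps:
schedule = [vace_active(s, 20, 0.0, 0.5) for s in range(20)]
print(schedule.count(True))  # 10
```

Narrowing the window this way lets early steps lock in structure from the source video while later steps are free to commit to the new style.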
WanVideoTextEncode (#111)
Encodes positive and negative prompts via the mT5-XXL encoder to guide style and content. Keep positive prompts concise and descriptive, and use negatives to suppress artifacts such as flicker or over-saturation. The force_offload and device options let you trade speed for memory when running large models.
WanVideoSampler (#119)
Runs the Wan 2.1 backbone with Ditto stylization to generate the final latents. The most impactful settings are steps, cfg, scheduler, and seed. Use denoise_strength when you want to preserve more of the original structure, and keep slg_args connected to balance content fidelity against style strength. Increasing steps or guidance may improve detail at the cost of render time.
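denoise_strength can be understood through the usual img2img-style initialization, where sampling starts from a blend of the encoded source latent and fresh noise. This is a conceptual sketch of that idea; the sampler's real noising follows its scheduler, so treat the linear blend as an illustration only.

```python
import random

def init_latent(video_latent: list[float], noise: list[float],
                denoise_strength: float) -> list[float]:
    """Blend the source latent with noise: strength 0.0 keeps the
    source exactly, 1.0 starts from pure noise (conceptual sketch of
    img2img-style initialization, not the sampler's actual code)."""
    return [(1.0 - denoise_strength) * v + denoise_strength * n
            for v, n in zip(video_latent, noise)]

source = [0.2, -0.5, 1.0]
noise = [random.gauss(0, 1) for _ in source]
assert init_latent(source, noise, 0.0) == source  # structure fully preserved
```

The intuition carries over: lower denoise_strength keeps the render closer to the input video's structure, while higher values give the style more room to reshape it.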
ImageScaleByAspectRatio V2 (#76)
Sets a stable target size for all frames before conditioning. Drive the long-side target with the standalone integer input so you can test small, fast previews and then increase resolution for final renders. Keep the scale consistent between iterations so A/B comparisons stay meaningful.
VHS_LoadVideo (#101) and VHS_VideoCombine (#95, #100)
These nodes handle decoding and encoding. Match the frame rate to the source when timing matters. The comparison writer is useful during exploration and can be disabled for final exports if you only want the stylized result.
If the output shows little or no style change, confirm that the Ditto weights are selected in WanVideoVACEModelSelect before sampling. This Wan 2.1 Ditto workflow makes high-quality video restyling predictable and fast, with clean prompts, coherent motion, and outputs ready for immediate review or delivery.
This workflow implements and builds upon the following works and resources. We gratefully acknowledge EzioBy, author of the Wan 2.1 Ditto source, for their contributions and maintenance. For authoritative details, please refer to the original documentation and repositories linked below.
Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.
RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Playground, enabling artists to harness the latest AI tools to create incredible art.