Wan Alpha is a purpose-built ComfyUI workflow that generates videos with a native alpha channel using the Wan 2.1 family. It jointly produces RGB and alpha so characters, props, and effects drop straight into timelines without keying or rotoscoping. For VFX, motion graphics, and interactive apps, Wan Alpha delivers clean edges, semi‑transparent effects, and frame-accurate masks ready for production.
Built around Wan2.1‑T2V‑14B and an alpha-aware VAE pair, Wan Alpha balances fidelity and speed. Optional LightX2V LoRA acceleration shortens sampling while preserving detail, and the workflow exports RGBA frame sequences plus an animated WebP preview for quick review.
This ComfyUI graph follows a straightforward path from prompt to RGBA frames: load models, encode text, allocate a video latent, sample, decode RGB and alpha in lockstep, then save.
Model and LoRA loading
Load Wan 2.1 t2v 14B (#37) to bring in the base model. If you use acceleration or style refinements, apply them with LoraLoaderModelOnly (#59) and LoraLoaderModelOnly (#65) in sequence. The model then passes through ModelSamplingSD3 (#48), which configures the sampling schedule (shift) for the loaded checkpoint. This stack defines the motion prior and rendering style that Wan Alpha will refine in later steps.

Prompt encoding
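The prompt guidance in this section reduces to a compact recipe: subject, action, camera framing, plus the transparency cue. A toy sketch of that shape (the helper and sample strings are hypothetical, not part of the workflow):

```python
# Sketch: assemble a concise Wan Alpha positive prompt. The helper and
# sample strings are illustrative only.
def build_prompt(subject, action, framing):
    parts = [subject, action, framing, "transparent background"]
    return ", ".join(p for p in parts if p)

print(build_prompt("a glowing jellyfish", "drifting upward", "medium close-up"))
```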
Load Text Encoder (#38) loads the UMT5‑XXL text encoder. Enter your description in CLIP Text Encode (Positive Prompt) (#6); keep your subject, action, camera framing, and the phrase “transparent background” concise. Use CLIP Text Encode (Negative Prompt) (#7) to steer away from halos or background clutter if needed. These encodings condition both RGB and alpha generation so edges and transparency cues follow your intent.

Video canvas setup
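Before dialing in the canvas node, it can help to estimate what a given canvas costs. Assuming Wan 2.1's causal video VAE compresses 8x spatially and 4x temporally with 16 latent channels (figures to verify against your checkpoint), a rough sizing sketch:

```python
# Rough sizing sketch for the EmptyHunyuanLatentVideo canvas.
# Compression factors and channel count are assumptions about Wan 2.1's
# causal VAE -- verify against your checkpoint before relying on them.
def latent_shape(width, height, frames, channels=16, spatial=8, temporal=4):
    t = (frames - 1) // temporal + 1  # causal VAE encodes the first frame alone
    return (channels, t, height // spatial, width // spatial)

def latent_mib(shape, bytes_per_elem=2):  # fp16
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem / (1024 ** 2)

shape = latent_shape(832, 480, 81)
print(shape, f"{latent_mib(shape):.1f} MiB")
```

Actual VRAM use is dominated by model weights and attention activations, so treat this as a lower bound when planning drafts versus finals.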
Use EmptyHunyuanLatentVideo (#40) to define the latent video canvas. Set width, height, frames, and fps to fit your shot; higher resolutions or longer clips require more memory. This node allocates a temporally consistent latent volume that Wan Alpha will fill with motion and appearance. Consider matching duration and frame rate to your edit to avoid resampling later.

Generation
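Seed exploration at this stage can also be scripted: ComfyUI queues jobs POSTed to its standard /prompt endpoint, and in an API-format export of this graph the KSampler is node "3". The minimal workflow dict below is only a stand-in for a real "Save (API Format)" export:

```python
# Sketch: queue seed variations via ComfyUI's HTTP API. The tiny workflow
# dict stands in for a real API-format export of this graph.
import copy
import json
import urllib.request

def make_jobs(workflow, seeds, sampler_node="3"):
    jobs = []
    for s in seeds:
        wf = copy.deepcopy(workflow)
        wf[sampler_node]["inputs"]["seed"] = s  # KSampler node id in this graph
        jobs.append({"prompt": wf})
    return jobs

def queue(job, host="127.0.0.1:8188"):
    # Requires a running ComfyUI instance; not called in this sketch.
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

stub = {"3": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 20}}}
jobs = make_jobs(stub, [101, 202, 303])
print([j["prompt"]["3"]["inputs"]["seed"] for j in jobs])
```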
KSampler (#3) performs diffusion on the video latent using your model stack and prompt conditioning. Adjust seed for variations, and select a sampler and scheduler that balance speed and detail. When the LightX2V LoRA is active, you can use fewer steps for faster renders while maintaining stability. The output is a single latent stream shared by the next decoding stage to guarantee perfect RGBA alignment.

Decoding RGB and alpha
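The payoff of decoding both channels from one latent shows up at composite time. Per pixel, a compositor's straight-alpha "over" operation looks like this sketch (pure Python for illustration; real pipelines use numpy or the NLE itself):

```python
# Sketch: straight-alpha "over" composite of one RGBA pixel onto an RGB
# background -- the operation an editor performs with Wan Alpha's output.
def over(rgba, bg):
    r, g, b, a = rgba
    alpha = a / 255.0
    return tuple(round(alpha * c + (1.0 - alpha) * bc)
                 for c, bc in zip((r, g, b), bg))

print(over((255, 255, 255, 128), (0, 0, 0)))  # half-transparent white on black
```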
RGB VAE Decode (#8) pairs with VAELoader (#39) to reconstruct RGB frames. In parallel, Alpha VAE Decode (#52) pairs with VAELoader (#51) to reconstruct the alpha channel. Both decoders read the same latent, so the matte aligns exactly with the color pixels, a core idea in Wan‑Alpha's design for consistent transparency. This dual-path decode is what makes Wan Alpha ready for direct compositing.

Saving and previewing
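Once the saved PNG zip is unpacked, many editors prefer a single alpha-capable file. A common route is ffmpeg's ProRes 4444 encoder; the sketch below only builds the command (file names and frame rate are placeholders; verify the flags against your ffmpeg build):

```python
# Sketch: build an ffmpeg command that wraps an RGBA PNG sequence into
# ProRes 4444 with alpha. Paths and fps are placeholders for your export.
import shlex

def prores4444_cmd(pattern, out, fps=16):
    return ["ffmpeg", "-framerate", str(fps), "-i", pattern,
            "-c:v", "prores_ks", "-profile:v", "4444",
            "-pix_fmt", "yuva444p10le", out]

cmd = prores4444_cmd("frames/frame_%05d.png", "wan_alpha_master.mov")
print(shlex.join(cmd))
```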
SavePNGZIP_and_Preview_RGBA_AnimatedWEBP (#73) writes two deliverables: a zip archive of RGBA PNG frames and a compact animated WebP preview. The frame sequence is production friendly for NLEs and compositors, while the preview accelerates reviews. Name your output set, choose a preview length and quality, and run the node to package your result.

EmptyHunyuanLatentVideo (#40): Set width, height, frames, and fps to match delivery. Larger canvases and longer durations raise VRAM needs; consider shorter drafts for look development, then scale up for finals.

KSampler (#3): Adjust
seed for explorations, steps to trade speed for detail, sampler and scheduler for stability, and cfg to balance prompt adherence with natural motion. With the LightX2V LoRA active, you can reduce steps significantly while preserving quality thanks to step distillation; see ModelTC/LightX2V for context on fast sampling.

LoraLoaderModelOnly (#59): Use the
strength control to blend its effect if you see oversharpening or tempo artifacts. Keep this LoRA closest to the base model in the chain so downstream LoRAs inherit its speed benefits.

LoraLoaderModelOnly (#65): Moderate the strength
to avoid overpowering motion coherence; combine it with your prompt rather than replacing it. If artifacts appear, lower this LoRA's strength before changing the sampler.

VAELoader (#39) RGB: Loads the VAE used by RGB VAE Decode
(#8). Keep it paired with the Wan‑Alpha alpha VAE so both decoders interpret latents coherently; swapping in unrelated VAEs can misalign edges or soften transparency. Background on the joint RGB–alpha design is in the Wan‑Alpha report on arXiv.

VAELoader (#51) Alpha: Loads the VAE used by Alpha VAE Decode
(#52). It reconstructs the matte from the same latent space as RGB so transparency matches motion and detail. If you customize VAEs, test that RGB and alpha still align on subpixel edges such as hair.

SavePNGZIP_and_Preview_RGBA_AnimatedWEBP (#73): Set the output_name
for versioning, choose a preview quality and frame rate that reflect the generated clip, and keep the PNG export as your master for lossless compositing. Avoid resizing between decode and save to preserve edge fidelity.

Resources used in Wan Alpha
This workflow implements and builds upon the following works and resources. We gratefully acknowledge WeChatCV for Wan-Alpha for their contributions and maintenance. For authoritative details, please refer to the original documentation and repositories linked below.
Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.
RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Playground, enabling artists to harness the latest AI tools to create incredible art.