Wan Alpha: transparent text-to-video for pro compositing
Wan Alpha is a purpose-built ComfyUI workflow that generates videos with a native alpha channel using the Wan 2.1 family. It jointly produces RGB and alpha so characters, props, and effects drop straight into timelines without keying or rotoscoping. For VFX, motion graphics, and interactive apps, Wan Alpha delivers clean edges, semi‑transparent effects, and frame-accurate masks ready for production.
Built around Wan2.1‑T2V‑14B and an alpha-aware VAE pair, Wan Alpha balances fidelity and speed. Optional LightX2V LoRA acceleration shortens sampling while preserving detail, and the workflow exports RGBA frame sequences plus an animated WebP preview for quick review.
Key models in ComfyUI Wan Alpha workflow
- Wan2.1‑T2V‑14B. Foundation text-to-video model that drives scene structure, motion, and rendering quality. Official weights and code are maintained in the Wan-Video organization on GitHub. Wan-Video/Wan2.1
- UMT5‑XXL text encoder. Multilingual encoder used to tokenize and embed prompts for Wan models, enabling rich prompt phrasing in multiple languages. google/umt5-xxl and UMT5 docs
- Wan‑Alpha VAE pair. A VAE design that learns RGB and alpha jointly so decoded alpha aligns pixel-for-pixel with RGB, supporting fine edges and semi-transparency. See the Wan‑Alpha technical report for background. Wan‑Alpha (arXiv)
- LightX2V LoRA. Optional acceleration LoRA that distills long samplers to a few steps for faster text-to-video while maintaining perceptual quality. ModelTC/LightX2V
How to use ComfyUI Wan Alpha workflow
This ComfyUI graph follows a straightforward path from prompt to RGBA frames: load models, encode text, allocate a video latent, sample, decode RGB and alpha in lockstep, then save.
Model and LoRA loading
- Start at Load Wan 2.1 t2v 14B (#37) to bring in the base model. If you use acceleration or style refinements, apply them with LoraLoaderModelOnly (#59) and LoraLoaderModelOnly (#65) in sequence. The model then passes through ModelSamplingSD3 (#48), which configures a sampler compatible with the loaded checkpoint. This stack defines the motion prior and rendering style that Wan Alpha will refine in later steps.
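For orientation, here is a minimal sketch of that stack in ComfyUI's API (prompt) JSON format, written as a Python dict. It assumes the node titled Load Wan 2.1 t2v 14B is a stock UNETLoader under a custom title; the file names and the shift value are placeholders, so substitute your own.

```python
# Sketch of the model/LoRA chain as an API-format graph fragment.
# Node IDs mirror the workflow; file names are placeholders.
model_stack = {
    "37": {  # Load Wan 2.1 t2v 14B
        "class_type": "UNETLoader",
        "inputs": {"unet_name": "wan2.1_t2v_14B.safetensors",  # placeholder
                   "weight_dtype": "default"},
    },
    "59": {  # optional LightX2V acceleration LoRA
        "class_type": "LoraLoaderModelOnly",
        "inputs": {"model": ["37", 0],
                   "lora_name": "lightx2v_t2v_lora.safetensors",  # placeholder
                   "strength_model": 1.0},
    },
    "65": {  # optional style/domain LoRA
        "class_type": "LoraLoaderModelOnly",
        "inputs": {"model": ["59", 0],
                   "lora_name": "style_lora.safetensors",  # placeholder
                   "strength_model": 0.7},
    },
    "48": {  # sampling configuration for the checkpoint
        "class_type": "ModelSamplingSD3",
        "inputs": {"model": ["65", 0], "shift": 8.0},  # shift is an example value
    },
}
```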
Prompt encoding
Load Text Encoder (#38) loads the UMT5‑XXL text encoder. Enter your description in CLIP Text Encode (Positive Prompt) (#6); keep your subject, action, camera framing, and the phrase “transparent background” concise. Use CLIP Text Encode (Negative Prompt) (#7) to steer away from halos or background clutter if needed. These encodings condition both RGB and alpha generation so edges and transparency cues follow your intent.
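As a starting point, a prompt pair along these lines exercises the alpha channel well; the wording is illustrative, not prescriptive:

```python
# Example strings for CLIP Text Encode (Positive Prompt) (#6) and
# CLIP Text Encode (Negative Prompt) (#7).
positive = ("a young woman with wispy hair twirling slowly, medium shot, "
            "soft studio lighting, transparent background")
negative = "background, outline, green screen, white fringe, watermark"
```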
Video canvas setup
- Use EmptyHunyuanLatentVideo (#40) to define the latent video canvas. Set width, height, frames, and fps to fit your shot; higher resolutions or longer clips require more memory. This node allocates a temporally consistent latent volume that Wan Alpha will fill with motion and appearance. Consider matching duration and frame rate to your edit to avoid resampling later.
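Wan-style video VAEs compress time by a factor of four, so clips are commonly sized as 4n + 1 frames (for example, 81 frames at 16 fps is about five seconds). Treat that rule as an assumption and verify it against the values the node accepts; a small helper makes the mapping explicit:

```python
# Pick a 4n + 1 frame count closest to a target duration (assumed constraint).
def frames_for(duration_s: float, fps: int = 16) -> int:
    raw = round(duration_s * fps)       # requested frame count
    n = max(0, round((raw - 1) / 4))    # snap to the 4n + 1 grid
    return 4 * n + 1

print(frames_for(5.0, fps=16))  # -> 81
```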
Generation
- The KSampler (#3) performs diffusion on the video latent using your model stack and prompt conditioning. Adjust seed for variations, and select a sampler and scheduler that balance speed and detail. When the LightX2V LoRA is active, you can use fewer steps for faster renders while maintaining stability. The output is a single latent stream shared by the next decoding stage to guarantee perfect RGBA alignment.
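The right numbers depend on your exact models, but settings in this neighborhood are a reasonable first try; both presets below are assumptions to tune, not workflow defaults:

```python
# Illustrative KSampler presets. Distilled LoRAs such as LightX2V typically
# run very few steps at low CFG; the baseline uses a fuller schedule.
sampler_presets = {
    "with_lightx2v": {"steps": 4,  "cfg": 1.0, "sampler_name": "euler",
                      "scheduler": "simple"},
    "baseline":      {"steps": 30, "cfg": 6.0, "sampler_name": "uni_pc",
                      "scheduler": "simple"},
}
```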
Decoding RGB and alpha
RGB VAE Decode (#8) pairs with VAELoader (#39) to reconstruct RGB frames. In parallel, Alpha VAE Decode (#52) pairs with VAELoader (#51) to reconstruct the alpha channel. Both decoders read the same latent so the matte aligns exactly with the color pixels, a core idea in Wan‑Alpha’s design for consistent transparency. This dual-path decode is what makes Wan Alpha ready for direct compositing.
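In API-format terms the wiring looks roughly like this, assuming both decode nodes are stock VAEDecode under custom titles; the VAE file names are placeholders:

```python
# Both decoders consume the same KSampler (#3) latent, so RGB and alpha
# cannot drift apart.
dual_decode = {
    "39": {"class_type": "VAELoader",
           "inputs": {"vae_name": "wan_alpha_rgb_vae.safetensors"}},    # placeholder
    "51": {"class_type": "VAELoader",
           "inputs": {"vae_name": "wan_alpha_alpha_vae.safetensors"}},  # placeholder
    "8":  {"class_type": "VAEDecode",  # RGB frames
           "inputs": {"samples": ["3", 0], "vae": ["39", 0]}},
    "52": {"class_type": "VAEDecode",  # alpha matte
           "inputs": {"samples": ["3", 0], "vae": ["51", 0]}},
}
```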
Saving and previewing
SavePNGZIP_and_Preview_RGBA_AnimatedWEBP (#73) writes two deliverables: a ZIP archive of RGBA PNG frames and a compact animated WebP preview. The frame sequence is production-friendly for NLEs and compositors, while the preview accelerates reviews. Name your output set, choose a preview length and quality, and run the node to package your result.
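If you ever receive RGB and alpha as separate sequences rather than packaged RGBA, Pillow can merge them losslessly; the directory names below are placeholders:

```python
from pathlib import Path
from PIL import Image

rgb_dir, alpha_dir, out_dir = Path("rgb"), Path("alpha"), Path("rgba")
out_dir.mkdir(exist_ok=True)

for rgb_path in sorted(rgb_dir.glob("*.png")):
    rgb = Image.open(rgb_path).convert("RGB")
    matte = Image.open(alpha_dir / rgb_path.name).convert("L")  # matching matte
    rgb.putalpha(matte)                # attach matte; image becomes RGBA
    rgb.save(out_dir / rgb_path.name)  # straight (unpremultiplied) RGBA PNG
```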
Key nodes in ComfyUI Wan Alpha workflow
EmptyHunyuanLatentVideo (#40)
- Role: defines the spatial and temporal resolution of the generated clip. Tune width, height, frames, and fps to match delivery. Larger canvases and longer durations raise VRAM needs; consider shorter drafts for look development, then scale up for finals.
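Sampling cost scales roughly with width × height × frames, so a quick volume comparison shows what a final render will cost relative to a draft. This is a back-of-envelope heuristic, and the resolutions below are examples:

```python
def volume(width: int, height: int, frames: int) -> int:
    return width * height * frames

draft = volume(480, 832, 33)    # short look-dev clip (example settings)
final = volume(720, 1280, 81)   # delivery settings (example)
print(f"final costs ~{final / draft:.1f}x the draft")  # ~5.7x
```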
KSampler (#3)
- Role: the main denoiser for Wan Alpha. Adjust seed for exploration, steps to trade speed for detail, sampler and scheduler for stability, and cfg to balance prompt adherence with natural motion. With the LightX2V LoRA active, you can reduce steps significantly while preserving quality thanks to step distillation. See LightX2V for context on fast sampling. ModelTC/LightX2V
LoraLoaderModelOnly (#59)
- Role: loads the LightX2V LoRA that accelerates Wan2.1 sampling. Use the strength control to blend its effect if you see oversharpening or tempo artifacts. Keep this LoRA closest to the base model in the chain so downstream LoRAs inherit its speed benefits.
LoraLoaderModelOnly (#65)
- Role: loads an additional LoRA for style or domain refinement. Moderate its strength to avoid overpowering motion coherence; combine it with your prompt rather than replacing it. If artifacts appear, lower this LoRA's strength before changing the sampler.
VAELoader (#39) RGB
- Role: provides the RGB VAE used by RGB VAE Decode (#8). Keep it paired with the Wan‑Alpha alpha VAE so both decoders interpret latents coherently. Swapping in unrelated VAEs can misalign edges or soften transparency. Background on the joint RGB–alpha design is in the Wan‑Alpha report. Wan‑Alpha (arXiv)
VAELoader (#51) Alpha
- Role: provides the alpha VAE used by Alpha VAE Decode (#52). It reconstructs the matte from the same latent space as RGB so transparency matches motion and detail. If you customize VAEs, test that RGB and alpha still align on subpixel edges like hair.
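One crude but quick alignment check is to compare edge maps of a decoded RGB frame and its matte and measure their overlap; a sharp drop after swapping VAEs suggests the decoders no longer agree. The file paths and threshold below are placeholders:

```python
import numpy as np
from PIL import Image, ImageFilter

def edges(path: str) -> np.ndarray:
    img = Image.open(path).convert("L").filter(ImageFilter.FIND_EDGES)
    return np.asarray(img, dtype=np.float32) > 32  # binarized edge map

rgb_mask = edges("rgb/frame_0001.png")
alpha_mask = edges("alpha/frame_0001.png")
overlap = (rgb_mask & alpha_mask).sum() / max(alpha_mask.sum(), 1)
print(f"matte edges coinciding with RGB edges: {overlap:.2%}")
```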
SavePNGZIP_and_Preview_RGBA_AnimatedWEBP (#73)
- Role: exports assets. Set a clear output_name for versioning, choose a preview quality and frame rate that reflect the generated clip, and keep the PNG export as your master for lossless compositing. Avoid resizing between decode and save to preserve edge fidelity.
Optional extras
- Strong prompts for Wan Alpha explicitly describe subject, action, camera, lighting, and “transparent background.” Add subtle materials like “wispy hair” or “glass” to exercise alpha detail.
- For quick iteration, use shorter durations or lower frame rates, then upscale settings once look and motion are locked.
- If you see halos, add negatives like “background, outline, green screen, white fringe” and keep lighting consistent in the prompt.
- When combining multiple LoRAs, place acceleration LoRAs earlier and stylistic LoRAs later, and keep strengths modest to retain motion realism.
- Import the RGBA PNG sequence directly into your compositor; use the animated WebP only for previews, not as a master.
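If your compositor prefers a single movie file with alpha over a frame sequence, one common route is ProRes 4444 via ffmpeg. The sketch below assumes ffmpeg is installed; the frame rate, sequence pattern, and output name are placeholders for your shot:

```python
import subprocess

subprocess.run([
    "ffmpeg",
    "-framerate", "16",            # match the generated clip's fps
    "-i", "rgba/frame_%04d.png",   # RGBA PNG sequence from the workflow
    "-c:v", "prores_ks",           # ProRes encoder that supports alpha
    "-profile:v", "4444",          # 4444 profile carries the alpha channel
    "-pix_fmt", "yuva444p10le",    # 10-bit YUV plus alpha
    "wan_alpha_master.mov",
], check=True)
```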
Resources used in Wan Alpha
- Wan2.1 model family and code: Wan-Video/Wan2.1
- UMT5 text encoder: google/umt5-xxl and UMT5 docs
- Wan‑Alpha method overview: Wan‑Alpha (arXiv)
- LightX2V acceleration: ModelTC/LightX2V
Acknowledgements
This workflow implements and builds upon the following works and resources. We gratefully acknowledge WeChatCV, the authors of Wan-Alpha, for their contributions and maintenance. For authoritative details, please refer to the original documentation and repositories linked below.
Resources
- GitHub: WeChatCV/Wan-Alpha
Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.

