
Qwen Image Layered: Precision Image-to-Image Layer Decomposition on Playground and API

qwen/qwen-image-layered

Convert any image into precise, multi-object RGBA layers for transparent, non-destructive edits, streamlining brand design, layout prototyping, and VFX workflows with fast, API-ready control.


Introduction To Qwen Image Layered Features

Qwen's Qwen Image Layered converts a single RGB image into multiple semantically disentangled RGBA layers at $0.05 per image, enabling variable-length, transparency-aware decomposition for precise, non-destructive edits. By replacing manual masking and layer rebuilds with inherent, per-object RGBA control, it eliminates complex selections and rework for design leads, marketing teams, and VFX/animation studios. For developers, Qwen Image Layered on RunComfy can be used both in the browser and via an HTTP API, so you don’t need to host or scale the model yourself.
Ideal for: Pixel-accurate Object Isolation | Rapid Layout Prototyping | Brand Asset Integration

Model Overview


  • Provider: Alibaba / Tongyi Qianwen team
  • Task: Image Generation with Layer Decomposition
  • Max Resolution/Duration: Not specified
  • Summary: Qwen Image Layered is a breakthrough generative model that brings "Photoshop-level" editability to AI generation. Unlike traditional models that output flat raster images, Qwen Image Layered decomposes the generation process into physically isolated RGBA layers based on semantic structure. It enables "Native Editability," allowing technical artists and designers to generate structured assets where subjects, backgrounds, and elements are already separated, ready for immediate moving, recoloring, or animating in downstream workflows.

Key Capabilities


Photoshop-Level Professional Layer Management

  • Physical RGBA Isolation: The model splits the image into distinct physical RGBA layers rather than just predicting masks. This achieves true "Native Editability," allowing you to treat the output like a PSD file where every element is an independent object.
  • Clean Transparency: Delivers production-ready alpha channels with minimal color spill, making it straightforward to composite generated elements onto new backgrounds or use them in UI/UX and game asset pipelines (a compositing sketch follows this list).
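
For concreteness, here is a minimal compositing sketch using Pillow. It assumes the layers have already been downloaded as RGBA PNGs and are ordered bottom-to-top; the file names are hypothetical, and only the RGBA/PNG layer format comes from this page.

    # Minimal compositing sketch: stack downloaded RGBA layers onto a new
    # background. File names and layer ordering are illustrative assumptions.
    from PIL import Image

    def composite_layers(background_path, layer_paths):
        canvas = Image.open(background_path).convert("RGBA")
        for path in layer_paths:  # layer_paths ordered bottom-to-top
            layer = Image.open(path).convert("RGBA").resize(canvas.size)
            canvas = Image.alpha_composite(canvas, layer)  # honors alpha
        return canvas

    composite_layers("new_background.png", ["layer_0.png", "layer_1.png"]).save("composite.png")

Because clean alpha channels carry little color spill, no extra matting or edge-cleanup pass should be needed before the alpha_composite call.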

Prompt-Controlled Structured Layout

  • Explicit Layer Definition: Prompts can explicitly control the structure of the image, specifying anywhere from 3 to 10 layers, letting you define the decomposition logic from macro composition to micro details.
  • Semantic Precision: The model understands complex spatial relationships, allowing you to dictate exactly which elements go onto which layer via text instructions (e.g., "Layer 1: Sky, Layer 2: Mountains, Layer 3: Hiker").

Deep Recursive Decomposition ("Onion Peeling")

  • Infinite Detail Editing: The model supports a "peeling the onion" approach to decomposition. You can take a single generated layer and decompose it further into sub-layers, unlocking arbitrary depths of detail.
  • Granular Control: This allows for extreme precision, enabling users to isolate tiny sub-components (like a specific accessory on a character) for targeted editing without affecting the parent layer (see the recursive sketch after this list).
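
To make the recursion concrete, the sketch below re-submits one returned layer as a new input image. decompose() is a hypothetical stand-in for a call to the RunComfy endpoint (see API Integration below); its name, signature, and top-to-bottom layer ordering are assumptions.

    # "Onion peeling" sketch: recursively decompose the topmost layer.
    def decompose(image_url, num_layers):
        """Hypothetical wrapper around the RunComfy endpoint (see the API
        Integration section); assumed to return RGBA layer URLs, topmost first."""
        raise NotImplementedError("wire this to the RunComfy API")

    def peel(image_url, depth, num_layers=3):
        layers = decompose(image_url, num_layers=num_layers)
        if depth <= 1:
            return layers
        # Replace the topmost layer with its own sub-layers.
        return peel(layers[0], depth - 1, num_layers) + layers[1:]

    # e.g., peel("https://example.com/character.png", depth=2) would isolate
    # sub-components of the top layer without touching the other layers.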

Input Parameters


Core Inputs


  • prompt (string; default: ""): Text guiding both the image content and the layer structure. Explicitly describe layers for best control.
  • image_url (string; required): URL of the input RGB image (when decomposing existing media).
  • negative_prompt (string; default: ""): Elements to exclude from the generation.

Generation Controls


  • num_layers (integer; default: 4; range: 3-10): The target number of RGBA layers to generate; supports explicit decomposition depth from 3 to 10.
  • num_inference_steps (integer; default: 28): Diffusion steps. Higher values (e.g., 50) allow for more refined edge details and alpha transparency.
  • guidance_scale (float; default: 5): How strictly the model follows the prompt.
  • seed (integer; default: 0): Set a fixed seed for reproducible layer structures.

Output & System


  • output_format (string; options: png, webp; default: png): PNG is highly recommended to preserve the transparency (alpha channel) of layers.
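
Taken together, the tables above map onto a parameter set like the sketch below. The field names, defaults, and ranges come from the tables; treating them as a single flat dict is an assumption about the request body, so check the API docs for the exact envelope.

    # Example parameter set assembled from the tables above (documented
    # defaults shown; the flat-dict request shape is an assumption).
    params = {
        "prompt": "Layer 1: studio background. Layer 2: product bottle. "
                  "Layer 3: label text.",
        "image_url": "https://example.com/source.png",  # required input image
        "negative_prompt": "",
        "num_layers": 4,             # valid range: 3-10
        "num_inference_steps": 28,   # raise toward 50 for cleaner alpha edges
        "guidance_scale": 5,
        "seed": 0,                   # fixed seed -> reproducible layer structure
        "output_format": "png",      # keeps the alpha channel intact
    }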

API Integration


Developers can integrate Qwen Image Layered via the RunComfy API. The API accepts a prompt and/or source image and returns a JSON object containing a list of RGBA image URLs (the layer stack). This is ideal for building automated design tools, canvas editors, or VFX pipelines that require layered assets.
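
A minimal client sketch follows. The endpoint path, auth header, and the "layers" response key are assumptions for illustration; the page only guarantees that the API returns a JSON object containing a list of RGBA image URLs, so consult the API docs for the exact contract.

    # Hypothetical client sketch using the `requests` library. The endpoint
    # path, header name, and "layers" response key are illustrative guesses.
    import requests

    API_KEY = "YOUR_RUNCOMFY_API_KEY"
    ENDPOINT = "https://api.runcomfy.net/v1/qwen/qwen-image-layered"  # assumed

    params = {  # mirrors the Input Parameters section above
        "prompt": "Layer 1: sky. Layer 2: mountains. Layer 3: hiker.",
        "image_url": "https://example.com/source.png",
        "num_layers": 3,
        "output_format": "png",
    }

    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=params,
        timeout=300,
    )
    resp.raise_for_status()

    # Documented behavior: the response JSON carries a list of RGBA layer URLs.
    for i, url in enumerate(resp.json().get("layers", [])):
        layer = requests.get(url, timeout=60)
        layer.raise_for_status()
        with open(f"layer_{i}.png", "wb") as f:
            f.write(layer.content)

Each downloaded file is an independent RGBA layer, ready for the kind of compositing shown under Key Capabilities.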




Official Resources and Licensing


  • Official Blog: Qwen.ai Blog
  • Technical Report: Arxiv Paper
  • GitHub Repository: QwenLM/Qwen-Image-Layered
  • Hugging Face: Model Page | Demo Space
  • ModelScope: Model Page | Demo Studio
  • License: Apache-2.0. Commercial use is allowed under standard open-source terms.

Explore Related Capabilities


If you need to generate high-quality flat images without layer separation, try the standard Qwen-Image model. If you need to edit specific regions of a flat image using mask-based instructions, look for Qwen-Image-Edit-2509.

Related Playgrounds

z-image/turbo/text-to-image

High-speed model for rapid text-to-image creation with rich detail and flexible format control.

flux-2/turbo/text-to-image

Create detailed visual assets from prompts with scalable, high-speed precision.

gemini-3-pro-image-preview/edit

Generate studio-grade visuals with 4K clarity, creative control, and smart adaptive lighting.

dreamina-4-0/text-to-image

Next-gen AI visual tool merging text-driven image creation with precision editing.

qwen-edit-2509/multi-image-edit-plus

Advanced image editing model for detailed, consistent visual creation and precise design workflows.

flux-2/flex/text-to-image

Generate accurate brand visuals with high-fidelity text-to-image control.

Frequently Asked Questions

What makes Qwen Image Layered unique compared to other image-to-image models?

Qwen Image Layered differs from typical image-to-image systems because it decomposes a single input into multiple semantically coherent RGBA layers. Each layer can be edited independently—offering native transparency and structure awareness—making it especially useful for professional post-production and 3D pipeline integration.

Can I use Qwen Image Layered outputs commercially for image-to-image edits in client projects?

Yes, subject to the model’s official license: Qwen Image Layered is released under Apache-2.0, which permits commercial use. Still, review Alibaba’s license terms before deploying it in commercial workflows, especially when image-to-image transformations modify client intellectual property.

What are the technical limitations of Qwen Image Layered in image-to-image mode?

When using Qwen Image Layered in image-to-image operations, the current maximum resolution is around 4096×4096 pixels (≈16MP), with output scaling dependent on VRAM availability. It typically supports up to 8 semantic layers, and prompt tokens are capped at 1,024.

Does Qwen Image Layered limit the number of reference inputs such as ControlNet or IP-Adapter connections during image-to-image processing?

Yes. Qwen Image Layered currently supports up to two external reference channels—either ControlNet or IP-Adapter—when performing image-to-image refinement. Beyond that, the cross-layer attention matrix may degrade, affecting layer separation quality.

How do I move from testing Qwen Image Layered in RunComfy Playground to using it in production via API?

To move Qwen Image Layered from the RunComfy Playground to production, first build and test your layered image-to-image workflow interactively, then call the RunComfy REST API with the same model identifier (e.g., 'qwen-image-layered-v1'). Authentication uses token-based headers, and consumption is billed in USD. Documentation is available under the API section of your RunComfy dashboard.

What model backbone powers Qwen Image Layered and how does it influence image-to-image quality?

Qwen Image Layered uses the VLD-MMDiT backbone with Layer3D RoPE positional embeddings. This design ensures cross-layer contextual awareness, producing smoother image-to-image decompositions where objects remain intact and semantically separated, particularly for complex visual structures.

How does Qwen Image Layered achieve fine transparency and separation during image-to-image decomposition?

The model employs an RGBA-VAE, providing shared latent space for RGB and RGBA data. During image-to-image inference, this prevents mismatched transparency extraction and ensures that each semantic component receives a consistent alpha mask for precise downstream editing.

Does Qwen Image Layered maintain text readability and semantic fidelity when performing image-to-image edits?

Yes. One of Qwen Image Layered’s strengths in image-to-image reconstruction is its strong text preservation. The layer decomposition mechanism isolates text regions, allowing independent font, color, or style adjustments without bleeding into background layers.

What are the main use cases for Qwen Image Layered in image-to-image creative workflows?

Qwen Image Layered is ideal for designers, game studios, and VFX teams needing transparent semantic layers. Typical image-to-image applications include visual branding reworks, UI component isolation, background replacement, and color variation generation—tasks that benefit from nondestructive layer control.

How does Qwen Image Layered compare to the earlier Qwen Image Edit model for image-to-image editing?

While Qwen Image Edit offers local region modifications on a single flat image, Qwen Image Layered provides full semantic decomposition across multiple RGBA layers. This gives superior control for iterative image-to-image refinement, reducing artifacts and making compositing workflows far more efficient.
