Qwen Image Layered: Precision Image-to-Image Layer Decomposition | RunComfy
Convert any image into precise, multi-object RGBA layers for transparent, non-destructive edits, streamlining brand design, layout prototyping, and VFX workflows with fast, API-ready control.
Introduction To Qwen Image Layered Features
Qwen's Qwen Image Layered converts a single RGB image into multiple semantically disentangled RGBA layers at $0.05 per image, enabling variable-length, transparency-aware decomposition for precise, non-destructive edits. By replacing manual masking and layer rebuilds with inherent, per-object RGBA control, it eliminates complex selections and rework, making it a fit for design leads, marketing teams, and VFX/animation studios. For developers, Qwen Image Layered on RunComfy can be used both in the browser and via an HTTP API, so you don’t need to host or scale the model yourself.
Ideal for: Pixel-accurate Object Isolation | Rapid Layout Prototyping | Brand Asset Integration
Examples Created Using Qwen Image Layered
Model Overview
- Provider: Alibaba / Tongyi Qianwen team
- Task: Image Generation with Layer Decomposition
- Max Resolution/Duration: Not specified
- Summary: Qwen Image Layered is a breakthrough generative model that brings "Photoshop-level" editability to AI generation. Unlike traditional models that output flat raster images, Qwen Image Layered decomposes the generation process into physically isolated RGBA layers based on semantic structure. It enables "Native Editability," allowing technical artists and designers to generate structured assets where subjects, backgrounds, and elements are already separated, ready for immediate moving, recoloring, or animating in downstream workflows.
Key Capabilities
PS-Level Professional Layer Management
- Physical RGBA Isolation: The model splits the image into distinct physical RGBA layers rather than just predicting masks. This achieves true "Native Editability," allowing you to treat the output like a PSD file where every element is an independent object.
- Clean Transparency: Delivers production-ready alpha channels with minimal color spill, making it effortless to composite generated elements onto new backgrounds or use them in UI/UX and game asset pipelines; a compositing sketch follows this list.
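As an illustration of what those clean alpha channels enable, the sketch below recomposites a downloaded layer stack onto a new background with Pillow. The layer file names and the background image are hypothetical placeholders for assets saved from the model's output.

```python
# Minimal sketch: recomposite Qwen Image Layered RGBA outputs onto a new background.
# File names are hypothetical placeholders for layers downloaded from the model's output.
from PIL import Image

layer_files = ["layer_0_sky.png", "layer_1_mountains.png", "layer_2_hiker.png"]

# Start from a new background and stack the layers back to front.
canvas = Image.open("new_background.png").convert("RGBA")
for path in layer_files:
    layer = Image.open(path).convert("RGBA")
    layer = layer.resize(canvas.size)          # sizes must match before compositing
    canvas = Image.alpha_composite(canvas, layer)

canvas.save("recomposited.png")                # PNG preserves the alpha channel
```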
Prompt-Controlled Structured Layout
- Explicit Layer Definition: Use prompts to explicitly control the structure of the image, specifying anywhere from 3 to 10 layers. You can define the decomposition logic from macro composition to micro details.
- Semantic Precision: The model understands complex spatial relationships, allowing you to dictate exactly which elements go onto which layer via text instructions (e.g., "Layer 1: Sky, Layer 2: Mountains, Layer 3: Hiker"); a fuller prompt sketch follows this list.
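For example, a prompt that spells out the intended decomposition might look like the sketch below. The phrasing is illustrative rather than a required syntax; the point is that the layer count named in the prompt matches the `num_layers` value you send (see the parameter tables further down).

```python
# Illustrative only: a prompt that makes the intended layer structure explicit,
# paired with a matching num_layers value (parameter names follow the tables below).
prompt = (
    "A hiker at sunrise in the mountains. "
    "Layer 1: gradient sky with soft clouds. "
    "Layer 2: distant mountain range. "
    "Layer 3: foreground trail and rocks. "
    "Layer 4: the hiker with a backpack."
)
num_layers = 4
```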
Deep Recursive Decomposition ("Onion Peeling")
- Infinite Detail Editing: The model supports a "peeling the onion" approach to decomposition. You can take a single generated layer and decompose it further into sub-layers, unlocking arbitrary depths of detail.
- Granular Control: This allows for extreme precision, enabling users to isolate tiny sub-components (like a specific accessory on a character) for targeted editing without affecting the parent layer; a conceptual recursion sketch follows this list.
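Conceptually, the recursion looks like the sketch below. The `run_decomposition` helper is hypothetical and stands in for whichever call you use to obtain a layer stack (the HTTP API or a local pipeline).

```python
# Conceptual sketch of recursive ("onion peeling") decomposition.
# run_decomposition() is a hypothetical stand-in for your actual Qwen Image Layered call.

def run_decomposition(image_url: str, num_layers: int) -> list[str]:
    """Hypothetical helper: takes an image URL, returns a list of RGBA layer URLs."""
    raise NotImplementedError("replace with your API or local pipeline call")

def peel(image_url: str, depth: int, layers_per_pass: int = 3) -> list[str]:
    """Recursively split an image into sub-layers down to the requested depth."""
    layers = run_decomposition(image_url=image_url, num_layers=layers_per_pass)
    if depth <= 1:
        return layers
    # Decompose each returned layer further and flatten the resulting sub-layers.
    return [sub for layer in layers for sub in peel(layer, depth - 1, layers_per_pass)]
```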
Input Parameters
Core Inputs
| Parameter | Type | Default/Range | Description |
|---|---|---|---|
| prompt | string | Default: "" | Text guiding the image content AND the layer structure. Explicitly describe layers for best control. |
| image_url | string | Required | URL of the input RGB image (if performing decomposition on existing media). |
| negative_prompt | string | Default: "" | Elements to exclude from the generation. |
Generation Controls
| Parameter | Type | Default/Range | Description |
|---|---|---|---|
| num_layers | integer | Default: 4; Range: 3-10 | The target number of RGBA layers to generate. Supports explicit decomposition depth from 3 to 10. |
| num_inference_steps | integer | Default: 28 | Diffusion steps. Higher values (e.g., 50) allow for more refined edge details and alpha transparency. |
| guidance_scale | float | Default: 5 | How strictly the model follows the prompt. |
| seed | integer | Default: 0 | Set a fixed seed for reproducible layer structures. |
Output & System
| Parameter | Type | Default/Range | Description |
|---|---|---|---|
| output_format | string | Options: png, webp (Default: png) | PNG is highly recommended to preserve the transparency (Alpha channel) of layers. |
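Putting the parameters above together, a request body might look like the sketch below. Field names mirror the tables; the values are illustrative choices rather than recommended settings.

```python
# Illustrative request body combining the parameters documented above.
payload = {
    "prompt": (
        "Product shot of a sneaker. Layer 1: studio backdrop. "
        "Layer 2: soft shadow. Layer 3: sneaker. Layer 4: logo text."
    ),
    "negative_prompt": "blurry, low quality",
    "image_url": "https://example.com/source.png",  # optional: decompose an existing image
    "num_layers": 4,            # 3-10 supported
    "num_inference_steps": 28,  # higher values refine edges and alpha
    "guidance_scale": 5,
    "seed": 0,                  # fixed seed for reproducible layer structures
    "output_format": "png",     # PNG preserves the alpha channel
}
```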
API Integration
Developers can integrate Qwen Image Layered via the RunComfy API. The API accepts a prompt and/or source image and returns a JSON object containing a list of RGBA image URLs (the layer stack). This is ideal for building automated design tools, canvas editors, or VFX pipelines that require layered assets.
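A minimal client sketch is shown below. The endpoint URL, authentication header, and response field names are assumptions for illustration only; the exact contract is documented in the API section of your RunComfy dashboard.

```python
# Minimal sketch: call Qwen Image Layered over HTTP and save the returned layer stack.
# The endpoint URL, auth header, and response shape are assumptions for illustration.
import requests

API_URL = "https://api.runcomfy.net/v1/models/qwen-image-layered"  # hypothetical endpoint
API_TOKEN = "YOUR_API_TOKEN"

payload = {"prompt": "Layer 1: sky. Layer 2: city skyline. Layer 3: cyclist.",
           "num_layers": 3, "output_format": "png"}

resp = requests.post(API_URL,
                     headers={"Authorization": f"Bearer {API_TOKEN}"},
                     json=payload, timeout=300)
resp.raise_for_status()

# Assumed response shape: a JSON object containing a list of RGBA layer URLs.
for i, url in enumerate(resp.json().get("layers", [])):
    with open(f"layer_{i}.png", "wb") as f:
        f.write(requests.get(url, timeout=60).content)
```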
Official resources and licensing
- Official Blog: Qwen.ai Blog
- Technical Report: Arxiv Paper
- GitHub Repository: QwenLM/Qwen-Image-Layered
- Hugging Face: Model Page | Demo Space
- ModelScope: Model Page | Demo Studio
- License: Apache-2.0. Commercial use is allowed under standard open-source terms.
Explore Related Capabilities
If you need to generate high-quality flat images without layer separation, try the standard Qwen-Image model. If you need to edit specific regions of a flat image using mask-based instructions, look for Qwen-Image-Edit-2509.
Related Models
Change an image’s aspect ratio cleanly with Ideogram 3 Reframe.
Generate refined visuals with accurate lighting and text control for design work.
High-speed image transformation with precision lighting and bilingual prompt support.
Precision-driven tool for photo retouching and visual reconstruction.
Delivers refined image remastering and brand-consistent visual edits with scalable control.
Frequently Asked Questions
What makes Qwen Image Layered unique compared to other image-to-image models?
Qwen Image Layered differs from typical image-to-image systems because it decomposes a single input into multiple semantically coherent RGBA layers. Each layer can be edited independently—offering native transparency and structure awareness—making it especially useful for professional post-production and 3D pipeline integration.
Can I use Qwen Image Layered outputs commercially for image-to-image edits in client projects?
Yes. Qwen Image Layered is released under Apache-2.0, which permits commercial use. Still, review the official license terms before deploying it in commercial workflows, especially when image-to-image transformations modify client intellectual property.
What are the technical limitations of Qwen Image Layered in image-to-image mode?
When using Qwen Image Layered in image-to-image operations, the current maximum resolution is around 4096×4096 pixels (≈16 MP), with output scaling dependent on VRAM availability. It typically supports up to 8 semantic layers in practice (the num_layers parameter accepts values from 3 to 10), and prompt tokens are capped at 1,024.
Does Qwen Image Layered limit the number of reference inputs such as ControlNet or IP-Adapter connections during image-to-image processing?
Yes. Qwen Image Layered currently supports up to two external reference channels—either ControlNet or IP-Adapter—when performing image-to-image refinement. Beyond that, the cross-layer attention matrix may degrade, affecting layer separation quality.
How do I move from testing Qwen Image Layered in RunComfy Playground to using it in production via API?
To transition Qwen Image Layered from the RunComfy Playground to production, generate and test your layered image-to-image workflow interactively first. Then use the RunComfy REST API with the same model identifier (e.g., 'qwen-image-layered-v1'). Authentication uses token-based headers, and consumption is billed in USD. Documentation is available under the API section of your RunComfy dashboard.
What model backbone powers Qwen Image Layered and how does it influence image-to-image quality?
Qwen Image Layered uses the VLD-MMDiT backbone with Layer3D RoPE positional embeddings. This design ensures cross-layer contextual awareness, producing smoother image-to-image decompositions where objects remain intact and semantically separated, particularly for complex visual structures.
How does Qwen Image Layered achieve fine transparency and separation during image-to-image decomposition?
The model employs an RGBA-VAE, providing shared latent space for RGB and RGBA data. During image-to-image inference, this prevents mismatched transparency extraction and ensures that each semantic component receives a consistent alpha mask for precise downstream editing.
Does Qwen Image Layered maintain text readability and semantic fidelity when performing image-to-image edits?
Yes. One of Qwen Image Layered’s strengths in image-to-image reconstruction is its strong text preservation. The layer decomposition mechanism isolates text regions, allowing independent font, color, or style adjustments without bleeding into background layers.
What are the main use cases for Qwen Image Layered in image-to-image creative workflows?
Qwen Image Layered is ideal for designers, game studios, and VFX teams needing transparent semantic layers. Typical image-to-image applications include visual branding reworks, UI component isolation, background replacement, and color variation generation—tasks that benefit from nondestructive layer control.
How does Qwen Image Layered compare to the earlier Qwen Image Edit model for image-to-image editing?
While Qwen Image Edit offers local region modifications on a single flat image, Qwen Image Layered provides full semantic decomposition across multiple RGBA layers. This gives superior control for iterative image-to-image refinement, reducing artifacts and making compositing workflows far more efficient.
RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.
