Z-Image | Fast Photorealistic Base Model

Workflow Name: RunComfy/Z-Image
Workflow ID: 0000...1357
With this workflow, you can produce exceptionally detailed, true-to-life visuals in seconds. It lets you fine-tune lighting, tone, and composition while maintaining strong prompt accuracy. The integrated architecture supports smooth bilingual text rendering for design consistency. Whether you need high-speed drafts or refined final artwork, it adapts to your workflow. Use it for creating marketing visuals, product renders, or concept art without technical complexity. Experience scalable image generation that balances speed, quality, and control.

Z-Image text-to-image workflow for ComfyUI

This ComfyUI workflow showcases Z-Image, a next‑generation diffusion transformer designed for fast, high‑fidelity image generation. Built on a scalable single‑stream architecture with around 6 billion parameters, Z-Image balances photorealism, strong prompt adherence, and bilingual text rendering.

Out of the box, the graph is set up for Z-Image Base to maximize quality while remaining efficient on common GPUs. It also plays well with the Z-Image Turbo variant when speed matters, and its structure makes it easy to extend toward Z-Image Edit for image‑to‑image tasks. If you want a dependable, minimal graph that turns clear prompts into clean results, this Z-Image workflow is a solid starting point.

Key models in the ComfyUI Z-Image workflow

  • Z-Image Base diffusion transformer (bf16). Core generator that denoises latents into images with Z-Image’s single‑stream topology and prompt control.
  • Qwen 3 4B text encoder. Encodes prompts for Z-Image with strong bilingual coverage and clear tokenization for text rendering.
  • Z-Image autoencoder VAE. Compresses and reconstructs images between pixel space and the Z-Image latent space.

How to use the ComfyUI Z-Image workflow

At a high level, the graph loads Z-Image components, prepares a latent canvas, encodes your positive and negative prompts, runs a sampler tuned for Z-Image, then decodes and saves the result. You mainly provide the prompt and choose the output size; the rest is wired for sensible defaults.
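
If you prefer to drive the graph from a script instead of the UI, ComfyUI exposes an HTTP /prompt endpoint that accepts a workflow exported in API format (Save (API Format) in the ComfyUI menu). Below is a minimal Python sketch, assuming a default local server on 127.0.0.1:8188; the node fragments shown in the following steps can be merged into one dict and queued this way.

    import json
    import urllib.request

    def queue_prompt(graph: dict, server: str = "http://127.0.0.1:8188") -> dict:
        # POST an API-format workflow to ComfyUI's /prompt endpoint.
        data = json.dumps({"prompt": graph}).encode("utf-8")
        req = urllib.request.Request(
            server + "/prompt", data=data,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())  # response includes the queued prompt_id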

Step 1 - Load models

This group initializes the Z-Image diffusion model, the Qwen 3 4B text encoder, and the VAE so all components align. The UNETLoader (#66) points to Z-Image Base by default, which favors fidelity and editing headroom. The CLIPLoader (#62) brings in the Qwen-based encoder that handles multilingual prompts and text tokens well. The VAELoader (#63) sets the autoencoder used later for decoding. Swap weights here if you want to try Z-Image Turbo for faster drafts.
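
In API format, the three loaders look roughly like the fragment below. The node ids match this graph; the filenames are placeholders for whatever you have in models/diffusion_models, models/text_encoders, and models/vae, and the CLIPLoader type string is an assumption here; copy the exact value from the shipped template, since it depends on your ComfyUI version.

    loaders = {
        "66": {"class_type": "UNETLoader",
               "inputs": {"unet_name": "z_image_base_bf16.safetensors",  # placeholder filename
                          "weight_dtype": "default"}},
        "62": {"class_type": "CLIPLoader",
               "inputs": {"clip_name": "qwen_3_4b.safetensors",  # placeholder filename
                          "type": "qwen_image"}},  # assumption: use the template's exact value
        "63": {"class_type": "VAELoader",
               "inputs": {"vae_name": "z_image_vae.safetensors"}},  # placeholder filename
    }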

Step 2 - Image size

This group sets up the latent canvas via EmptySD3LatentImage (#68). Choose the width and height you want to generate at, and keep aspect ratio in mind for composition. Z-Image performs well across common creative sizes, so pick dimensions that match your storyboards or delivery format. Larger sizes increase detail and compute cost.
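
As an illustration, the latent node in API format takes width, height, and batch_size; the 1024 x 1024 square below is an assumed starting point, not a requirement.

    latent = {
        "68": {"class_type": "EmptySD3LatentImage",
               "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    }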

Step 3 - Prompt

Here you write your story. The CLIP Text Encode (Positive Prompt) (#67) node takes your scene description and style directives for Z-Image. The CLIP Text Encode (Negative Prompt) (#71) helps steer away from artifacts or unwanted elements. Z-Image is tuned for bilingual text rendering, so you can include text content in multiple languages directly in the prompt when needed. Keep prompts specific and visual for the most consistent outcomes.
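
In API format, both prompt nodes are plain CLIPTextEncode nodes fed by the CLIPLoader. The example text below is purely illustrative, with the quoted phrase standing in for on‑image typography.

    prompts = {
        "67": {"class_type": "CLIPTextEncode",  # positive prompt
               "inputs": {"clip": ["62", 0],
                          "text": 'Product photo of a ceramic mug on a walnut desk, '
                                  'soft window light, label text "MORNING BREW"'}},
        "71": {"class_type": "CLIPTextEncode",  # negative prompt
               "inputs": {"clip": ["62", 0],
                          "text": "blurry, oversaturated, distorted typography, watermark"}},
    }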

Sample and denoise

ModelSamplingAuraFlow (#70) applies a sampling policy aligned with Z-Image’s single‑stream design, then KSampler (#69) drives the denoising process to turn noise into an image that matches your prompts. The sampler combines your positive and negative conditioning with the latent canvas to iteratively refine structure and details. You can trade speed for quality here by adjusting the sampler settings as described below. This stage is where Z-Image’s prompt adherence and text clarity really show.
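
Wired up in API format, the sampling stage looks like the sketch below. The shift value, sampler_name, and scheduler are assumptions for illustration; keep whatever the template ships with. Steps and cfg sit inside the ranges discussed under KSampler below.

    sampling = {
        "70": {"class_type": "ModelSamplingAuraFlow",
               "inputs": {"model": ["66", 0],
                          "shift": 3.0}},  # assumption: keep the template's default shift
        "69": {"class_type": "KSampler",
               "inputs": {"model": ["70", 0],
                          "positive": ["67", 0],
                          "negative": ["71", 0],
                          "latent_image": ["68", 0],
                          "seed": 42, "steps": 40, "cfg": 4.0,
                          "sampler_name": "euler",  # assumption
                          "scheduler": "simple",    # assumption
                          "denoise": 1.0}},
    }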

Decode and save

VAEDecode (#65) converts the final latent into an RGB image. SaveImage (#9) writes files using the prefix set in the node so your Z-Image outputs are easy to find and organize. This completes a full pass from prompt to pixels.
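
The closing fragment decodes and saves; the filename_prefix below is illustrative (ComfyUI treats a slash in the prefix as a subfolder of its output directory). Merging all the fragments gives a complete API-format graph you can pass to the queue_prompt sketch from earlier.

    output = {
        "65": {"class_type": "VAEDecode",
               "inputs": {"samples": ["69", 0], "vae": ["63", 0]}},
        "9":  {"class_type": "SaveImage",
               "inputs": {"images": ["65", 0],
                          "filename_prefix": "z_image/base"}},  # illustrative prefix
    }

    graph = {**loaders, **latent, **prompts, **sampling, **output}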

Key nodes in the ComfyUI Z-Image workflow

UNETLoader (#66)

Loads the Z-Image backbone that performs the actual denoising. Swap to another Z-Image variant here when exploring speed or editing use cases. If you change variants, keep the encoder and VAE compatible to avoid color or contrast shifts.

CLIP Text Encode (Positive Prompt) (#67)

Encodes the main description for Z-Image. Write concise, visual phrases that specify subject, lighting, camera, mood, and any on‑image text. For text rendering, put the desired words in quotes and keep them short for best legibility.

CLIP Text Encode (Negative Prompt) (#71)

Defines what to avoid so Z-Image can focus on the right details. Use it to suppress blur, extra limbs, messy typography, or off‑style elements. Keep it brief and topical so it does not overconstrain composition.

EmptySD3LatentImage (#68)

Creates the latent canvas where Z-Image will paint. Choose dimensions that suit the final usage and keep them multiples of 64 px for efficient memory use. Wider or taller canvases influence composition and perspective, so adjust prompts accordingly.
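
A small helper keeps arbitrary size targets on the 64 px grid; the example targets are hypothetical.

    def snap64(x: int) -> int:
        # Round to the nearest multiple of 64, never below 64.
        return max(64, round(x / 64) * 64)

    width, height = snap64(1250), snap64(833)  # -> 1280, 832 (roughly 3:2)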

ModelSamplingAuraFlow (#70)

Applies the AuraFlow-style flow-shift schedule that matches Z-Image’s training and latent space. You rarely need to change it unless you are testing alternative sampling setups. Leave it as provided for stable, artifact‑free results.

KSampler (#69)

Controls the quality–speed tradeoff for Z-Image. Increase steps for more detail and stability, decrease for faster drafts. Keep cfg moderate to balance prompt adherence with natural textures; typical values in this graph are steps: 30 to 50 and cfg: 3 to 5. Set a fixed seed for reproducibility or randomize it to explore variations.
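
When sweeping these settings from a script, it is convenient to patch the API-format graph in place. This sketch assumes the graph dict assembled earlier, with the KSampler at node id 69 as in this workflow.

    import random

    def set_sampler(graph: dict, steps: int = 40, cfg: float = 4.0,
                    seed: int | None = None) -> dict:
        # Patch the KSampler inputs; a None seed gets randomized for exploration.
        inputs = graph["69"]["inputs"]
        inputs["steps"] = steps
        inputs["cfg"] = cfg
        inputs["seed"] = random.randrange(2**32) if seed is None else seed
        return graph

    set_sampler(graph, steps=30, cfg=3.5, seed=123)  # fixed seed for reproducibility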

VAEDecode (#65)

Transforms the final latent from Z-Image into an RGB image. If you ever change the VAE, keep it matched to the model family to preserve color accuracy and sharpness.

SaveImage (#9)

Writes the result with a clear filename prefix so Z-Image outputs are easy to catalog. Adjust the prefix to separate experiments, model variants, or aspect ratios.

Optional extras

  • Use Z-Image Turbo for rapid ideation, then switch back to Z-Image Base and raise steps for final renders (see the sketch after this list).
  • For bilingual prompts and on‑image text, keep wording short and high contrast in the prompt to help Z-Image render crisp typography.
  • Lock the seed when comparing small prompt edits so differences reflect your changes rather than new noise.
  • If you see oversaturation or halos, lower cfg slightly or strengthen the negative prompt to regain balance.
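
The first tip above can be scripted as a two-pass loop, reusing the graph dict and queue_prompt function from the earlier sketches. The Turbo filename, step count, and cfg are assumptions based on typical distilled-model settings, not confirmed values.

    # Draft with Turbo, then re-render the keeper with Base at higher steps.
    draft = {"unet": "z_image_turbo_bf16.safetensors", "steps": 8, "cfg": 1.0}   # assumed Turbo settings
    final = {"unet": "z_image_base_bf16.safetensors", "steps": 40, "cfg": 4.0}

    for preset in (draft, final):
        graph["66"]["inputs"]["unet_name"] = preset["unet"]
        graph["69"]["inputs"].update(steps=preset["steps"], cfg=preset["cfg"], seed=123)
        queue_prompt(graph)  # same seed, so composition carries over between passes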

Acknowledgements

This workflow implements and builds upon the following works and resources. We gratefully acknowledge Comfy-Org for creating and maintaining the Z-Image Day-0 ComfyUI workflow template. For authoritative details, please refer to the original documentation and repositories linked below.

Resources

  • Comfy-Org/Z-Image Day-0 support in ComfyUI
    • GitHub: Comfy-Org/workflow_templates
    • Docs / Release Notes: Source

Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.
