Z‑Image is a 6B‑parameter image generation model from Tongyi‑MAI built on a Scalable Single‑Stream Diffusion Transformer (S3‑DiT). It’s unusually efficient for its size and is designed to run at 1024×1024 on consumer GPUs.
This guide covers the two most common, real-world approaches to Z‑Image LoRA training:
1) Z‑Image Turbo (w/ Training Adapter) — best when you want your LoRA to run with true 8‑step Turbo speed after training.
2) Z‑Image De‑Turbo (De‑Distilled) — best when you want a de‑distilled base you can train without an adapter, or push longer fine-tunes.
By the end of this guide, you’ll be able to:
- Pick the right Z‑Image base (Turbo+adapter vs De‑Turbo) for your goal.
- Prepare a dataset that works with Turbo-style distilled training.
- Configure Ostris AI Toolkit (locally or on RunComfy Cloud AI Toolkit) panel‑by‑panel.
- Understand why each parameter matters, so you can tune instead of copy‑pasting.
This article is part of the AI Toolkit LoRA training series. If you’re new to Ostris AI Toolkit, start with the AI Toolkit LoRA training overview before diving into this guide.
Quick start (recommended baseline)
Option A — Turbo + training adapter (recommended for most LoRAs)
Use this if you want your LoRA to keep Turbo’s fast 8‑step behavior after training.
Why this matters:
- Turbo is a distilled "student" model: it compresses a slower multi-step diffusion process into ~8 steps.
- If you train on Turbo like a normal model, your updates can undo the distillation ("Turbo drift"), and you’ll start needing more steps / more CFG to get the same quality.
- The training adapter temporarily "de‑distills" Turbo during training so your LoRA learns your concept without breaking Turbo’s 8‑step behavior. At inference you remove the adapter and keep only your LoRA.
Baseline settings:
- MODEL → Model Architecture: Z‑Image Turbo (w/ Training Adapter)
- MODEL → Name or Path: `Tongyi-MAI/Z-Image-Turbo`
- MODEL → Training Adapter Path: keep the default if your UI auto-fills it (RunComfy often defaults to v2), or set it explicitly:
  - v1: `ostris/zimage_turbo_training_adapter/zimage_turbo_training_adapter_v1.safetensors`
  - v2: `ostris/zimage_turbo_training_adapter/zimage_turbo_training_adapter_v2.safetensors`
- TARGET → Linear Rank: 16
- TRAINING → Learning Rate: 0.0001
- TRAINING → Steps: 2500–3000 (for 10–30 images)
- DATASETS → Resolutions: 512 / 768 / 1024, with Cache Latents = ON
- SAMPLE (for previews): 1024×1024, 8 steps (or 9 if your pipeline treats 9 as "8 DiT forwards"), guidance scale = 0 (Turbo is guidance‑distilled), sample every 250 steps
Option B — De‑Turbo (de‑distilled base)
Use this if you want to train without a training adapter or you plan longer training runs.
What changes compared to Turbo:
- De‑Turbo behaves more like a "normal" diffusion model for training and sampling.
- You typically sample with more steps and low (but non-zero) CFG.
- MODEL → Model Architecture: Z‑Image De‑Turbo (De‑Distilled)
- MODEL → Name or Path: `ostris/Z-Image-De-Turbo` (or whatever your AI Toolkit build pre-selects)
- Training Adapter Path: none (not needed)
- Keep the same LoRA settings (rank/LR/steps) as a baseline.
- SAMPLE (for previews): 20–30 steps, CFG (guidance scale) ≈ 2–3, sample every 250 steps
Want zero setup? Use the RunComfy Cloud AI Toolkit and follow the exact same panels.
Table of contents
- 1. Which Z‑Image base should you train on? (Turbo+adapter vs De‑Turbo)
- 2. Z‑Image training adapter v1 vs v2 (what changes, when to use)
- 3. Z‑Image / Z‑Image‑Turbo in a nutshell (for LoRA training)
- 4. Where to train Z‑Image: local vs cloud AI Toolkit
- 5. Designing datasets for Z‑Image LoRA training
- 6. Z‑Image LoRA configuration in AI Toolkit – parameter by parameter
- 7. Practical recipes for Z‑Image LoRA training
- 8. Troubleshooting (Turbo drift, overfit, VRAM, sampling)
- 9. Export and use your Z‑Image LoRA
- FAQ
1. Which Z‑Image base should you train on? (Turbo+adapter vs De‑Turbo)
AI Toolkit exposes two "model architecture" choices for Z‑Image LoRA training:
1.1 Z‑Image Turbo (w/ Training Adapter)
Best for: typical LoRAs (character, style, product), where your end goal is to run inference on Turbo at 8 steps.
Why it exists:
- Z‑Image Turbo is a step‑distilled model. If you train LoRAs on a step‑distilled model "normally", the distillation can break down fast, and Turbo starts to behave like a slower non‑distilled model (quality shifts, needs more steps, etc.).
- The training adapter acts like a temporary "de‑distillation LoRA" during training. Your LoRA learns your concept while Turbo’s fast 8‑step behavior stays stable.
- At inference time, you remove the training adapter and keep your LoRA on top of the real Turbo base.
Practical signals you chose the right path:
- Your preview samples look good at 8 steps with guidance ≈ 0.
- Your LoRA doesn’t suddenly start requiring 20–30 steps to look clean (a common sign of Turbo drift).
1.2 Z‑Image De‑Turbo (De‑Distilled)
Best for: training without adapter, or longer fine‑tunes where Turbo+adapter would eventually drift.
What it is:
- De‑Turbo is a de‑distilled version of Turbo, designed to behave more like a normal diffusion model for training.
- It can be trained directly without an adapter and also used for inference (typically 20–30 steps with low CFG).
1.3 Quick decision guide
Pick Turbo + training adapter if:
- You want the LoRA to run at Turbo speed (8 steps) after training.
- You are doing a normal LoRA run (a few thousand to tens of thousands of steps).
Pick De‑Turbo if:
- You want "normal model" behavior for training and sampling.
- You want to train longer, or you’re experimenting with workflows that don’t support the training adapter cleanly.
2. Z‑Image training adapter v1 vs v2 (what changes, when to use)
In the training adapter repo you’ll often see two files:
- `..._v1.safetensors`
- `..._v2.safetensors`
What you need to know (practically):
- v1 is the safe baseline.
- v2 is a newer variant that can change training dynamics and results.
Recommendation: treat this as an A/B test:
- Keep dataset, LR, steps, rank identical
- Train once with v1, once with v2
- Compare sample grids at the same checkpoints
If your RunComfy UI defaults to v2 and your training looks stable, just keep it. If you see instability (noise, Turbo drift, weird artifacts), switch to v1.
3. Z‑Image / Z‑Image‑Turbo in a nutshell (for LoRA training)
From the official Z‑Image sources:
- 6B parameters, S3‑DiT architecture — text tokens, visual semantic tokens, and VAE latents are concatenated into a single transformer stream.
- Model family — Turbo, Base, and Edit variants exist in the Z‑Image series.
- Turbo specifics — optimized for fast inference; guidance is typically 0 for Turbo inference.
A helpful mental model for LoRA training:
- High-noise timesteps mostly control composition (layout, pose, global color tone).
- Low-noise timesteps mostly control details (faces, hands, textures).
This is why timestep settings and bias can noticeably change whether a LoRA feels "global style" vs "identity/detail".
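To make this concrete, here is a tiny, purely conceptual Python sketch of biased timestep sampling. It is not AI Toolkit's actual implementation; the weighting exponents and the 0–1 timestep convention are illustrative assumptions.

```python
import random

def sample_timestep(bias: str = "balanced") -> float:
    """Draw a training timestep t in [0, 1], where t near 1 means 'high noise'.

    Conceptual only: 'high' skews samples toward noisy timesteps
    (composition, global style), 'low' toward clean timesteps (detail).
    """
    u = random.random()
    if bias == "high":
        return u ** 0.5   # mass shifts toward 1.0 (high noise)
    if bias == "low":
        return u ** 2.0   # mass shifts toward 0.0 (low noise)
    return u              # balanced: uniform

# Rough check of where each bias concentrates training signal
for bias in ("balanced", "high", "low"):
    mean_t = sum(sample_timestep(bias) for _ in range(10_000)) / 10_000
    print(bias, round(mean_t, 2))   # ~0.50, ~0.67, ~0.33
```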
4. Where to train Z‑Image: local vs cloud AI Toolkit
4.1 Local AI Toolkit
The AI Toolkit by Ostris is open source on GitHub. It supports Z‑Image, FLUX, Wan, Qwen and more through a unified UI and config system.
Local makes sense if:
- You already have an NVIDIA GPU and don’t mind Python / Git setup.
- You want full control over files, logs and custom changes.
Repo: ostris/ai-toolkit
4.2 RunComfy Cloud AI Toolkit
If you’d rather skip CUDA installs and driver issues, use RunComfy Cloud AI Toolkit:
- Zero setup — open a browser and train.
- Consistent VRAM — easier to follow guides without hardware friction.
- Persistent storage — easier iteration and checkpoint management.
👉 Open it here: Cloud AI Toolkit on RunComfy
5. Designing datasets for Z‑Image LoRA training
5.1 How many images do you actually need?
- 10–30 images is a good range for most character or style LoRAs.
- Above ~50 images you often hit diminishing returns unless your style range is very wide.
Z‑Image picks up training signal quickly ("learns hot"), so dataset quality and variety matter more than raw image count (a quick audit script is sketched after this list):
- Too few images + too much training often shows up as overfit faces, repeated poses, or messy backgrounds.
- A small but diverse dataset (angles, lighting, backgrounds) tends to generalize better than a large repetitive one.
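Before the first run, it can help to audit the folder you plan to train on. The helper below is optional and not part of AI Toolkit; it assumes a flat folder of images with matching .txt captions, and it uses Pillow.

```python
from pathlib import Path

from PIL import Image  # pip install pillow

DATASET_DIR = Path("datasets/zimage_char_redhair")  # adjust to your folder
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

images = [p for p in sorted(DATASET_DIR.iterdir()) if p.suffix.lower() in IMAGE_EXTS]
print(f"{len(images)} images found")

for img_path in images:
    with Image.open(img_path) as im:
        w, h = im.size
    if min(w, h) < 1024:
        print(f"  low-res ({w}x{h}): {img_path.name}")  # limits the 1024 bucket
    if not img_path.with_suffix(".txt").exists():
        print(f"  missing caption: {img_path.name}")    # will fall back to Default Caption
```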
5.2 Character vs style LoRAs
Character LoRA
- Aim for 12–30 images of the same subject.
- Mix close‑ups and full‑body, angles, lighting, outfits.
- Captions can be literal and consistent; optional trigger token.
Style LoRA
- Aim for 15–40 images across varied subjects (people, interiors, landscapes, objects).
- Caption the scene normally; don’t over-describe the style unless you want it to be trigger-only.
- This teaches: "render anything in this style," rather than "only do the style when I say a special keyword."
5.3 Captions, trigger word and text files
- Pair each image with a same-named caption file: `image_01.png` → `image_01.txt`.
- If there is no `.txt`, AI Toolkit uses the Default Caption.
- You can use `[trigger]` in captions and set Trigger Word in the JOB panel (a helper script for this is sketched below).
- This is especially useful if you later enable DOP (Differential Output Preservation) to make the LoRA more "opt-in".
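If you caption by hand, a small helper can create the missing .txt files and start each one with the [trigger] placeholder. A minimal sketch, assuming a flat dataset folder; the folder name and default caption text are placeholders to edit.

```python
from pathlib import Path

DATASET_DIR = Path("datasets/zimage_char_redhair")  # adjust to your folder
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

for img_path in sorted(DATASET_DIR.iterdir()):
    if img_path.suffix.lower() not in IMAGE_EXTS:
        continue
    caption_path = img_path.with_suffix(".txt")
    if caption_path.exists():
        continue  # never overwrite hand-written captions
    # Start each caption with [trigger]; AI Toolkit substitutes the Trigger Word
    # set in the JOB panel. Edit the rest of the caption per image afterwards.
    caption_path.write_text("[trigger], a photo of the subject\n", encoding="utf-8")
    print(f"created {caption_path.name}")
```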
6. Z‑Image LoRA configuration in AI Toolkit – parameter by parameter
In this section we walk through the UI panels and explain what each important field does.
6.1 JOB panel
- Training Name — descriptive label like `zimage_char_redhair_v1`
- GPU ID — local GPU selector; on cloud keep the default
- Trigger Word (optional) — e.g. `zchar_redhair` / `zstyle_pencil`
6.2 MODEL panel (most important)
This is where the two base choices matter:
If you pick Turbo + adapter
- Model Architecture — Z‑Image Turbo (w/ Training Adapter)
- Name or Path — `Tongyi-MAI/Z-Image-Turbo`
- Training Adapter Path — keep the default or choose:
  - v1: `ostris/zimage_turbo_training_adapter/zimage_turbo_training_adapter_v1.safetensors`
  - v2: `ostris/zimage_turbo_training_adapter/zimage_turbo_training_adapter_v2.safetensors`
Tip: if you accidentally train Turbo without the adapter, the most common symptom is that your LoRA "works" only when you raise steps/CFG, which defeats the point of Turbo.
If you pick De‑Turbo
- Model Architecture — Z‑Image De‑Turbo (De‑Distilled)
- Name or Path — `ostris/Z-Image-De-Turbo`
- Training Adapter Path — none
Options:
- Low VRAM / Layer Offloading — enable if you’re VRAM constrained
6.3 QUANTIZATION panel
- On 24+ GB, prefer BF16 / `none` for fidelity
- On 16 GB, `float8` is usually the best trade-off
6.4 TARGET panel – LoRA configuration
- Target Type — LoRA
- Linear Rank — start with 8–16 (see the size estimate sketched below)
  - 16 for stronger styles/textures
  - 8 for smaller, subtler LoRAs
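Why does rank matter so much? For a linear layer with input width d_in and output width d_out, a rank-r LoRA adds roughly r × (d_in + d_out) trainable parameters. The quick estimate below uses a made-up hidden size, not Z‑Image's real layer dimensions.

```python
def lora_params_per_layer(d_in: int, d_out: int, rank: int) -> int:
    # LoRA adds two low-rank matrices: A (rank x d_in) and B (d_out x rank)
    return rank * (d_in + d_out)

d = 3072  # hypothetical hidden size, for illustration only
for rank in (8, 16, 32):
    n = lora_params_per_layer(d, d, rank)
    print(f"rank {rank}: {n:,} params per square linear layer")
# Doubling the rank doubles capacity (and file size): rank 16 is ~2x rank 8.
```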
6.5 SAVE panel
- Data Type — BF16
- Save Every — 250
- Max Step Saves to Keep — 4–12
6.6 TRAINING panel – core hyperparameters
- Batch Size — 1
- Optimizer — AdamW8Bit
- Learning Rate — start at 0.0001
  - If unstable/noisy, drop to 0.00005–0.00008.
  - Avoid pushing too high (e.g. 0.0002+) — Turbo-style models can become unstable quickly.
- Weight Decay — 0.0001
- Steps — 2500–3000 for 10–30 images (see the quick step-count check at the end of this panel)
  - If your dataset is very small (<10 images), consider 1500–2200 to reduce overfitting.
- Loss Type — Mean Squared Error
- Timestep Type — Weighted
- Timestep Bias — Balanced
  - Favor High Noise if you want stronger global style / mood.
  - Favor Low Noise if you’re chasing identity/detail (advanced; start with Balanced).
- EMA — OFF
Text Encoder:
- Cache Text Embeddings — ON if captions are static and VRAM is tight
(then set Caption Dropout to 0)
- Unload TE — keep OFF for caption-driven training
Regularization:
- DOP — keep OFF for first run; add later for production trigger-only LoRAs
(DOP is powerful but adds complexity; it’s easiest once you already have a stable baseline.)
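To sanity-check the Steps value, translate steps into passes over your dataset: with batch size 1, each step sees one image, so passes ≈ steps ÷ image count. A quick back-of-the-envelope check:

```python
def dataset_passes(steps: int, num_images: int, batch_size: int = 1) -> float:
    """Approximate number of times each image is seen during training."""
    return steps * batch_size / num_images

for num_images in (10, 20, 30):
    low = round(dataset_passes(2500, num_images))
    high = round(dataset_passes(3000, num_images))
    print(f"{num_images} images: {low}-{high} passes per image")
# 10 images: 250-300 passes; 30 images: ~83-100 passes. Very small datasets
# repeat far more often, which is why 1500-2200 steps is suggested below 10 images.
```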
6.7 DATASETS panel
- Caption Dropout Rate
  - 0.05 if not caching text embeddings
  - 0 if caching embeddings
- Cache Latents — ON
- Resolutions — 512 / 768 / 1024 is a strong baseline
6.8 SAMPLE panel (match your base!)
If training Turbo:
1024×1024, 8 steps, guidance = 0, sample every 250 steps
If training De‑Turbo:
1024×1024, 20–30 steps, CFG 2–3, sample every 250 steps
Use 5–10 prompts that reflect real usage; include a couple prompts without the trigger to detect leakage.
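One concrete way to build that preview set is to script it once and paste the result into the SAMPLE panel. The prompts and the zchar_redhair trigger below are placeholders; swap in your own subject and trigger word.

```python
TRIGGER = "zchar_redhair"  # placeholder; use your own trigger word

with_trigger = [
    f"{TRIGGER}, portrait photo, soft window light",
    f"{TRIGGER}, full body, walking down a busy street",
    f"{TRIGGER}, sitting in a cafe, shallow depth of field",
    f"{TRIGGER}, dramatic rim lighting, dark background",
]

# Prompts WITHOUT the trigger: if these start looking like your subject,
# the concept is leaking and the LoRA is no longer "opt-in".
without_trigger = [
    "portrait photo of a stranger, soft window light",
    "a quiet street at dusk, no people",
]

for prompt in with_trigger + without_trigger:
    print(prompt)
```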
6.9 ADVANCED panel – Differential Guidance (optional)
- Do Differential Guidance — ON if you want faster convergence
- Scale — start at 3
  - If samples look overly sharp/noisy early, reduce to 2.
  - If learning is slow, you can test 4 later.
7. Practical recipes for Z‑Image LoRA training
A strong baseline for Turbo LoRAs:
- Turbo + training adapter (v1 or v2)
- rank=16, lr=1e-4, steps=2500–3000
- 512/768/1024 buckets, cache latents ON
- samples every 250 steps, 8 steps, guidance 0
If your LoRA feels "too strong":
- Keep training the same, but plan to run inference at a lower LoRA weight (e.g. 0.6–0.8).
8. Troubleshooting
"My LoRA destroyed Turbo—now I need more steps / CFG."
- Most common causes:
- trained on Turbo without the training adapter, or
- LR too high for too long.
- Fix:
- use Turbo + training adapter architecture
- keep LR ≤ 1e‑4
- reduce steps if you see drift early
"The style is too strong."
- Lower LoRA weight at inference (0.6–0.8)
- Use trigger + DOP for production LoRAs (opt‑in behavior)
"Hands/backgrounds are messy."
- Add a few images that include those cases
- Consider slightly favoring low-noise timesteps (advanced)
"Out of VRAM / too slow."
- Disable high buckets (keep 512–1024)
- Enable Low VRAM + offloading
- Quantize to float8
- Cache latents (and optionally cache text embeddings)
9. Export and use your Z‑Image LoRA
- Model playground — try your LoRA on the base model via the Z‑Image Turbo LoRA playground
- ComfyUI workflows — load your LoRA into a workflow like Z‑Image workflow in ComfyUI
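Outside of ComfyUI, you can also smoke-test the exported .safetensors from Python. This is a hedged sketch only: it assumes your installed diffusers version supports Z‑Image and that the resolved pipeline exposes the standard load_lora_weights loader; the LoRA path, trigger, and prompt are placeholders.

```python
import torch
from diffusers import DiffusionPipeline  # assumes a diffusers build with Z-Image support

# Load the same base you trained against (Turbo here: ~8 steps, guidance 0)
pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16
).to("cuda")

# Attach the trained LoRA (path is a placeholder)
pipe.load_lora_weights("output/zimage_char_redhair_v1.safetensors")

image = pipe(
    prompt="zchar_redhair, portrait photo, soft window light",  # placeholder prompt
    num_inference_steps=8,
    guidance_scale=0.0,
    width=1024,
    height=1024,
).images[0]
image.save("lora_test.png")
```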
FAQ
Should I use the Z‑Image training adapter v1 or v2?
Start with your UI default. If results are unstable or you see Z‑Image Turbo drift, test the other version with all other settings held constant.
Should I train Z‑Image on Turbo+adapter or De‑Turbo?
Turbo+adapter for most Z‑Image LoRAs that must keep 8‑step Turbo behavior. De‑Turbo if you want adapter‑free training or longer fine‑tunes.
What Z‑Image inference settings should I use after training?
Z‑Image Turbo typically uses low/no CFG and ~8 steps. De‑Turbo behaves more like a normal model (20–30 steps, low CFG). Always match your sampling settings to the base you’re actually using.