Ostris AI Toolkit LoRA Training for Diffusion Model Fine-Tuning

This page is the overview of LoRA fine-tuning with the Ostris AI Toolkit. Model-specific recipes are covered in separate guides.

By the end of this guide, you should:

Understand the core ideas behind LoRA training (what’s really happening when you fine‑tune a model).
Know how the AI Toolkit is organized and what each panel controls.
Understand what key parameters (learning rate, rank, steps, noise schedule, DOP, etc.) do so you can tune them deliberately.
Be able to train LoRAs either on your own machine or with the RunComfy Cloud AI Toolkit, and then reuse those LoRAs in your normal generation workflows.

1. What is Ostris AI Toolkit? (LoRA trainer for diffusion models)
2. Supported models in Ostris AI Toolkit (Flux, Wan, Z‑Image, Qwen‑Image, SDXL)
3. Installing Ostris AI Toolkit locally and on RunComfy Cloud AI Toolkit
4. Ostris AI Toolkit Web UI overview (Dashboard, Datasets, New LoRA Job)
5. LoRA training basics and core hyperparameters for AI Toolkit
6. Mapping LoRA concepts to AI Toolkit parameters
7. Step‑by‑step example: training a LoRA with Ostris AI Toolkit
8. Troubleshooting AI Toolkit LoRA training: common errors and fixes

1. What is Ostris AI Toolkit? (LoRA trainer for diffusion models)

Ostris AI Toolkit is a training suite focused on diffusion models for images and video. It does not handle language or audio models; everything it supports is either a classic DDPM‑style diffusion model (such as SD 1.5 or SDXL) or a modern diffusion‑transformer model such as Flux, Wan, Qwen‑Image, Z‑Image or OmniGen2. It is built around LoRA‑style adapters: in practice, when you fine‑tune a model with AI Toolkit you are not retraining the entire network; you are training small LoRA (or similar lightweight) adapters on top of a frozen base model.

Key features of Ostris AI Toolkit for LoRA training

AI Toolkit provides a common training engine and configuration system for all supported model families. Each model (Flux, Z‑Image Turbo, Wan 2.2, Qwen‑Image, SDXL, etc.) has its own preset, but they all plug into the same structure: model loading, quantization settings, LoRA/LoKr adapter definition, training hyper‑parameters, dataset handling and sampling rules. That’s why the Web UI looks familiar whether you are training an AI Toolkit Flux LoRA, a Z‑Image Turbo LoRA or a Wan 2.2 video LoRA.

On top of this engine, AI Toolkit ships with both a CLI and a full Web UI. The CLI runs jobs directly from YAML configs; the Web UI is a graphical layer over those configs. In the UI, "AI Toolkit" usually means the New Job screen where you pick a model family, choose a LoRA type and rank, set learning rate and steps, attach one or more datasets and define how often to generate sample images or videos. You get dedicated panels for Job, Model, Quantization, Target, Training, Regularization, Datasets and Sample, so you rarely need to touch raw YAML unless you want to. Whether you run it locally or via a cloud setup such as the RunComfy Cloud AI Toolkit, this workflow is the same.

Built‑in LoRA training tools in Ostris AI Toolkit

AI Toolkit bakes in a number of "batteries‑included" features that you would otherwise need to script or glue together by hand:

Quantization and low‑VRAM modes – configurable 8‑bit / 6‑bit / 4‑bit (and 3‑bit with recovery adapters) transformer quantization plus layer offloading, so large models like Flux or Wan can be trained on 24–48 GB GPUs with controllable quality/speed trade‑offs.
LoRA / LoKr adapters – support for standard LoRA as well as LoKr (a more compact but less universally supported variant), selectable via Target Type so you can choose between maximum compatibility and smaller, higher‑capacity adapters.
Differential Output Preservation (DOP) – a regularization loss that compares base‑model vs LoRA outputs on "regularization" images and penalizes unwanted changes, helping to reduce LoRA "bleeding" where every output starts to look like your subject.
Differential Guidance for turbo‑style models – an optional training‑time guidance term (used heavily for Z‑Image Turbo) that focuses the update on "what should change" relative to the base model, improving adaptation on few‑step / turbo models without destroying their speed benefits.
Multi‑stage noise training – separate high‑noise and low‑noise training stages so you can balance coarse structure learning (composition, pose) with fine detail sharpening (textures, edges).
Latent and text‑embedding caching – Cache Latents and Cache Text Embeddings trade disk space for speed and lower VRAM, which is particularly helpful on smaller GPUs or in cloud sessions where you want to iterate quickly.
EMA (Exponential Moving Average) – an optional smoothed copy of the LoRA weights that can make convergence more stable, especially on small datasets.

The Web UI exposes all of these features through clear controls, and because the layout is consistent across models, once you understand how AI Toolkit trains a LoRA for one base (for example, Flux), it is straightforward to apply the same reasoning to Z‑Image Turbo, Wan, Qwen‑Image and other supported diffusion models.

2. Supported models in Ostris AI Toolkit (Flux, Wan, Z‑Image, Qwen‑Image, SDXL)

The AI Toolkit currently supports the following model families:

IMAGE models – single images (Flux, Z‑Image Turbo, Qwen‑Image, SD, etc.).
INSTRUCTION / EDIT models – image editing / instruction following models (Qwen‑Image‑Edit, Flux Kontext, HiDream E1).
VIDEO models – text‑to‑video and image‑to‑video (Wan 2.x series).

Category	Model family in AI Toolkit UI	Typical purpose
IMAGE	FLUX.1 / FLUX.2	Flagship FlowMatch image models; high‑quality style/character LoRAs at 1024+ resolution.
INSTRUCTION	FLUX.1‑Kontext‑dev	Paired/conditional image training (before/after, 360°, multi‑view, turnarounds).
IMAGE	Qwen‑Image	Strong bilingual text‑to‑image model; LoRAs for style/character control
INSTRUCTION	Qwen‑Image‑Edit, Qwen‑Image‑Edit‑2509	Image editing / instruction‑following models; LoRAs for specific edit styles or effects
IMAGE	Z‑Image Turbo (w/ Training Adapter)	Distilled image model with a dedicated training adapter for LoRA fine‑tuning.
VIDEO	Wan 2.2 (14B)	Newer Wan video base; high‑quality text‑to‑video / image‑to‑video generation.
VIDEO	Wan 2.2 T2V (14B)	Wan 2.2 text‑to‑video base for cinematic, prompt‑driven video LoRAs.
VIDEO	Wan 2.2 I2V (14B)	Wan 2.2 image‑to‑video model for animating stills into motion.
VIDEO	Wan 2.2 TI2V (5B)	Efficient Wan 2.2 hybrid model; lighter 5B version for text‑ and image‑to‑video.
VIDEO	Wan 2.1 (1.3B / 14B)	Earlier Wan video models; smaller and larger variants for T2V.
VIDEO	Wan 2.1 I2V (14B‑480P / 14B‑720P)	Wan 2.1 image‑to‑video at different base resolutions.
IMAGE	SD 1.5, SDXL	"Classic" Stable Diffusion models; backward‑compatible LoRAs and legacy pipelines.
IMAGE	OmniGen2	All‑round modern image base; general‑purpose LoRAs.
IMAGE	Chroma	High‑quality image model for cinematic / photoreal styles.
IMAGE	Lumina2	Modern image model; good for general LoRA training.
IMAGE	HiDream	Image generation model related to HiDream video; style and character LoRAs.
INSTRUCTION	HiDream E1	Instruction‑style / frame‑conditioned image or video training.
IMAGE	Flex.1 / Flex.2	Lightweight general‑purpose image models.

More models are sometimes added or revised, and the same Web UI structure applies across them.

3. Installing Ostris AI Toolkit locally and on RunComfy Cloud AI Toolkit

3.1 Install Ostris AI Toolkit locally on Linux and Windows

The official README on GitHub gives straightforward installation instructions for Linux and Windows.

On Linux:


git clone https://github.com/ostris/ai-toolkit.git
cd ai-toolkit

python3 -m venv venv
source venv/bin/activate

# install PyTorch with CUDA (adjust version if needed)
pip3 install --no-cache-dir torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 \
  --index-url https://download.pytorch.org/whl/cu126

pip3 install -r requirements.txt

On Windows, you can either follow the same pattern with python -m venv venv and .\venv\Scripts\activate, or use the community AI‑Toolkit Easy Install batch script, which wraps the whole process for the latest version into a single click and automatically opens the UI in your browser.

To start the Web UI once dependencies are installed:


cd ui
npm run build_and_start

The interface will be available at http://localhost:8675. If you run it on a remote machine, set AI_TOOLKIT_AUTH to a password first so only you can access the UI (see the AI Toolkit GitHub repository for security notes).

3.2 Use RunComfy Cloud AI Toolkit for LoRA training (no local setup)

If you don’t want to deal with GPU drivers, CUDA, or local installs at all, you can use the RunComfy Cloud AI Toolkit. In this mode:

AI Toolkit runs entirely in the cloud – you just open a browser and you’re in the UI.
You have access to powerful GPUs (80 GB and 141 GB VRAM), ideal for heavy FLUX, Qwen‑Image, Z‑Image Turbo, or Wan LoRA training.
Your datasets, configs, checkpoints, and past jobs live in a persistent workspace tied to your RunComfy account.
Training, playground for model testing, and ComfyUI workflows all live in one place.

Open it directly here: Cloud AI Toolkit on RunComfy

4. Ostris AI Toolkit Web UI overview (Dashboard, Datasets, New LoRA Job)

When you open the Web UI (local or on RunComfy), the left sidebar has a small but important set of pages:

4.1 Dashboard and Training Queue

The Dashboard shows active and recent jobs at a glance. It’s mainly a quick status page.

The Training Queue page is where you:

see each job’s state (queued, running, finished, failed),
open logs to debug issues,
stop or delete jobs,
download output checkpoints and sample images.

Think of it as the "job control center". Every LoRA you train will show up here.

4.2 Dataset manager

The Datasets page lets you define named datasets that you can attach to jobs:

You select or upload image folders or video clips.
The UI scans them and shows resolutions, counts, and how many captions / metadata entries exist.
Each dataset gets an internal name that later appears in the job’s Target Dataset dropdown.

This is where you create:

main training datasets (your character, style, product shots),
optional regularization datasets (other people, other trucks, generic backgrounds, etc.) for DOP or classic regularization.

4.3 New Job: the core LoRA configuration screen

The New Job page is the heart of AI Toolkit. A job is essentially:

Train a LoRA of type X on model Y, using dataset Z, with these hyperparameters.

The screen is divided into panels:

JOB – naming and GPU selection.
MODEL – which base model to fine‑tune.
QUANTIZATION – how aggressively the base model is compressed.
TARGET – LoRA vs LoKr and rank.
SAVE – checkpoint precision and frequency.
TRAINING – learning rate, steps, optimizer, timestep schedule.
ADVANCED / Regularization – EMA, Differential Guidance, DOP.
DATASETS – which dataset(s) to train on, and how.
SAMPLE – how often to generate reference images or videos while training.

The rest of this guide is mostly about helping you understand how these panels relate back to the core LoRA concepts.

5. LoRA training basics and core hyperparameters for AI Toolkit

Before touching any AI Toolkit controls, it helps to have a mental model of what LoRA training is doing behind the scenes.

5.1 How LoRA works inside diffusion models

A modern diffusion model is mostly a stack of transformer blocks with large weight matrices. In vanilla fine‑tuning, you would update all these weights directly, which is expensive and easy to overfit.

In the modern supported models (such as Flux, Z‑Image Turbo, Wan, Qwen‑Image), the backbone is a large diffusion transformer; older models such as SD 1.5 and SDXL use a UNet backbone, but LoRA works the same way on their weight matrices. LoRA does not replace the original weight matrix W; instead, it adds a small low‑rank update built from two learned matrices A and B. You can think of it as: W_new = W + alpha × (A B), where W is the frozen original weight matrix, A and B are small trainable matrices, and alpha is a scaling factor that controls how strong the LoRA update is at inference time.

The rank determines the width of matrices A and B, and therefore how complex the LoRA update can be. A higher rank makes the LoRA more expressive but also heavier in terms of parameters and compute. A lower rank gives you a smaller, more focused adapter that is lighter and generally harder to overfit.
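To make the formula concrete, here is a minimal sketch of the update in plain Python. The helper names (matmul, apply_lora) and the tiny 3×3 matrices are illustrative assumptions, not AI Toolkit code; real trainers do this with tensors on the GPU.

```python
# Minimal LoRA sketch in plain Python; all names and values are illustrative.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def apply_lora(w, a, b, alpha):
    """Return W_new = W + alpha * (A @ B) without modifying the frozen W."""
    delta = matmul(a, b)  # low-rank update with the same shape as W
    return [
        [w_ij + alpha * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Frozen 3x3 base weight (identity) and a rank-1 adapter: A is 3x1, B is 1x3.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[1.0], [0.0], [2.0]]
B = [[0.5, 0.0, -0.5]]

W_new = apply_lora(W, A, B, alpha=1.0)
W_off = apply_lora(W, A, B, alpha=0.0)  # alpha = 0 recovers the base model
```

Only A and B would be trained; W stays untouched, which is why setting alpha to 0 gives back the base model exactly.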

5.2 Key LoRA hyperparameters explained

These names appear in every trainer; AI Toolkit just exposes them clearly.

Learning Rate

Controls how large a step we take in parameter space each time the optimizer updates the LoRA.
Too low: training is slow and might not fit your dataset well.
Too high: the loss bounces or explodes, and the LoRA becomes noisy, unstable, or wildly overfitted.

For diffusion LoRAs, 0.0001 is a very sensible default. Many published Wan and Flux configs fall in the 0.0001 – 0.0002 range.

Batch Size and Gradient Accumulation

Batch Size is how many images/clips the model sees in parallel for each gradient computation.
Gradient Accumulation means "keep accumulating gradients for N batches before actually applying an update", which simulates a larger batch without needing more VRAM.

Effective batch size is: Batch Size × Gradient Accumulation

Higher effective batch gives smoother gradients and better generalization, but costs more compute. Many people run with Batch Size = 1 and Gradient Accumulation = 2–4 on 24 GB GPUs.
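As a small sketch of the arithmetic above (the function names are assumptions for illustration, and gradients are reduced to single numbers), averaging the gradients of several micro-batches gives the same value one larger batch would have produced:

```python
def effective_batch_size(batch_size, grad_accumulation):
    """Number of examples that contribute to one optimizer update."""
    return batch_size * grad_accumulation


def accumulate_gradients(micro_batch_grads):
    """Average per-micro-batch gradients into the single gradient used for the update."""
    return sum(micro_batch_grads) / len(micro_batch_grads)


# Batch Size = 1 with Gradient Accumulation = 4: four forward/backward passes,
# then one update that uses the averaged gradient.
size = effective_batch_size(batch_size=1, grad_accumulation=4)
update_grad = accumulate_gradients([0.25, 0.5, 0.125, 0.125])
print(size, update_grad)
```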

Steps

This is how many optimizer updates you will do. It’s the main knob for "how long do we train".

Too few steps → underfitting: the LoRA barely changes the base model.
Too many steps → overfitting: the LoRA memorizes training images and bleeds into everything.

The right number depends on: dataset size, image/video variety, rank, learning rate.

For typical 20–50 image character LoRAs on modern models, 2000–3000 steps is a good starting range.

Rank (Linear Rank)

Rank determines how many degrees of freedom your LoRA has.
Doubling the rank roughly doubles the LoRA’s capacity and parameter count.

Practical intuition:

Rank 16–32 is enough for most characters and styles on large models like Flux or Wan.
Higher ranks make it easier to overfit small datasets; lower ranks force the LoRA to generalize.
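The link between rank and size can be checked with a few lines of arithmetic. This sketch assumes a single square layer with a hypothetical width of 3072; the real layer shapes depend on the base model.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of one LoRA pair: A is d_in x rank, B is rank x d_out."""
    return rank * (d_in + d_out)


def full_param_count(d_in, d_out):
    """Parameters of the frozen weight matrix the adapter sits on."""
    return d_in * d_out


WIDTH = 3072  # hypothetical layer width, chosen only for illustration

for rank in (8, 16, 32, 64):
    lora = lora_param_count(WIDTH, WIDTH, rank)
    share = lora / full_param_count(WIDTH, WIDTH)
    print(f"rank {rank:>2}: {lora:>7} trainable params ({share:.2%} of the full matrix)")
```

Doubling the rank doubles the adapter's parameter count, while even rank 64 remains a small fraction of the frozen matrix.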

Weight Decay

Weight decay is a standard regularization trick: it gently pulls weights toward zero at each step.

It reduces the chance that the LoRA will "snap" to extreme values that perfectly recreate training images but don’t generalize.
Values like 0.0001 are common and usually safe. You rarely need to touch it until you see obvious overfitting.
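A rough sketch of what "pulling weights toward zero" means, using a plain gradient step with decoupled weight decay in the style of AdamW. This is a simplification for illustration: the real optimizer also tracks moment estimates, and the function name is an assumption.

```python
def step_with_weight_decay(weight, grad, lr, weight_decay):
    """One simplified update: a gradient step plus a decay term proportional to the weight."""
    return weight - lr * grad - lr * weight_decay * weight


# With a zero gradient, only the decay acts on the weight.
w = 1.0
for _ in range(1000):
    w = step_with_weight_decay(w, grad=0.0, lr=1e-4, weight_decay=1e-4)

print(w)  # slightly below 1.0: the pull toward zero is very gentle at these values
```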

Timestep schedule

Diffusion models learn to denoise across a range of noise levels. You choose which timesteps to sample more often:

High noise: model learns coarse structure, composition, big shapes.
Low noise: model learns fine textures and details.
Mid noise: where structure and detail meet; great for faces and characters.

The Timestep Type and Timestep Bias parameters in AI Toolkit are just UI handles for this scheduling, which we’ll unpack in the parameter section.

Dataset composition and captions

Even with perfect hyperparameters, bad data gives a bad LoRA:

Use clean, varied images that all match the concept (same person, same brand, same style) but with different poses, lighting, and backgrounds.
Captions should clearly tie a unique trigger word to the concept so you can activate the LoRA later without breaking the base model’s vocabulary.

On video LoRAs (Wan, HiDream E1), you have the same logic but with short clips instead of individual images, and frame sampling becomes part of the dataset design.

6. Mapping LoRA concepts to AI Toolkit parameters

Now we’ll walk through the New Job screen panel by panel and connect each parameter to the concepts above.

6.1 JOB panel: project, GPU, and trigger word

The JOB panel is simple but important:

Training Name - This is just the job’s label and becomes part of the output folder and file names. Many people include both the model and trigger word, e.g. flux_dev_skschar_v1.

GPU ID - On a local install this selects your physical GPU. On the cloud AI Toolkit on RunComfy, leave this at the default; the actual GPU type (H100 / H200, etc.) is chosen later when you start the job from the Training Queue.

Trigger Word - If you put a word here, AI Toolkit will prepend it to all captions in your dataset at training time (without permanently editing your files). This is handy if your captions don’t already have a consistent trigger. Use a nonsense token that the base model doesn’t already know (e.g. sks_char_neo), so the LoRA doesn’t compete with existing meanings.

6.2 MODEL panel: choosing and loading the base model

Model Architecture is where you pick from the model list (Flux, Z‑Image Turbo, Wan 2.2, Qwen‑Image, etc.). When you choose one:

AI Toolkit loads a preset configuration tailored to that model: sampling type, noise schedule defaults, sometimes adapter paths.

Name or Path lets you override the default Hugging Face / model hub path:

Leave it blank or default → AI Toolkit downloads the default base model.
Point it to a local path → AI Toolkit uses your custom checkpoint (e.g. a Flux finetune you like).

If the model is gated (Flux.1‑dev, Flux.2‑dev, some Wan variants, etc.), you must accept the license and set HF_TOKEN in a .env file so AI Toolkit can download it.

Depending on the model, you’ll also see extra flags like Low VRAM or Layer Offloading here or in closely related panels:

Low VRAM compresses and offloads parts of the model so it fits on smaller GPUs, at the cost of speed.
Layer Offloading aggressively shuffles parts of the model between CPU and GPU; only use it if standard Low VRAM isn’t enough, as it can be slower and occasionally less stable.

These switches don’t change what the LoRA learns; they just change how AI Toolkit packs the base model into memory, mainly trading speed and stability for the ability to fit the model on your hardware.

6.3 QUANTIZATION panel: precision vs VRAM

The QUANTIZATION panel usually has:

Transformer (e.g. float8, 6-bit, 4-bit, 3-bit ARA),
Text Encoder (typically float8, the default).

What they mean:

The transformer is the big, heavy part of the model that processes image latents and cross‑attention with text.
The text encoder turns prompts into token embeddings.

Quantizing the transformer:

float8 is the safest and most precise; it uses more VRAM but has minimal quality loss.
6-bit is a strong compromise for 24 GB GPUs; small quality hit for decent savings.
4-bit and 3-bit ARA are more aggressive; 3-bit ARA combines 3‑bit weights with an accuracy recovery adapter that partially restores precision.

Quantizing the text encoder:

Text encoders are much smaller, so they’re usually kept at float8.
Some advanced setups freeze or unload the text encoder entirely (see Unload TE and Cache Text Embeddings later); in that case, its quantization matters less.

Practically:

On a 24 GB GPU fine‑tuning Flux or Wan, Transformer = 6-bit, Text Encoder = float8 is a very workable starting point.
If you have 48 GB+, stick to float8 everywhere unless you need the extra memory for very high resolutions or video frame counts.
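A back-of-the-envelope estimate shows why lower bit widths matter. The sketch below assumes a transformer with roughly 12 billion parameters (an illustrative figure, not taken from this guide) and counts only the stored weights; activations, the LoRA itself, optimizer state and the text encoder all need additional memory.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the stored weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


ASSUMED_PARAMS = 12_000_000_000  # illustrative size for a large diffusion transformer

for label, bits in (("float8", 8), ("6-bit", 6), ("4-bit", 4), ("3-bit", 3)):
    print(f"{label:>6}: ~{weight_memory_gb(ASSUMED_PARAMS, bits):.1f} GB of weights")
```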

6.4 TARGET panel: LoRA type and rank

The TARGET panel describes the adapter you’re training:

Target Type - Usually LoRA. Some builds also show LoKr (Low‑Rank Kronecker), a slightly different scheme that can be more parameter‑efficient but is not supported by every inference tool. For maximum compatibility, especially if you plan to use your LoRA in many different ComfyUI or Automatic1111 setups, LoRA is the safe default.
Linear Rank - This is the LoRA rank we discussed earlier: higher rank means more capacity, a larger LoRA file, more VRAM usage, and a higher risk of overfitting on small datasets. Intuition for modern diffusion transformers (Flux, Z‑Image Turbo, Wan 2.x, Qwen‑Image, OmniGen2, Chroma, Lumina2, etc.):

8–16: compact and generalizing. This is a good starting range for strong bases like Z‑Image Turbo and many SDXL / SD 1.5 setups, especially when your dataset is small (5–40 images or a few short clips).
16–32: typical range for larger‑capacity style/character LoRAs on models like Flux, Wan 2.x, Qwen and other big image/video backbones. In practice you usually start at 16 and only push to 32 if you have enough data and the LoRA still feels too weak.
64+: rarely necessary. Only consider ranks this high if you have a large, diverse dataset and you intentionally want a very strong style or domain shift and have plenty of VRAM; most published AI Toolkit recipes never need to go this high.

On SD 1.5 / SDXL you might also see a Conv Rank (convolution rank), which focuses more on texture and style layers. Higher Conv Rank emphasizes how the image is rendered (brush strokes, noise pattern), while Linear Rank leans more on what is in the image.

6.5 SAVE panel: checkpoint precision and save frequency

SAVE controls how your LoRA checkpoints are written:

Data Type

BF16 (bfloat16) is a great default: numerically stable and efficient.
FP16 keeps a few more bits of precision than BF16 but has a narrower numeric range; for typical LoRAs the difference is not noticeable.
FP32 is very precise and very heavy; use only if you know you need it.

Save Every
The number of steps between checkpoints. If you set Save Every = 250 and Steps = 3000, the run produces up to 12 checkpoints (but see the next field). You’ll usually want Save Every to match Sample Every in the SAMPLE panel so that each checkpoint has matching previews.
Max Step Saves to Keep
How many of those checkpoints to keep on disk. If this is 4, only the 4 most recent ones are preserved; older ones are deleted to save space.
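The interaction between Save Every and Max Step Saves to Keep can be sketched as follows. The function names are illustrative, and the sketch assumes checkpoints land exactly on multiples of Save Every; it does not model any separate final save the toolkit may write.

```python
def checkpoint_steps(total_steps, save_every):
    """Steps at which a checkpoint would be written."""
    return list(range(save_every, total_steps + 1, save_every))


def kept_checkpoints(total_steps, save_every, max_saves_to_keep):
    """Checkpoints still on disk at the end: only the most recent ones survive."""
    return checkpoint_steps(total_steps, save_every)[-max_saves_to_keep:]


all_saves = checkpoint_steps(total_steps=3000, save_every=250)
on_disk = kept_checkpoints(total_steps=3000, save_every=250, max_saves_to_keep=4)
print(len(all_saves), on_disk)
```

With these settings, a mid-run checkpoint such as step 1500 is gone by the end of training, so raise Max Step Saves to Keep if you expect to want earlier checkpoints.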

6.6 TRAINING panel: optimizer, steps, and noise schedule

Batch Size and Gradient Accumulation

As mentioned earlier:

Batch Size = images/clips per forward pass.
Gradient Accumulation = how many such passes you stack before one optimizer update.

If VRAM is tight, you might do:

Batch Size = 1, Gradient Accumulation = 4 → behaves like batch size 4 but takes four times as many passes.

Always ensure your effective batch size is no larger than your dataset size; you never want to ask for 16 images per step when you only have 10 total.

Steps

This is total optimizer steps, not "epochs".

2000–3000 steps for Flux / Qwen / Z‑Image Turbo / OmniGen2 / Chroma (and many Wan 2.x LoRAs) is a common baseline for 20–50 image or small‑clip datasets.

It’s often better to train a bit less and keep a mid‑run checkpoint than to push to absurd step counts and hope the last one is best.
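Because the setting is in steps rather than epochs, it can help to translate it into "how many times does the model see each image". A minimal sketch, assuming every step consumes one full effective batch (the function name is illustrative):

```python
def approx_passes_over_dataset(steps, batch_size, grad_accumulation, num_images):
    """Roughly how many times each image is seen during the whole run."""
    return steps * batch_size * grad_accumulation / num_images


# 3000 steps on 30 images with Batch Size = 1 and no accumulation.
print(approx_passes_over_dataset(3000, 1, 1, 30))
# 2000 steps on 40 images with Batch Size = 1 and Gradient Accumulation = 4.
print(approx_passes_over_dataset(2000, 1, 4, 40))
```

A very high number of passes over a small dataset is one sign that an earlier checkpoint may generalize better than the final one.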

Optimizer

You’ll typically see:

AdamW8Bit – AdamW with 8‑bit optimizer states. This saves memory and works very well for small‑to‑medium datasets.
Adafactor – more memory‑efficient, scales to massive datasets, but can be trickier to tune.

For most LoRAs in AI Toolkit, AdamW8Bit is the right choice unless you’re hitting optimizer‑state OOM errors.

Learning Rate

A good default is 0.0001. If:

the LoRA barely seems to learn, you can try 0.00015–0.0002,
you see rapid overfitting or noisy samples, try 0.00005–0.00008.

Avoid jumping straight to high rates like 0.0005 unless a model‑specific guide tells you to (e.g. some experimental Turbo configs).

Weight Decay

As described before, 0.0001 is a nice "gentle regularization" default. If your LoRA is clearly memorizing training images even at modest steps, nudging this higher is one of the tools you have.

Timestep Type and Timestep Bias

These two parameters shape which diffusion timesteps your training batches focus on.

Timestep Type can be:

Linear – sample timesteps evenly across the whole noise range.
Sigmoid – concentrate on mid‑range timesteps (good for faces/characters).
Weighted or other presets – model‑specific schedules.

Timestep Bias can be:

Balanced – no extra bias; matches the Timestep Type distribution.
High Noise – skew toward early timesteps (very noisy latents); emphasizes global structure and composition.
Low Noise – skew toward later timesteps (almost clean images); emphasizes fine textures.

For character LoRAs on FlowMatch models, Weighted + Balanced is a very solid starting point: the LoRA learns the concept where the model is "halfway" through denoising, which tends to match what you see at inference.
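To illustrate the difference between an even and a mid-weighted schedule, the sketch below draws timesteps on a normalized 0 to 1 scale. Passing a standard normal sample through a sigmoid is one common way to concentrate samples in the middle; it is used here as an assumption for illustration, not as a description of AI Toolkit's exact implementation.

```python
import math
import random


def sample_linear(rng):
    """Uniform draw: every noise level in [0, 1) is equally likely."""
    return rng.random()


def sample_sigmoid(rng):
    """Sigmoid of a standard normal draw: values cluster around 0.5."""
    return 1.0 / (1.0 + math.exp(-rng.gauss(0.0, 1.0)))


def mid_range_fraction(sampler, n=10_000, seed=0):
    """Fraction of n sampled timesteps that land in the middle half, [0.25, 0.75]."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if 0.25 <= sampler(rng) <= 0.75)
    return hits / n


linear_mid = mid_range_fraction(sample_linear)
sigmoid_mid = mid_range_fraction(sample_sigmoid)
print(f"linear:  {linear_mid:.2f} of samples in the middle half")
print(f"sigmoid: {sigmoid_mid:.2f} of samples in the middle half")
```

A bias toward high or low noise would shift this distribution toward one end of the range instead of the middle.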

Sampler / Noise Type in training

On older SD models, AI Toolkit uses DDPM‑style samplers; for FlowMatch models like Flux, Z‑Image Turbo, Wan 2.x, it uses FlowMatch samplers by default. You normally don’t need to change this—the model preset sets the appropriate Timestep Type and sampler internally.

EMA (Exponential Moving Average)

Use EMA toggles whether AI Toolkit keeps a smoothed copy of the LoRA weights over time.
If enabled, EMA Decay (e.g. 0.99) controls how quickly the EMA forgets old updates:

0.9 = reacts quickly, less smooth.
0.99 = smoother.
0.999+ = very smooth but slow to adapt.

EMA can improve stability on small datasets but consumes extra memory. On tight VRAM budgets, it’s reasonable to keep Use EMA off unless a specific guide recommends it.
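The effect of EMA Decay can be seen with a single number standing in for a weight. This is a generic exponential moving average sketch with illustrative function names, not the toolkit's code.

```python
def ema_update(ema_value, new_value, decay):
    """Blend the running average with the newest weight value."""
    return decay * ema_value + (1.0 - decay) * new_value


def track(decay, steps=10):
    """EMA of a weight that jumps from 0.0 to 1.0 and then stays there."""
    ema = 0.0
    for _ in range(steps):
        ema = ema_update(ema, 1.0, decay)
    return ema


fast = track(0.9)           # reacts quickly to the jump
smooth = track(0.99)        # lags behind
very_smooth = track(0.999)  # has barely moved after 10 steps
print(fast, smooth, very_smooth)
```

The extra memory cost mentioned above comes from storing this second, averaged copy of every LoRA weight.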

Text Encoder Optimizations

Unload TE – unloads the text encoder from VRAM between steps. Saves memory but forces frequent re‑loading from disk, which can be slow on HDDs.
Cache Text Embeddings – runs the text encoder once per caption, then stores the embeddings; later steps reuse those embeddings without re‑running the encoder. This trades disk space for speed/VRAM.

For most workflows:

If you have enough VRAM: leave both off.
If you’re tight on VRAM but have fast SSD storage and your captions are effectively static (no Differential Output Preservation, no on‑the‑fly [trigger] rewriting, no heavy caption dropout that depends on per‑step text changes), turn on Cache Text Embeddings so AI Toolkit can encode each caption once and free the text encoder.
If you are using features that modify prompts each step (for example Differential Output Preservation (DOP), dynamic trigger substitution in captions, or any setup that relies on per‑step caption dropout behaviour), keep Cache Text Embeddings = OFF even when VRAM is tight, so the text encoder can re‑encode the real prompt every batch.
Only use Unload TE when absolutely necessary (for very narrow trigger‑only LoRAs where dataset captions are ignored), since it completely disables caption‑based training.
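The reason static captions matter for caching can be shown with a toy cache. The CachedEncoder class and the fake_encode stand-in are hypothetical; a real text encoder returns tensors and the toolkit stores them on disk.

```python
class CachedEncoder:
    """Wrap an encoder so each distinct caption is encoded only once."""

    def __init__(self, encode_fn):
        self._encode_fn = encode_fn
        self._cache = {}
        self.encoder_calls = 0  # how often the expensive encoder actually ran

    def __call__(self, caption):
        if caption not in self._cache:
            self.encoder_calls += 1
            self._cache[caption] = self._encode_fn(caption)
        return self._cache[caption]


def fake_encode(caption):
    """Stand-in for a real text encoder: one number per word."""
    return [float(len(word)) for word in caption.split()]


captions = ["photo of sks_char_neo", "sks_char_neo in a park"]

# Static captions: 100 steps, but the encoder runs once per caption.
static = CachedEncoder(fake_encode)
for step in range(100):
    static(captions[step % 2])
static_calls = static.encoder_calls

# Captions rewritten every step: every lookup misses, so caching buys nothing.
dynamic = CachedEncoder(fake_encode)
for step in range(100):
    dynamic(f"{captions[step % 2]} variation {step}")
dynamic_calls = dynamic.encoder_calls

print(static_calls, dynamic_calls)
```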

6.7 ADVANCED / Regularization panel: DOP and Differential Guidance

Differential Output Preservation

When you toggle this on, you’re asking AI Toolkit to:

Run both the base model and the LoRA‑augmented model on a set of "regularization" images.
Add a loss term that penalizes the LoRA for changing outputs that should remain unchanged.

Controls:

DOP Loss Multiplier – how strong this preservation loss is; 0.1–1.0 is typical. Think of 1.0 as "take this preservation very seriously".
DOP Preservation Class – a text label describing what you’re trying to protect, like "person" or "truck". This helps the text encoder understand the regularization captions.

To use DOP effectively you must:

Have at least one dataset marked as Is Regularization in the DATASETS panel.
Caption those images without your LoRA trigger word (these are "generic" examples).

Good scenarios for DOP:

Your character LoRA makes every person look like your subject.
Your product LoRA turns all logos into your brand, even when you don’t use the trigger word.

Blank Prompt Preservation is a variant where the regularization runs with empty prompts, encouraging the LoRA not to disturb basic "unprompted" behavior.
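Conceptually, the preservation term is an extra penalty added to the normal training loss. The sketch below uses mean squared error between base and LoRA outputs as the penalty; that choice, and the function names, are assumptions for illustration, since this guide only states that unwanted changes are penalized.

```python
def mse(xs, ys):
    """Mean squared difference between two equally long output vectors."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def total_loss(task_loss, base_output, lora_output, dop_multiplier):
    """Training loss plus a weighted penalty for drifting from the base model."""
    preservation = mse(base_output, lora_output)
    return task_loss + dop_multiplier * preservation


# On a regularization example the LoRA left untouched, no penalty is added.
unchanged = total_loss(0.5, [1.0, 2.0], [1.0, 2.0], dop_multiplier=1.0)
# If the LoRA changes the output, the penalty grows with the multiplier.
drifted = total_loss(0.5, [1.0, 2.0], [2.0, 2.0], dop_multiplier=0.5)
print(unchanged, drifted)
```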

Do Differential Guidance

Primarily used for Z‑Image Turbo LoRAs:

AI Toolkit compares base and adapted outputs and uses a difference signal to sharpen what the LoRA should change.
Differential Guidance Scale controls how strongly this difference influences the training updates; the Hugging Face Z‑Image Turbo LoRA guide uses example values that work well in practice.

Enabling Differential Guidance:

Helps Z‑Image Turbo LoRAs adapt deeply despite the underlying few‑step distillation.
Works best when combined with cached text embeddings and carefully tuned learning rates and steps.

For non‑turbo models (Flux, Qwen, SDXL), you usually leave Do Differential Guidance off unless a model‑specific tutorial says otherwise.

6.8 DATASETS panel: what you actually train on

Each dataset block in the DATASETS panel corresponds to one dataset from the Datasets page.

Key fields:

Target Dataset – which dataset this block refers to.
LoRA Weight – relative importance of this dataset compared to others in the same job.
Default Caption – fallback caption applied when an image has no caption file.
Caption Dropout Rate
Num Frames (for video models)
Cache Latents
Is Regularization
Flip X, Flip Y
Resolutions (256–1536 buckets)

What they mean in practice:

Combining datasets with LoRA Weight
If you have multiple datasets (e.g. "character close‑ups" and "full‑body shots"), you can balance them by giving one a higher LoRA Weight. A dataset with weight 2 will be sampled roughly twice as often as one with weight 1.
Default Caption and Caption Dropout Rate

Default Caption is useful if you forgot to caption some images and want to give them at least a minimal description (including the trigger word).
Caption Dropout Rate randomly removes or blanks captions for some training examples:

Near 0 → the LoRA learns a strong dependency on the caption.
Near 1 → the LoRA behaves more like a "style always on" modifier.

Is Regularization
Mark this when the dataset should be used for DOP / regularization, not as main training data. These images should not contain your trigger word and usually cover generic examples (other people, trucks, etc.).
Cache Latents
When enabled, AI Toolkit pre‑computes latent encodings of your images and saves them, so later training steps don’t have to re‑encode each image. Training speeds up, but your disk usage jumps: hundreds or thousands of images at high resolution can consume tens of gigabytes. You’ll need to manually clean these latents if you don’t want them persisting forever.
Num Frames (video only)
For Wan/HiDream LoRAs, this decides how many frames are sampled from each clip during training. More frames → better motion learning but higher VRAM; presets generally choose sensible defaults per model.
Flip X and Flip Y
Automatic data augmentation:

Flip X (horizontal flip) doubles your dataset but mirrors everything, including asymmetrical features and text.
Flip Y (vertical flip) rarely makes sense for realistic images.

Resolutions
These define which image sizes AI Toolkit will "bucket" your images into. It only shrinks images to fit the nearest bucket; it never upscales. If you enable, say, 768 and 1024:

900×900 images → shrunk to 768×768.
1200×1200 images → shrunk to 1024×1024.
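The bucketing rule in the two examples above can be sketched for square images as "pick the largest enabled bucket that does not exceed the image". This is a simplification with an illustrative function name; real bucketing also has to deal with aspect ratios, and this guide does not say what happens to images smaller than every bucket, so the sketch returns None in that case.

```python
def pick_bucket(side, buckets):
    """Largest enabled bucket that is not bigger than the image side (never upscale)."""
    fitting = [bucket for bucket in sorted(buckets) if bucket <= side]
    return fitting[-1] if fitting else None


enabled = [768, 1024]
print(pick_bucket(900, enabled))   # 768
print(pick_bucket(1200, enabled))  # 1024
print(pick_bucket(512, enabled))   # None: smaller than every enabled bucket
```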

6.9 SAMPLE panel: seeing your LoRA learn in real time

The SAMPLE panel defines how AI Toolkit generates preview images or videos during training.

Top‑level fields:

Sample Every – how many steps between previews.
Sampler – FlowMatch or DDPM, depending on model.
Width / Height – preview resolution.
Seed and Walk Seed.
Guidance Scale.
Num Frames and FPS (for video previews).
Sample Steps.
Advanced toggles: Skip First Sample, Force First Sample, Disable Sampling.

Below that, you can add multiple Sample Prompts, each with its own prompt text, optional per‑prompt resolution/seed, LoRA Scale, and an optional control image.

How this ties back to training:

Sample Every vs Save Every: It’s best if these two match so that every saved checkpoint has a corresponding set of preview images. If you change one, change the other.
Sampler: Stick to the sampler recommended by the model preset:

FlowMatch for Flux, Z‑Image Turbo, Wan, OmniGen2, etc.
DDPM for SD 1.5 / SDXL.

Preview resolution and steps

1024×1024 with Sample Steps = 20–25 gives clear previews without being too slow for most image models.
For video, higher Num Frames and FPS produce more realistic previews but are heavy; presets are usually tuned per model.

Seeds and Walk Seed

A fixed Seed with Walk Seed off means every checkpoint uses exactly the same random noise, so you can directly compare how the LoRA’s outputs evolve.
Enabling Walk Seed increments the seed per prompt, adding variety. Nice for browsing, but slightly harder to compare step‑by‑step.
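
As a rough illustration of the two modes, the sketch below lists the seed each sample prompt would receive within one sampling pass. It assumes Walk Seed adds one to the seed for each successive prompt, starting from the configured Seed; the function name preview_seeds is hypothetical and is not part of AI Toolkit.

```python
from typing import List


def preview_seeds(base_seed: int, num_prompts: int, walk_seed: bool) -> List[int]:
    """Seed assigned to each sample prompt within one sampling pass."""
    if walk_seed:
        # Walk Seed on: the seed advances by one for each prompt (assumed step).
        return [base_seed + i for i in range(num_prompts)]
    # Walk Seed off: every prompt starts from the same noise.
    return [base_seed] * num_prompts


print(preview_seeds(42, 3, walk_seed=False))  # [42, 42, 42]
print(preview_seeds(42, 3, walk_seed=True))   # [42, 43, 44]
```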

In practice, many users:

keep Sample Every = Save Every = 250,
set 3–6 sample prompts covering typical use cases,
keep at least one prompt that is identical across all checkpoints so they can visually track convergence.

7. Step‑by‑step example: training a LoRA with Ostris AI Toolkit

To make this concrete, here is an end‑to‑end example you can adapt to any supported image model (Flux, OmniGen2, Z‑Image Turbo, Qwen‑Image, etc.). The numbers are kept in safe ranges rather than hyper‑optimized for any one model.

Step 1 – Prepare your dataset

Collect 25–40 high‑quality images of your concept (a person, a product, a style).
Resize or crop them so the main subject is visible and not tiny in the frame.
Caption each image with:

a unique trigger word (e.g. sks_char_neo),
a concise description: "portrait photo of sks_char_neo, studio lighting, 35mm lens".
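
If you prefer to start from a template and then edit each caption by hand, a small script can write the initial files. The sketch below assumes captions live in a .txt file with the same name as each image (one common convention; check how your dataset is expected to be laid out). The function name write_captions and the folder layout are illustrative only.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def write_captions(folder: Path, trigger: str, description: str) -> int:
    """Write one sidecar .txt caption per image; return how many were written."""
    written = 0
    # sorted() materializes the listing before any new files are created.
    for image in sorted(folder.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        caption_file = image.with_suffix(".txt")
        if caption_file.exists():
            continue  # never overwrite a caption that already exists
        text = description.format(trigger=trigger)
        caption_file.write_text(text, encoding="utf-8")
        written += 1
    return written
```

For example, write_captions(Path("my_dataset"), "sks_char_neo", "portrait photo of {trigger}, studio lighting, 35mm lens") gives every image the same starting caption; afterwards, adjust the description per image so it matches what is actually in the picture.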

Step 2 – Create a dataset in AI Toolkit

Go to Datasets → New Dataset in the UI.
Upload your images (and caption files or JSONL if you have them).
Confirm that the dataset shows the correct number of images and a reasonable resolution distribution (most near 768–1024 on modern models).

Optionally:

Create a second dataset of generic people or objects (similar class but not your subject) if you think you’ll need DOP later; leave Is Regularization off for now—you can enable it when you decide to use it.

Step 3 – Configure a new LoRA job

On the New Job page:

JOB

Training Name: flux_sks_char_neo_v1 (or similar).
GPU ID: leave at the default unless you know you need another.
Trigger Word: sks_char_neo (only if your captions don’t already include it).

MODEL

Model Architecture: your chosen base (e.g. FLUX.1, Z‑Image Turbo, Qwen‑Image).
Name or Path: leave default unless you have a specific checkpoint.
Enable Low VRAM only if VRAM is tight.

QUANTIZATION

Transformer: 6-bit on 24 GB GPUs, float8 if you have headroom.
Text Encoder: float8 (default).

TARGET

Target Type: LoRA.
Linear Rank: 32 for most models; 16 if VRAM is tight or the base is extremely strong.

SAVE

Data Type: BF16.
Save Every: 250.
Max Step Saves to Keep: 4.

TRAINING

Batch Size: 1.
Gradient Accumulation: 4.
Steps: 3000.
Optimizer: AdamW8Bit.
Learning Rate: 0.0001.
Weight Decay: 0.0001.
Timestep Type: Sigmoid or the model’s recommended default.
Timestep Bias: Balanced.
Use EMA: off unless you have plenty of memory.
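
To get a feel for what these numbers mean together, the arithmetic below works out the effective batch size and how many times the dataset is revisited. It assumes that one step is one optimizer update consuming Batch Size × Gradient Accumulation images; if the toolkit counts each micro‑batch as a step instead, divide the results by the accumulation factor. The function names and the 30‑image dataset size are illustrative.

```python
def images_seen(steps: int, batch_size: int, grad_accum: int) -> int:
    """Total images consumed, assuming one step = one optimizer update."""
    return steps * batch_size * grad_accum


def passes_over_dataset(steps: int, batch_size: int, grad_accum: int,
                        num_images: int) -> float:
    """How many times each image is revisited on average."""
    return images_seen(steps, batch_size, grad_accum) / num_images


effective_batch = 1 * 4  # Batch Size x Gradient Accumulation
print(effective_batch)                      # 4
print(images_seen(3000, 1, 4))              # 12000
print(passes_over_dataset(3000, 1, 4, 30))  # 400.0
```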

ADVANCED / Regularization

Leave Differential Output Preservation and Do Differential Guidance off for your first run unless your model requires them (Z‑Image Turbo is the main one that benefits from Differential Guidance out of the box).

DATASETS

Target Dataset: your main dataset.
LoRA Weight: 1.
Default Caption: leave empty if all images already have captions.
Caption Dropout Rate: 0.0–0.1 so the LoRA strongly relies on your trigger word.
Cache Latents: optional; turn on if you’re fine with extra disk usage and want faster training.
Is Regularization: off for this main dataset.
Resolutions: enable 768 and 1024 (or as your GPU allows).

SAMPLE

Sample Every: 250 (match Save).
Sampler: use the default (FlowMatch or DDPM depending on model).
Width / Height: 1024×1024.
Seed: any fixed number (42 is fine); set Walk Seed to off if you want directly comparable previews.
Guidance Scale: use the model’s suggested default.
Sample Steps: 20–25.
Add 3–5 Sample Prompts, keeping at least one that stays identical across all checkpoints.

Click Create Job. The job appears in Training Queue; open its logs to confirm it starts correctly.
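
For reference, the settings from this step can be written down as a plain Python dictionary and sanity‑checked before you start the job. This is a recap sketch only: the key names simply mirror the UI labels and are not AI Toolkit's actual config schema, and check_job is a hypothetical helper that encodes the rules of thumb from this guide.

```python
from typing import Any, Dict, List

job: Dict[str, Any] = {
    "training_name": "flux_sks_char_neo_v1",
    "trigger_word": "sks_char_neo",
    "linear_rank": 32,
    "save_every": 250,
    "max_step_saves_to_keep": 4,
    "batch_size": 1,
    "gradient_accumulation": 4,
    "steps": 3000,
    "learning_rate": 1e-4,
    "weight_decay": 1e-4,
    "caption_dropout_rate": 0.05,
    "resolutions": [768, 1024],
    "sample_every": 250,
    "sample_steps": 20,
    "seed": 42,
    "walk_seed": False,
}


def check_job(cfg: Dict[str, Any]) -> List[str]:
    """Return warnings where cfg departs from this guide's suggestions."""
    problems: List[str] = []
    if cfg["sample_every"] != cfg["save_every"]:
        problems.append("Sample Every and Save Every differ")
    if not 0.0 <= cfg["caption_dropout_rate"] <= 0.1:
        problems.append("Caption Dropout Rate outside 0.0-0.1")
    if not 20 <= cfg["sample_steps"] <= 25:
        problems.append("Sample Steps outside 20-25")
    return problems


print(check_job(job))  # []
```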

Step 4 – Monitor samples and adjust

Each time you hit a multiple of 250 steps, AI Toolkit will:

save a new checkpoint,
generate sample images for your prompts.
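
With the example settings (Steps = 3000, Save Every = 250, Max Step Saves to Keep = 4), the schedule can be worked out as follows. The sketch assumes that once the keep limit is exceeded, the oldest step saves are the ones removed; the function names are hypothetical.

```python
from typing import List


def checkpoint_steps(total_steps: int, save_every: int) -> List[int]:
    """Steps at which a checkpoint and a set of samples are produced."""
    return list(range(save_every, total_steps + 1, save_every))


def kept_checkpoints(total_steps: int, save_every: int, max_keep: int) -> List[int]:
    """Step saves still on disk at the end (assumes the oldest are removed)."""
    if max_keep <= 0:
        return []
    return checkpoint_steps(total_steps, save_every)[-max_keep:]


print(len(checkpoint_steps(3000, 250)))  # 12
print(kept_checkpoints(3000, 250, 4))    # [2250, 2500, 2750, 3000]
```

Under that assumption, raise Max Step Saves to Keep if you expect to go back to a much earlier checkpoint after the run finishes.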

Watch for:

Underfitting – early checkpoints look identical to the base model; the trigger word barely changes anything.
→ Consider increasing Steps slightly (restart training with 4000) or bumping the Learning Rate a bit (e.g. 0.0001 → 0.00015).
Overfitting / bleeding – outputs become almost photocopies of your training images, or your trigger word starts hijacking generic prompts.
→ Try a lower Linear Rank, fewer Steps, slightly higher Weight Decay, or enable DOP with a carefully prepared regularization dataset.

Once you see a checkpoint that consistently looks good across several prompts, note its step number.

Step 5 – Export and use your LoRA

From the Training Queue or from your AI Toolkit output folder:

Download the best checkpoint (a .safetensors LoRA file).
If you’re using the RunComfy Cloud AI Toolkit, these LoRA files will also be stored on your Custom Models page, so you can copy the model link, download them, and test them in the model playground or ComfyUI.

8. Troubleshooting AI Toolkit LoRA training: common errors and fixes

Dataset not found or empty

Symptoms:

Job exits immediately.
Logs mention "no images found" or similar.

Checks:

In Datasets, confirm the dataset shows the expected image count.
Ensure Target Dataset in the job matches the correct dataset.
If using JSONL metadata, verify the file is present and correctly formatted.

Base model download / Hugging Face errors

Symptoms:

403 / 404 errors when downloading the model.
Log messages about missing access.

Fixes:

Accept the model’s license on Hugging Face if it’s gated (Flux dev, some Wan variants).
Add HF_TOKEN=your_read_token to a .env file in the AI Toolkit root.
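
Before re‑running the job, you can quickly confirm that the .env file really contains a token. The sketch below is a minimal standard‑library check, assuming the file uses simple KEY=value lines; read_env_file and has_hf_token are hypothetical helper names, and the check does not verify that the token is valid on Hugging Face.

```python
from pathlib import Path
from typing import Dict


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_hf_token(path: Path) -> bool:
    """True if the file exists and sets HF_TOKEN to a non-placeholder value."""
    if not path.is_file():
        return False
    token = read_env_file(path).get("HF_TOKEN", "")
    # The placeholder text from the instructions does not count as a token.
    return bool(token) and token != "your_read_token"
```

Calling has_hf_token(Path(".env")) from the AI Toolkit root should return True once the token is in place.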

CUDA out‑of‑memory during training or sampling

Symptoms:

"CUDA out of memory" errors when the job starts, or when generating samples.

Options:

In DATASETS:

Disable high resolutions (1280, 1536) and stick to 768/1024.

In TARGET:

Lower Linear Rank (32 → 16).

In QUANTIZATION / MODEL:

Turn on Low VRAM.
Use a more aggressive transformer quantization (float8 → 6‑bit).

In TRAINING:

Reduce Batch Size or Gradient Accumulation.

In SAMPLE:

Lower preview resolution and Sample Steps.
Reduce Num Frames for video previews.

If you’re running in RunComfy Cloud AI Toolkit, the easy escape hatch is to bump the job to a higher‑VRAM GPU tier and re‑run it, often dropping some of the aggressive quantization / Low VRAM settings and using a simpler, faster config. With more VRAM and fewer memory‑saving hacks, each step runs quicker and you can iterate through more checkpoints instead of spending time micromanaging VRAM.

LoRA overfits and hijacks the base model

Symptoms:

Every person looks like your subject.
All trucks look like your specific product, even without the trigger word.

Mitigations:

Lower Linear Rank.
Use an earlier checkpoint (e.g. 2000 steps instead of 3000).
Slightly increase Weight Decay.
Add a regularization dataset of similar‑class examples (Is Regularization = on).
Enable Differential Output Preservation with a reasonable DOP Loss Multiplier (e.g. 0.2–0.5) and a suitable DOP Preservation Class ("person", "truck", etc.).
