FLUX.2 Klein 4B & 9B LoRA Training with Ostris AI Toolkit
FLUX.2 Klein is a unified text‑to‑image + image editing model family that comes in two open‑weights “Base” sizes: 4B and 9B. This guide shows you how to train practical FLUX.2 Klein LoRAs using Ostris AI Toolkit, with an emphasis on what’s specific to Klein (Base vs Distilled expectations, 4B vs 9B compatibility, VRAM realities, and the common Klein‑only failure modes).
By the end of this guide, you’ll be able to:
- Choose FLUX.2 Klein 4B Base vs 9B Base correctly (and avoid “wrong model size” LoRA issues).
- Plan VRAM and pick the right quantization + sampling defaults for Base Klein.
- Build a dataset and trigger strategy for character, style, or product/concept LoRAs.
- Run a smoke test with correct Base‑model sampling, then scale up without guessing.
- Fix the common Klein‑specific problems (license gating, Base-vs‑Distilled testing mismatch, 9B training collapse patterns, and current AI Toolkit edge cases).
This article is part of the AI Toolkit LoRA training series. If you’re new to Ostris AI Toolkit, start with the AI Toolkit LoRA training overview before diving into this guide:
https://www.runcomfy.com/trainer/ai-toolkit/getting-started
Table of contents
- 1. FLUX.2 Klein overview: what makes 4B/9B different (and why Base sampling matters)
- 2. Environment options: local AI Toolkit vs cloud AI Toolkit on RunComfy
- 3. Hardware & VRAM planning for FLUX.2 Klein 4B vs 9B LoRA training
- 4. Building a FLUX.2 Klein LoRA training dataset (character vs style vs product)
- 5. Step-by-step: train a FLUX.2 Klein LoRA in AI Toolkit
- 6. Recommended FLUX.2 Klein LoRA configs by VRAM tier
- 7. Common FLUX.2 Klein training issues and how to fix them
- 8. Using your FLUX.2 Klein LoRA after training
1. FLUX.2 Klein overview: what makes 4B/9B different (and why Base sampling matters)
1.1 Klein is “one model for generation + editing”
Klein is designed as a single model family for both text-to-image generation and image editing. Practically, that means a style/character/product LoRA you train for Klein can often be useful across both “generate” and “edit” workflows—your data and captions decide what it learns.
1.2 4B vs 9B: pick based on your goal and your hardware
- 4B Base is the best starting point for most users: faster iteration, easier VRAM fit, and generally simpler to keep stable.
- 9B Base can deliver better prompt fidelity and detail when you can afford the VRAM and stability tuning, but it’s less forgiving (and has more “edge case” reports in the wild).
Important compatibility rule:
A 4B LoRA does not work on 9B, and a 9B LoRA does not work on 4B. Always load the LoRA on the same Klein size you trained it on.
1.3 Base vs Distilled (and what AI Toolkit currently supports)
Klein is commonly discussed in two “behavior” categories:
- Base = undistilled checkpoints intended for fine-tuning / LoRA training.
- Distilled = accelerated inference behavior (very low step counts).
In AI Toolkit right now, you can only select _FLUX.2 Klein 4B Base_ or _FLUX.2 Klein 9B Base_.
There is no Distilled option in the Model Architecture dropdown, so this tutorial is intentionally Base‑only.
1.4 The #1 Klein gotcha: Base needs more inference steps
A huge number of “my LoRA is bad” reports come from sampling Base like it’s Distilled.
If you preview Base Klein at ~4–8 steps, it will look undercooked or noisy.
For Base Klein, use these as your evaluation defaults:
- Sample Steps / Inference Steps: ~50
- Guidance Scale (CFG): ~4
This single change fixes a lot of false alarms during training.
2. Environment options: local AI Toolkit vs cloud AI Toolkit on RunComfy
You can run AI Toolkit in two ways for this tutorial:
- Local AI Toolkit (your own GPU)
Install AI Toolkit from the GitHub repository, run the Web UI, and train on your own machine. This is a good fit if you already have a compatible NVIDIA GPU and you’re comfortable managing CUDA/drivers/disk.
- Cloud AI Toolkit on RunComfy (H100 / H200)
Open AI Toolkit in the browser and train on cloud GPUs (H100 80GB / H200 141GB). This is the easiest path for 9B Base runs, large datasets, or high-resolution training without VRAM compromises.
https://www.runcomfy.com/trainer/ai-toolkit/app
The workflow and UI are the same—the only difference is where the GPU lives.
3. Hardware & VRAM planning for FLUX.2 Klein 4B vs 9B LoRA training
3.1 Reality check: “fits for inference” ≠ “fits for training”
Even if a checkpoint “fits” for inference in BF16, training adds overhead (optimizer states, activations, LoRA modules, sampling previews). Plan with headroom.
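As a rough back-of-envelope, weights alone scale with parameter count and precision, and everything else stacks on top. The numbers below are generic arithmetic, not measured Klein footprints:

```python
# Rough weight-memory arithmetic only -- optimizer states, activations, LoRA
# modules, and sampling previews add more on top, so treat these as lower bounds.
def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / (1024 ** 3)

for size_b in (4, 9):
    bf16 = weight_gb(size_b, 2.0)   # BF16 = 2 bytes per parameter
    q4 = weight_gb(size_b, 0.5)     # ~4-bit quantization = ~0.5 bytes per parameter
    print(f"{size_b}B weights: ~{bf16:.1f} GB in BF16, ~{q4:.1f} GB at ~4-bit")
```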
3.2 Practical tiers (what to expect)
A useful way to plan:
- 4B Base
- Practical for local training on 24GB with conservative settings (batch 1, sensible ranks, quantization as needed).
- You can sometimes train at smaller resolutions on less VRAM, but iteration becomes more fragile.
- 9B Base
- Treat 32GB+ as the practical local floor for comfortable LoRA work.
- For easy, high-res iteration: cloud GPUs (H100/H200) are the “no-drama” option.
3.3 A Klein-specific warning about 9B + aggressive memory tricks
Community reports show that some 9B training setups can be more brittle—especially when relying on heavy memory-saving strategies. If you need “tight VRAM” training, it’s often more productive to:
1) train 4B Base first, or
2) move the run to cloud GPUs,
instead of fighting unstable 9B runs locally.
4. Building a FLUX.2 Klein LoRA training dataset (character vs style vs product)
Keep the workflow simple: curate clean data first, then tune knobs.
4.1 Universal dataset rules (high impact)
- Remove near-duplicates unless you intentionally want one shot to dominate (a quick automated check is sketched after this list).
- Avoid watermarks, UI overlays, and text blocks unless the LoRA is about those artifacts.
- Keep a consistent “signal”: your LoRA should learn identity or style or product, not random background coincidences.
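For the near-duplicate rule, a perceptual-hash pass is usually enough. This is a minimal sketch assuming the third-party Pillow and imagehash packages; the dataset path is a placeholder and the distance threshold is a starting guess to tune, not a rule:

```python
from pathlib import Path

from PIL import Image
import imagehash  # pip install pillow imagehash

DATASET = Path("datasets/klein_my_lora_v1")  # placeholder: your dataset folder
THRESHOLD = 5                                # Hamming distance; tune per dataset

seen: dict[str, imagehash.ImageHash] = {}
for path in sorted(DATASET.glob("*")):
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    h = imagehash.phash(Image.open(path))
    for name, other in seen.items():
        if h - other <= THRESHOLD:           # small distance = likely near-duplicate
            print(f"possible near-duplicate: {path.name} ~ {name}")
    seen[path.name] = h
```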
4.2 Character / likeness LoRAs
Target: consistent identity across many prompts.
- Typical dataset: 20–60 curated images
- Variety matters: multiple angles, lighting, expressions, focal lengths
- Captions: keep them short; don’t over-describe face parts
Trigger: recommended
Use a unique token/name so you can turn it on/off.
4.3 Style LoRAs
Target: a reusable look that doesn’t destroy prompt fidelity.
- Typical dataset: 50–200 images (more variety helps)
- Mix subjects: people + objects + scenes so style becomes the only constant
- Captions: emphasize style attributes (medium, palette, lighting language)
Trigger: optional
If you want a “callable style,” add a trigger.
4.4 Product / concept LoRAs
Target: stable geometry/materials for a specific product or new concept.
- Typical dataset: 30–100 images
- Keep framing and scale reasonably consistent early on
- Use captions to name the product and key attributes you want preserved
Trigger: strongly recommended
Products/concepts benefit a lot from explicit activation control.
5. Step-by-step: train a FLUX.2 Klein LoRA in AI Toolkit
This is the fast path. It’s intentionally focused on the panels users actually click.
Step 0 — Choose where you’ll run AI Toolkit
- Local AI Toolkit (your own GPU) — good for 4B Base and smaller runs.
- Cloud AI Toolkit on RunComfy — best for 9B Base and high-res training without VRAM tuning.
https://www.runcomfy.com/trainer/ai-toolkit/app
Step 1 — Create a dataset in AI Toolkit
In the AI Toolkit UI, open the Datasets tab.
Create a dataset (example name):
klein_my_lora_v1
Upload your images and (optionally) matching .txt caption files.
If you’re not ready to caption per-image, you can start with:
- a Trigger Word (JOB panel), and
- a short Default Caption (DATASETS panel).
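If you'd rather pre-generate simple per-image caption files instead of relying only on the Default Caption, a small script like this works. The folder name and the ohwx_person trigger token are placeholders; replace them with your own:

```python
from pathlib import Path

dataset = Path("datasets/klein_my_lora_v1")   # placeholder: your dataset folder
trigger = "ohwx_person"                       # placeholder: your unique trigger token

for img in sorted(dataset.glob("*")):
    if img.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    caption = img.with_suffix(".txt")
    if not caption.exists():                  # never overwrite hand-written captions
        caption.write_text(f"photo of {trigger}")
        print("wrote", caption.name)
```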
Step 2 — Create a new Job (configure panels in UI order)
Job panel
- Training Name: something descriptive (e.g., klein4b_character_lora_v1)
- GPU ID: pick your GPU locally; in the cloud, leave the default
- Trigger Word:
- Character/product: recommended (unique token)
- Style: optional (recommended if you want clean on/off control)
Model panel
- Model Architecture: choose FLUX.2 Klein 4B Base or FLUX.2 Klein 9B Base
- Name or Path:
- Use the official model repo for the size you picked
- If you select 9B and downloads fail, see Troubleshooting (license gating)
Quantization panel
Quantization is mainly about making the run fit and keeping it stable.
- If you’re training on tighter VRAM (especially 9B), enable quantization for the heavy components.
- If you hit quantization-related errors, temporarily disable quantization to validate the pipeline, then re-enable once training runs.
Target panel
This is where you decide LoRA capacity.
- Target Type: LoRA
- Linear Rank (starter defaults):
- 4B Base: start 16, move to 32 if underfitting
- 9B Base: start 16–32 (prefer 16 if you’ve seen instability)
If your run “collapses” or becomes unstable, reducing rank is one of the fastest stabilizers.
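If you want intuition for what rank actually costs, the arithmetic below gives a rough idea. The hidden width and number of adapted linear layers are illustrative placeholders, not Klein's real architecture figures:

```python
# Each adapted linear gets two low-rank matrices: (out x rank) and (rank x in),
# assuming square hidden-to-hidden linears for simplicity. hidden width and
# layer count are illustrative only, not Klein's actual numbers.
def lora_param_count(rank: int, hidden: int = 3072, adapted_linears: int = 200) -> int:
    return adapted_linears * 2 * rank * hidden

for rank in (16, 32, 64):
    params = lora_param_count(rank)
    size_mb = params * 2 / (1024 ** 2)        # BF16 = 2 bytes per parameter
    print(f"rank {rank}: ~{params / 1e6:.0f}M trainable params, ~{size_mb:.0f} MB file")
```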
Save panel
- Data Type: BF16 is a safe default for modern diffusion LoRAs
- Save Every: 250–500 steps is a practical cadence
- Max Step Saves to Keep: 3–6 (keeps disk use reasonable)
Training panel
Keep these simple and conservative first:
- Batch Size: 1 (increase only if you have headroom)
- Gradient Accumulation: 1–4 (use this to raise effective batch size without VRAM spikes)
- Learning Rate:
- Start 1e‑4 if your runs are stable
- If you see instability or “collapse,” try 5e‑5
- Steps (practical starter ranges):
- Small datasets (20–40 imgs): 2000–4000
- Medium datasets (50–120 imgs): 3000–6000
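A quick way to sanity-check a Steps value against your dataset size is to estimate how many times each image gets revisited. The numbers below are illustrative, and exact step accounting (especially with gradient accumulation) can vary between trainer versions, so treat this as a ballpark:

```python
dataset_size = 40        # images in your dataset (illustrative)
batch_size = 1
steps = 3000

images_seen = steps * batch_size             # gradient accumulation may scale this further
passes = images_seen / dataset_size
print(f"~{passes:.0f} passes over the dataset")  # very low = likely underfit, very high = watch for overfit
```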
If you’re uncertain, do a smoke test first:
- Run ~1000 steps, check samples, then decide whether to continue or restart with adjusted rank/LR.
Regularization (highly recommended for Klein 9B if you see collapse)
If you have a narrow dataset (single character or single product), add a small regularization dataset (generic images of the same broad class) at lower weight. This can reduce collapse/overfit patterns and improve generalization.
Datasets panel
- Target Dataset: select your dataset
- Default Caption (optional):
- Character: photo of [trigger]
- Style: [trigger], watercolor illustration, soft edges, pastel palette
- Product: product photo of [trigger], clean background, studio lighting
- Caption Dropout Rate: small values (like 0.05) can help avoid “caption overfitting” if you are not caching text embeddings
- Cache Latents: enable if available (big speedup)
- Resolutions:
- Start with one primary resolution (e.g., 1024) for your first run
- Add more buckets later if you need robustness across sizes
Sample panel (this is Klein‑critical)
Because you’re training Base Klein, set sampling like Base—not Distilled.
Use these starter values:
- Sample Every: 250–500
- Guidance Scale: ~4
- Sample Steps: ~50
- Seed: fixed (e.g., 42) so progress is comparable
Add 6–10 prompts that reflect your real use-cases (character, style, product).
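For example, a character-focused starter set might look like the list below; [trigger] stands for the token you set in the Job panel, and the prompts themselves are only illustrations:

```python
# Illustrative sample prompts for a character LoRA; swap [trigger] for your own token.
sample_prompts = [
    "photo of [trigger] smiling, natural window light",
    "photo of [trigger] in a rainy street at night, cinematic lighting",
    "close-up portrait of [trigger], studio lighting, 85mm",
    "[trigger] hiking in the mountains, wide shot",
    "[trigger] wearing a formal suit at a conference",
    "photo of [trigger] reading a book in a cafe",
]
print("\n".join(sample_prompts))  # paste one prompt per line into the Sample panel
```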
Step 3 — Launch training & monitor
Go to Training Queue, start the job, then watch:
- Samples: judge progress only using Base‑appropriate sample steps (≈50)
- Stability: if outputs improve and then start getting worse, stop and roll back to an earlier checkpoint
6. Recommended FLUX.2 Klein LoRA configs by VRAM tier
These are “good defaults,” not hard rules.
Tier A — 4B Base on 24GB (common local setup)
- Quantization: ON if needed to fit
- Batch size: 1
- Rank: 16 (go 32 if underfitting)
- Resolution: 768–1024
- Sampling: steps 50, CFG ~4
Tier B — 9B Base on 32–48GB (local “serious” setup)
- Quantization: strongly recommended
- Batch size: 1 (raise only with headroom)
- Rank: 16 first (32 only if stable)
- Add a reg dataset if training becomes unstable or collapses
- Sampling: steps 50, CFG ~4
Tier C — Cloud H100/H200 (fast iteration, simplest configs)
- Prefer 9B Base if you want maximum fidelity
- Batch size: 2–4 is often practical
- Rank: 32 is reasonable if the run is stable
- Use 1024 as default; expand buckets only if needed
- Sampling: steps 50, CFG ~4
7. Common FLUX.2 Klein training issues and how to fix them
This section is Klein-specific (not generic AI Toolkit advice).
“My LoRA looks weak / noisy” (but loss is decreasing)
Most likely cause: you are sampling Base Klein with Distilled-style steps.
Fix
- In the Sample panel, set Sample Steps ≈ 50 and Guidance Scale ≈ 4
- Re-evaluate checkpoints only after changing sampling
9B Base won’t download / access denied
Most likely cause: the 9B model is gated behind a license click-through, and your environment isn’t authenticated.
Fix
- Accept the license / request access on the model page: FLUX.2-Klein-9B
- Add a Hugging Face Read token in AI Toolkit Settings
- Re-run the job after saving the token
(If you want a step-by-step checklist, RunComfy has a dedicated “Hugging Face token for FLUX” help page.)
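Before re-queuing a long job, you can confirm that your token actually has access to the gated repo. This sketch assumes the huggingface_hub Python package; the repo id below is a placeholder based on the model page name, so use the exact id you accepted the license on:

```python
from huggingface_hub import HfApi

# Use a Read token, or omit token= if you've already run `huggingface-cli login`.
api = HfApi(token="hf_your_read_token")
info = api.model_info("black-forest-labs/FLUX.2-Klein-9B")  # placeholder repo id
print("access OK:", info.id)  # raises an error instead if the repo is gated and you lack access
```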
“I trained a LoRA, but it does nothing”
Most likely causes (Klein-specific)
- You trained on 4B but are testing on 9B (or vice versa)
- You trained on Base but are testing on a different Klein variant elsewhere
Fix
- Confirm the model size matches (4B LoRA → 4B Base; 9B LoRA → 9B Base)
- Keep your evaluation pipeline consistent with your training base
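If you're not sure which file you're actually loading, inspecting the LoRA safetensors can help: 4B and 9B produce different tensor shapes, and AI Toolkit may record base-model metadata (what it records varies by version). This sketch assumes the safetensors package and a placeholder file path:

```python
from safetensors import safe_open

lora_path = "output/klein4b_character_lora_v1/klein4b_character_lora_v1.safetensors"  # placeholder

with safe_open(lora_path, framework="pt") as f:
    meta = f.metadata() or {}
    print("metadata keys:", list(meta.keys()))   # may include base-model info, depending on trainer version
    first_key = next(iter(f.keys()))
    print("example tensor:", first_key, tuple(f.get_tensor(first_key).shape))
```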
9B training “collapses” (quality suddenly degrades or becomes chaotic)
This is a commonly reported 9B pattern in community discussions.
Fix order (most effective first)
1) Lower Learning Rate (try 1e‑4 → 5e‑5)
2) Reduce Rank (try 32 → 16)
3) Add a regularization dataset (generic same-class images at lower weight)
4) Shorten the run and early stop (pick the last “good” checkpoint)
If you want fast progress without fighting collapse, train 4B Base first.
AI Toolkit edge cases reported for Klein (current known pain points)
Some users have reported:
- Layer Offloading not behaving as expected on Klein 9B in certain setups
- Edit-mode / control-image training errors in some configurations
- GPU not being utilized in specific environments (notably some WSL2 reports)
Practical workaround
- If you hit one of these and you need a reliable run today:
- switch to 4B Base, or
- move the run to cloud AI Toolkit, or
- update AI Toolkit to the latest version and retry
8. Using your FLUX.2 Klein LoRA after training
8.1 Use Base-style generation settings when you test
When you test your LoRA on Base Klein, start with:
- Steps: ~50
- CFG: ~4
- LoRA weight: 0.6 → 1.0 (sweep a few values)
8.2 Test like a pro (fast, repeatable)
1) Generate without LoRA (baseline)
2) Generate with LoRA at 0.6 / 0.8 / 1.0
3) Keep seed + steps + CFG constant
4) Judge:
- activation strength (does it show up?)
- control (does it stay off when not triggered?)
- generalization (does it work on new prompts?)
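As a sketch of that routine, assuming your installed diffusers version ships a pipeline class that supports FLUX.2 Klein and its LoRAs (check the diffusers docs for the exact class; the repo id, LoRA path, adapter name, and prompt below are placeholders):

```python
import torch
from diffusers import DiffusionPipeline

# Placeholders: confirm the correct pipeline/repo id for Klein in your diffusers version.
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-Klein-4B", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("output/klein4b_character_lora_v1.safetensors", adapter_name="my_lora")

prompt = "photo of ohwx_person reading a newspaper in a cafe"  # use your own trigger/prompt
for weight in (0.0, 0.6, 0.8, 1.0):                  # 0.0 doubles as the no-LoRA baseline
    pipe.set_adapters(["my_lora"], adapter_weights=[weight])
    image = pipe(
        prompt,
        num_inference_steps=50,                      # Base-style sampling
        guidance_scale=4.0,
        generator=torch.Generator("cuda").manual_seed(42),  # fixed seed so runs are comparable
    ).images[0]
    image.save(f"lora_sweep_{weight:.1f}.png")
```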
8.3 Editing workflows
Klein supports editing workflows too, so once your LoRA behaves in generation, you can apply it in an edit pipeline to keep identity/style/product consistency during edits.