FLUX.2 Klein 9B High‑Likeness Character LoRA (AI Toolkit): What Settings Actually Matter

If you’re training a character / identity LoRA on FLUX.2 Klein 9B Base, you’ve probably kept asking:

  • “What does Num Repeats actually do?”
  • “How do I calculate Training Steps?”
  • “If I change Gradient Accumulation, do I also need to change Steps?”
  • “What other settings matter most for high likeness?”

This tutorial is the “no guessing” answer.


0) The #1 reason people get confused: there are TWO “steps”

AI Toolkit shows Training Steps, and you’ll also see Sample Steps (preview / inference).

  • Training → Steps = how long the optimizer trains (this is the stop counter).
  • Sample Steps (preview / inference) = how many denoising steps are used to render sample images.

Do not mix them.

If someone says “28 steps is the sweet spot,” they might be talking about inference/sample steps, not training length.

For Base Klein, don’t judge your LoRA using low sample steps. Use Base‑appropriate sampling when previewing (more on that below).


1) The only metric you should optimize: “repeats per image” (training dose)

For high‑likeness character LoRAs, you want each training image to be “seen” roughly:

  • 50–90 repeats per image = normal character identity training
  • 90–120 repeats per image = high‑likeness push (stronger identity lock)

The formula (copy/paste)

Let:

  • N = number of training images
  • B = batch size
  • G = gradient accumulation
  • S = training steps

Then:

Repeats per image

repeats_per_image ≈ (S × B × G) / N

Steps you should enter

S ≈ ceil( N × target_repeats / (B × G) )

✅ If you change Gradient Accumulation, your Steps must change to keep the same training dose.
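
If you’d rather not do the arithmetic by hand, here is a minimal Python sketch of the two formulas above. It is plain arithmetic, not part of AI Toolkit; the function names are just descriptive.

import math

def repeats_per_image(steps, batch_size, grad_accum, n_images):
    # repeats_per_image ≈ (S × B × G) / N
    return steps * batch_size * grad_accum / n_images

def steps_to_enter(n_images, target_repeats, batch_size=1, grad_accum=1):
    # S ≈ ceil(N × target_repeats / (B × G))
    return math.ceil(n_images * target_repeats / (batch_size * grad_accum))

# Example: 30 images at ~90 repeats each
print(steps_to_enter(30, 90))              # 2700
print(repeats_per_image(2700, 1, 1, 30))   # 90.0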


2) “What should I enter?” (best-practice defaults for high likeness)

A) Training panel (highest impact)

Use these as your starting point:

  • Batch Size: 1
  • Gradient Accumulation: 1 (best likeness)
    • If VRAM is tight, use 2–4 and lower Steps proportionally (see the rescaling sketch after this list).
  • Learning Rate: start 1e-4
    • If training becomes unstable / “collapses,” try 5e-5
  • Steps: calculate with the formula above (don’t guess)
  • Optimizer / timestep settings: keep defaults at first (change only if you’re debugging)
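
The “lower Steps proportionally” rule above is easy to get wrong, so here it is as a tiny sketch (plain arithmetic, not an AI Toolkit call): keep Steps × Grad Accum constant when G changes.

import math

def rescale_steps(old_steps, old_grad_accum, new_grad_accum):
    # Keep the training dose (steps × grad_accum) constant when G changes.
    return math.ceil(old_steps * old_grad_accum / new_grad_accum)

print(rescale_steps(2700, 1, 2))   # 1350 steps give the same dose at G = 2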

B) Target panel (LoRA capacity)

  • Linear Rank (9B Base): start 16
    • If the LoRA is clearly underfitting and training is stable, try 32
    • If you see instability/collapse, go back down to 16

C) Dataset panel (text supervision = identity control)

For character LoRAs:

  • Default Caption: photo of [trigger] (a per-image caption-file sketch follows this list)
  • Caption Dropout Rate: 0.05 (helps avoid “caption overfitting” in some setups)
  • Resolutions: use 1024 as your default for Klein when possible
    • Add 768 only if you want more flexibility across sizes.
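
If your dataset uses per-image caption files instead of (or in addition to) the Default Caption field, a minimal sketch like the one below writes a “photo of [trigger]” caption next to every image. The folder path, trigger word, and the same-named .txt convention are assumptions to adapt to your own setup; this is a generic helper, not part of AI Toolkit.

from pathlib import Path

DATASET_DIR = Path("datasets/my_character")   # placeholder dataset folder
TRIGGER = "ohwx person"                       # placeholder trigger word
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

for img in sorted(DATASET_DIR.iterdir()):
    if img.suffix.lower() in IMAGE_EXTS:
        # Write a same-named .txt file containing "photo of <trigger>"
        img.with_suffix(".txt").write_text(f"photo of {TRIGGER}\n", encoding="utf-8")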

D) Sample panel (how to preview correctly)

If your LoRA looks “weak” in samples, it’s often not the training—it's the sampling.

For Base Klein, use preview settings like:

  • Sample Steps: ~50
  • Guidance / CFG: ~4

Then compare checkpoints again.
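
If you also want to sanity-check a saved checkpoint outside the trainer, a hedged sketch along these lines applies the same numbers. It assumes the base model and your LoRA can be loaded through diffusers’ generic DiffusionPipeline and its LoRA loader; the model path, LoRA path, and prompt are placeholders, and FLUX.2 Klein support depends on your diffusers version.

import torch
from diffusers import DiffusionPipeline

# Placeholders: point these at your Klein Base model and the LoRA file to preview.
pipe = DiffusionPipeline.from_pretrained(
    "path/or/repo-of-flux2-klein-9b-base", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("output/my_character_lora.safetensors")

image = pipe(
    "photo of ohwx person, studio portrait",
    num_inference_steps=50,   # Base-appropriate sample steps (~50)
    guidance_scale=4.0,       # Guidance / CFG ~4
).images[0]
image.save("preview.png")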


3) The “55 images” example (real numbers)

Say you have:

  • N = 55 images
  • target repeats = 100 (high-likeness push)
  • batch size B = 1

Option 1 (best likeness): Grad Accum = 1

Steps = 55 × 100 / (1 × 1) = 5500

Enter:

  • Gradient Accumulation: 1
  • Steps: 5500

Option 2 (VRAM-friendly): Grad Accum = 4

Steps = 55 × 100 / (1 × 4) = 1375 (round up to 1400 if you prefer a round number)

Enter:

  • Gradient Accumulation: 4
  • Steps: 1375 (or 1400)

✅ Both options deliver ~100 repeats per image.

The only difference is how many mini-batches are averaged into each optimizer step.
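
As a quick check, plugging both options back into the repeats formula:

import math

n_images, target_repeats, batch_size = 55, 100, 1

for grad_accum in (1, 4):
    steps = math.ceil(n_images * target_repeats / (batch_size * grad_accum))
    repeats = steps * batch_size * grad_accum / n_images
    print(f"G={grad_accum}: Steps={steps}, repeats per image ≈ {repeats:.0f}")

# G=1: Steps=5500, repeats per image ≈ 100
# G=4: Steps=1375, repeats per image ≈ 100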


4) Mini “cheat sheet” (high-likeness character LoRA)

If you just want something you can copy (a small helper that fills in Steps follows the list):

Klein 9B Base – High Likeness Starter

  • Batch Size: 1
  • Grad Accum: 1 (or 2–4 if needed)
  • Target repeats per image: 90–110
  • Steps: ceil(N × repeats / (B × G))
  • LR: 1e-4 (drop to 5e-5 if unstable)
  • Rank: 16 (try 32 only if stable + underfitting)
  • Resolution: 1024
  • Default caption: photo of [trigger]
  • Caption dropout: 0.05
  • Preview sampling (Base): Sample steps ~50, Guidance ~4
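
Here is that starter as one small helper that also computes Steps for your own image count. The field names are just descriptive labels for the panels above, not actual AI Toolkit config keys.

import math

def klein_starter_recipe(n_images, target_repeats=100, grad_accum=1):
    # High-likeness starter values from the cheat sheet, with Steps from the dose formula.
    batch_size = 1
    return {
        "batch_size": batch_size,
        "grad_accum": grad_accum,
        "steps": math.ceil(n_images * target_repeats / (batch_size * grad_accum)),
        "learning_rate": 1e-4,                    # drop to 5e-5 if training collapses
        "linear_rank": 16,                        # try 32 only if stable and underfitting
        "resolution": 1024,
        "default_caption": "photo of [trigger]",
        "caption_dropout": 0.05,
        "sample_steps": 50,                       # Base-appropriate preview sampling
        "guidance": 4,
    }

print(klein_starter_recipe(n_images=55))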

5) Troubleshooting (fast fixes)

“My LoRA looks weak / noisy, but loss is going down”

Most likely you are previewing with the wrong sampling setup.

  • Set Sample Steps ~50 and Guidance ~4, then re-check.

“It was getting good, then suddenly everything got chaotic / worse” (9B “collapse”)

Try fixes in this order:

1) Lower LR (1e-4 → 5e-5)

2) Lower Rank (32 → 16)

3) Add a small regularization dataset at lower weight

4) Early stop and use the last “good” checkpoint

“Do I get better quality if I reduce Gradient Accumulation?”

Often yes for identity/likeness:

  • Lower G can help the LoRA stay more “specific” (less averaged).
  • But you must increase Steps to keep the same training dose.

6) Bottom line

For FLUX.2 Klein 9B character likeness, the biggest levers are:

1) Training dose (Steps × Batch × Grad Accum relative to image count)

2) Learning rate

3) Rank

4) Resolution

5) Caption strategy

6) Correct Base sampling for previews

If you control those deliberately, you stop guessing, and your results become consistent.
