LTX 2.3 LoRA Training on RunComfy AI Toolkit: The Parameter Changes That Actually Improve Results
LTX 2.3 LoRA training does not require understanding every field in the UI. A small number of settings do most of the work:
- T2V is the safer baseline over I2V
- rank, resolution, steps, and dataset mode matter far more than advanced toggles
- most weak LTX 2.3 LoRAs are not caused by "too few options" — they are caused by changing the wrong options first
The key is knowing which settings to change and in what order.
▶ Start training in the browser — RunComfy AI Toolkit
1. The safest baseline
The most reproducible LTX 2.3 starting setup is:
Model
- Model Architecture: LTX-2.3
- Name or Path: dg845/LTX-2.3-Diffusers
Memory / quantization
- Low VRAM: ON
- Transformer quantization: float8 (default)
- Text Encoder quantization: float8 (default)
- Layer Offloading: OFF to start
LoRA target
- Target Type: LoRA
- Linear Rank: 32
Training
- Optimizer: AdamW8Bit
- Learning Rate: 0.0001
- Weight Decay: 0.0001
- Batch Size: 1
- Gradient Accumulation: 1
- Steps: 3000
- Timestep Type: Weighted
- Timestep Bias: Balanced
- Loss Type: Mean Squared Error
Dataset
- Caption Dropout Rate: 0.05
- Resolutions: 512 + 768 + 1024
- LoRA Weight: 1
- Num Repeats: 1
Sampling
- Sample Every: 250
- Sampler: FlowMatch
- Guidance Scale: 4
- Sample Steps: 25
- Sample Size: 768 x 768
- Seed: fixed
- Walk Seed: ON
This baseline is not "magic." It is the cleanest parameter combination that consistently produces stable results in LTX 2.3 LoRA training. For most users, this is where training should start.
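For reference, the whole baseline can be captured as a plain Python dict. The key names below are descriptive labels of the UI fields above, not the toolkit's actual internal field names:

```python
# Illustrative snapshot of the baseline settings above.
# Key names are shorthand for the UI labels, not toolkit config keys.
BASELINE = {
    "model": {
        "architecture": "LTX-2.3",
        "name_or_path": "dg845/LTX-2.3-Diffusers",
    },
    "memory": {
        "low_vram": True,
        "transformer_quant": "float8",
        "text_encoder_quant": "float8",
        "layer_offloading": False,
    },
    "lora": {"target_type": "LoRA", "linear_rank": 32},
    "training": {
        "optimizer": "AdamW8Bit",
        "learning_rate": 1e-4,
        "weight_decay": 1e-4,
        "batch_size": 1,
        "gradient_accumulation": 1,
        "steps": 3000,
        "timestep_type": "Weighted",
        "timestep_bias": "Balanced",
        "loss_type": "mse",
    },
    "dataset": {
        "caption_dropout_rate": 0.05,
        "resolutions": [512, 768, 1024],
        "lora_weight": 1,
        "num_repeats": 1,
    },
    "sampling": {
        "sample_every": 250,
        "sampler": "FlowMatch",
        "guidance_scale": 4,
        "sample_steps": 25,
        "sample_size": (768, 768),
        "seed_fixed": True,
        "walk_seed": True,
    },
}
```

Keeping a snapshot like this next to the run makes it easy to diff later runs against the known-good starting point.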
2. The parameters that actually matter
Most settings can stay at their defaults for the first run. The ones that actually change outcomes are:
2.1 Resolution is a real quality lever
- 256 is mostly for smoke tests
- 512 is the real floor for character / identity work
- 768–1024 helps when clothing, props, full-body consistency, or small details matter
Multi-resolution training is useful, but it should be used intentionally.
What to do
For fast first passes
- use 512 + 768
For character / clothing / product detail
- keep 1024 enabled too
What not to do
- do not expect strong identity from 256
- do not enable 1024 just because it sounds better; if the dataset itself does not contain that level of detail, the extra resolution has nothing to learn from
A simple rule:
- close-up identity / style → 512 + 768
- full body / wardrobe / props / product detail → 512 + 768 + 1024
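That rule is simple enough to write down as a function. The subject labels here are my own shorthand for the two categories in the rule:

```python
def pick_resolutions(subject):
    """Resolution rule from section 2.1 (subject labels are illustrative):
    512 + 768 for close-up identity/style work, with 1024 added when
    full-body, wardrobe, prop, or product detail matters."""
    detail_heavy = {"full_body", "wardrobe", "props", "product"}
    base = [512, 768]  # the practical floor for identity work
    return base + [1024] if subject in detail_heavy else base
```

So `pick_resolutions("product")` adds the 1024 bucket, while a face or style subject stays at the cheaper 512 + 768 pair.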
2.2 Rank is the most important quality knob after the dataset
The only rank control most users need to care about is Linear Rank.
For LTX 2.3, 32 is the correct default. It usually gives enough capacity without making the LoRA too rigid too early.
What to do
Keep rank 32 when:
- training a first LoRA
- training stylized looks
- training a callable style
- training a normal character LoRA that does not need extreme full-body fidelity
Try rank 64 only as a second pass when:
- the subject is highly realistic
- likeness is weak in full-body shots
- clothing / silhouette / prop consistency matters a lot
- rank 32 clearly learns, but stays too soft
What not to do
Do not jump straight to 64 just because "bigger rank must be better."
Higher rank can absolutely help, but it also increases the chance of:
- rigidity
- bleed
- composition contamination
- "everything starts looking like the dataset"
So the practical order is:
- train rank 32
- inspect checkpoints
- move to 64 only if 32 is clearly underfitting
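The escalation order can be sketched as a small helper; the flag name is mine, standing in for a human judgment made by inspecting checkpoints:

```python
def next_rank(current_rank, clearly_underfitting):
    """Rank escalation order from section 2.2: finish and inspect a full
    rank-32 run first, and move to 64 only when it clearly underfits."""
    if current_rank == 32 and clearly_underfitting:
        return 64
    return current_rank  # never jump to 64 preemptively
```

The point of encoding it this way is that rank 64 is only ever reachable *through* a rank-32 run, never as a first choice.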
2.3 Steps matter, but not as much as people think
The most useful mental model for LTX 2.3 step counts:
- 2000–3000 steps is the productive first-pass range
- 4000–6000 steps is for harder concepts, not for every run
A lot of users waste time by jumping directly to high step counts before checking whether checkpoint 250, 500, or 750 already looks good.
What to do
Use 3000 as the default
Stay in the 2000–3000 zone when:
- the dataset is reasonably clean
- the concept is already familiar to the base model
- the LoRA is style-focused or identity-focused
Consider 4000–6000 when:
- the concept is unusual
- the base model clearly does not "know" the subject well
- checkpoint previews are still weak after 2000–3000 steps
What not to do
Do not treat more steps as a universal quality upgrade.
If a LoRA already looks strong at checkpoint 750 or 1000, pushing it much further can just make it more brittle.
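As a sketch, the step-count decision from this section looks like the following; both argument names are my own labels for judgments the text describes:

```python
def step_budget(concept_familiar, previews_weak_after_3000=False):
    """Step-count rule from section 2.3: 3000 is the productive
    first-pass default; reach for the 4000-6000 range only for unusual
    concepts or when previews stay weak after a 2000-3000 step run."""
    if not concept_familiar or previews_weak_after_3000:
        return 5000  # middle of the 4000-6000 second-pass range
    return 3000
```

Note that the higher budget is a response to evidence (weak previews, unfamiliar concept), not a default upgrade.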
2.4 Learning rate should stay boring
For LTX 2.3 LoRA training, 1e-4 is the right place to begin.
This is one of those cases where the boring answer is the right answer.
What to do
Start with:
- Learning Rate = 0.0001
Lower to 0.00005 if:
- the LoRA burns in too aggressively
- previews become rigid too early
- identity starts overpowering every prompt
- DOP (Differential Output Preservation, covered in section 5) is still not enough to control bleed
What not to do
Do not start by changing the learning rate unless the previews already show a clear problem.
For most runs, it is better to fix:
- rank
- resolution
- dataset composition
- DOP usage
before touching LR.
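The "boring" LR policy reduces to one conditional; the flag names below are shorthand for the preview symptoms listed above:

```python
def adjust_lr(current_lr, burns_in=False, rigid_early=False,
              identity_overpowers=False):
    """Section 2.4 as a rule: stay at 1e-4 unless previews already show
    a concrete problem, then drop to 5e-5 (flag names are illustrative)."""
    if burns_in or rigid_early or identity_overpowers:
        return 5e-5
    return current_lr
```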
3. Dataset strategy matters more than advanced toggles
3.1 For identity and style, image datasets are still valid
Image-only datasets are valid for LTX 2.3 LoRA training.
If the goal is:
- identity
- face
- style
- clothing
- product appearance
then the cleanest setup is often:
- image dataset
- dataset Num Frames = 1
This is a much easier way to get a stable first LoRA than forcing motion learning too early.
Best use case
Use frames = 1 when training:
- characters
- visual identity
- fashion
- stylized look
- static product or brand subjects
Not the right use case
Do not expect frames = 1 to learn:
- camera movement
- choreography
- motion behavior
- action timing
For those, short coherent clips are still the better dataset.
3.2 T2V should be the default first run
The RunComfy AI Toolkit supports both T2V and I2V, but the more stable path is:
- train a good T2V LoRA first
- move to I2V once the baseline works
The reason is simple: T2V guidance is much more consistent, while I2V training on LTX is still less settled.
T2V setup
- Do I2V: OFF
- captions or default caption drive the conditioning
- sample with prompt-only validation
This is the recommended default for most first runs.
3.3 I2V should be treated as a second-stage workflow
I2V is supported, but it should not be the first thing to debug.
I2V setup
- Do I2V: ON
- use a dataset that was actually prepared for image-conditioned behavior
- validate using Add Control Image in the Sample section
Practical advice
If I2V results look weak, the first things to question are:
- dataset pairing quality
- conditioning frame quality
- whether the job really should be I2V in the first place
Not:
- whether ten advanced toggles need to be changed
4. Captioning: one good setting matters, one trap matters
4.1 Caption Dropout 0.05 is a good default
- Caption Dropout Rate = 0.05
This is a sensible setting. It keeps the LoRA from depending on captions appearing in exactly one form, so prompts that differ from the training captions still work.
4.2 The important trap: don't combine dropout with cached text embeddings
If caption dropout is being used, Cache Text Embeddings should stay OFF.
This is one of the few "small" settings that can quietly make training behavior worse if used incorrectly.
Recommended rule
- Caption Dropout > 0 → keep Cache Text Embeddings OFF
- only consider caching text embeddings when captions are static and dropout is zero
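This rule is worth enforcing mechanically if you keep configs in code. A plausible reading of the trap is that embeddings cached once up front cannot reflect per-step caption dropout, so the dropout silently stops doing anything; the guard below encodes the rule either way (function and argument names are mine):

```python
def validate_caption_settings(caption_dropout, cache_text_embeddings):
    """Guard for the rule in section 4.2: caption dropout > 0 and
    cached text embeddings must not be combined."""
    if caption_dropout > 0 and cache_text_embeddings:
        raise ValueError(
            "Caption Dropout > 0 requires Cache Text Embeddings OFF: "
            "embeddings cached once up front cannot vary per step, "
            "so dropout is quietly defeated."
        )
```

Calling this once before launching a run turns a quiet misconfiguration into a loud one.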
5. The one advanced toggle that is actually worth using early: DOP
The RunComfy AI Toolkit exposes several advanced / regularization toggles. Most of them should be ignored on the first run.
The one exception is:
- Differential Output Preservation (DOP)
DOP is the first advanced option worth reaching for when a LoRA starts to "bleed" into everything.
Turn DOP ON when:
- training a narrow character LoRA
- training a product LoRA
- the trigger should have clear on/off behavior
- the LoRA keeps leaking its style or subject into unrelated prompts
Leave DOP OFF when:
- running a simple first baseline
- training a loose style LoRA where some base shift is acceptable
Leave this OFF to start:
- Blank Prompt Preservation
For most LTX 2.3 LoRA runs, Blank Prompt Preservation is not the first fix. DOP is.
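The DOP on/off decision from this section can be sketched as follows; the LoRA-type categories and flag names are my own labels, not toolkit options:

```python
def enable_dop(lora_type, first_baseline=False, bleed_observed=False):
    """Section 5's DOP rule (labels are illustrative): off for a simple
    first baseline, on for narrow character/product LoRAs or whenever
    the LoRA leaks into unrelated prompts."""
    if first_baseline and not bleed_observed:
        return False
    return lora_type in {"character", "product"} or bleed_observed
```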
6. Validation: make sampling cheaper and more useful
A lot of training time is wasted on validation that does not actually help compare checkpoints.
Good sampling defaults for LTX 2.3:
- FlowMatch
- Guidance 4
- 25 steps
- 768 x 768
- sample every 250
That is enough for a strong comparison loop.
The best sampling trick for character LoRAs
If the job is mostly checking:
- likeness
- face
- clothing
- style
then early validation does not need full video samples every time.
A very practical approach is:
Fast likeness check
- Sample Num Frames = 1
- FPS = 1
- keep prompt / seed / size fixed
Then, once the LoRA is clearly learning:
Final motion check
Switch back to:
- Num Frames = 121
- FPS = 24
This makes early checkpoints much easier to compare and much cheaper to generate.
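The two validation presets can be expressed as one small function; the dict keys are descriptive labels for the sampling fields above, not the toolkit's field names:

```python
def sample_config(phase):
    """Section 6's two validation presets: a cheap single-frame likeness
    check early, a full-length motion check once learning is confirmed."""
    base = {"size": (768, 768), "sample_steps": 25, "guidance_scale": 4}
    if phase == "early":
        return {**base, "num_frames": 1, "fps": 1}
    return {**base, "num_frames": 121, "fps": 24}
```

Keeping prompt, seed, and size identical across both presets is what makes checkpoints directly comparable.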
7. What not to touch on the first run
The LTX 2.3 trainer exposes many toggles. Most of them are not the first place to hunt for quality.
For a first stable run, leave these alone:
- Layer Offloading
- Use EMA
- Unload TE
- Cache Text Embeddings
- Blank Prompt Preservation
- Do Differential Guidance
These can all matter in special cases. But they are not where the biggest gains usually come from.
The biggest gains usually come from:
- using the right dataset mode
- avoiding 256 for serious identity work
- starting at rank 32
- using 1e-4 and 2000–3000 steps
- turning on DOP only when bleed becomes a real issue
- validating with fixed prompts and fixed seeds
8. Quick fixes for the most common failure patterns
Problem: the LoRA learns too weakly
Try this order:
- remove 256
- keep 512 + 768
- add 1024 if detail matters
- keep rank 32
- extend toward 4000 steps
- only then test rank 64
Problem: the LoRA is too rigid or contaminates every prompt
Try this order:
- turn DOP ON
- reduce rank from 64 back to 32
- reduce LR from 1e-4 to 5e-5
- stop the run earlier instead of pushing more steps
Problem: previews take too long
Do this:
- switch early validation to 1 frame
- keep full 121-frame previews for later checkpoints only
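These three remedy checklists can be kept next to the training config as data, so the fix order is not reinvented mid-run. The problem keys and fix labels below are my own shorthand for the lists above:

```python
# Section 8's remedies as ordered checklists (labels are shorthand,
# not toolkit options).
FIX_ORDER = {
    "learns_too_weakly": [
        "remove 256", "keep 512 + 768", "add 1024 if detail matters",
        "keep rank 32", "extend toward 4000 steps", "only then test rank 64",
    ],
    "too_rigid_or_bleeding": [
        "turn DOP ON", "reduce rank 64 -> 32", "reduce LR 1e-4 -> 5e-5",
        "stop the run earlier",
    ],
    "previews_too_slow": [
        "switch early validation to 1 frame",
        "keep full 121-frame previews for later checkpoints only",
    ],
}

def first_fix(problem):
    """Return the first remedy to try for a known failure pattern."""
    return FIX_ORDER[problem][0]
```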
9. Bottom line
Stable LTX 2.3 LoRA results come from a very small set of decisions:
The safest baseline
- LTX-2.3
- dg845/LTX-2.3-Diffusers
- Low VRAM ON
- float8 / float8
- rank 32
- AdamW8Bit
- LR 1e-4
- 3000 steps
- 512 + 768 + 1024
- caption dropout 0.05
- sample every 250
- FlowMatch / 25 steps / guidance 4
The practical upgrade path
- start with T2V
- use frames = 1 for image-based identity/style LoRAs
- add 1024 only when detail really matters
- move from 32 → 64 rank only if 32 is clearly underfitting
- use DOP when bleed appears
- treat I2V as the second pass, not the first debug target
Ready to start training?

