WAN 2.2 Character Consistency LoRA Training: How to Fix Face Drift in I2V
If you need WAN 2.2 character consistency LoRA training, you are usually dealing with the same failure:
The first frame looks right, but once the person smiles, turns their head, or changes pose, the face stops looking like the same person.
This page is for users who want to train a character LoRA so the same person stays recognizable across motion, not just in one good starting frame.
By the end, you will know:
- why WAN 2.2 I2V loses identity so easily
- when a reference image is enough and when you also need a character LoRA
- whether to train a T2V character LoRA, an I2V character LoRA, or use both workflows together
- how to design a dataset that reduces identity drift and face morphing
- how to run the training workflow in Ostris AI Toolkit
This article is part of the AI Toolkit LoRA training series. Start with the main Wan 2.2 I2V 14B LoRA training guide if you want the full panel-by-panel overview first.
Table of contents
- 1. Why WAN 2.2 I2V character consistency breaks during motion
- 2. Reference image vs character LoRA: which one fixes face drift?
- 3. T2V vs I2V: which path for WAN 2.2 character consistency LoRA training?
- 4. Best dataset design for WAN 2.2 I2V character consistency
- 5. Best AI Toolkit recipe for WAN 2.2 character consistency LoRA training
- 6. Why WAN 2.2 I2V identity drift and face morphing happen
- 7. When RunComfy Cloud AI Toolkit is the better move
- 8. Bottom line
1. Why WAN 2.2 I2V character consistency breaks during motion
The hard part of WAN 2.2 I2V character consistency is that the model is solving two jobs at once:
- preserve the identity from the source image
- create believable motion, expression, and viewpoint changes over time
Those goals fight each other.
The more motion, camera change, or expression change you ask for, the more chances the model has to "reinterpret" the face instead of preserving it.
That is why the same pattern shows up again and again:
- neutral first frame looks close
- smiling introduces a different face
- head turns weaken identity
- pose changes make the subject feel like a cousin instead of the same person
In other words, the issue is not just "better prompting." It is an identity control problem.
2. Reference image vs character LoRA: which one fixes face drift?
2.1 Reference image only
A single reference image is useful for:
- starting pose and framing
- broad facial familiarity
- initial clothing and scene anchoring
In practice, a reference-only workflow usually gives you familiarity, not true consistency.
2.2 Character LoRA only
A character LoRA is useful for:
- carrying identity across new prompts
- keeping the same person recognizable across multiple scenes
- building a reusable identity asset instead of depending on one source frame
WAN 2.2 character consistency LoRA training is the next step when reference images are not enough. Even so, a character LoRA alone is not a perfect substitute for a good source image in I2V: it does not automatically lock the exact pose, camera, or frame-to-frame structure.
2.3 Reference image + character LoRA
This is the strongest practical setup for many users.
Use the reference image to anchor the shot.
Use the character LoRA to anchor the identity.
That is why many experienced WAN users end up combining both instead of debating them as alternatives.
3. T2V vs I2V: which path for WAN 2.2 character consistency LoRA training?
This is one of the most common questions around WAN 2.2 I2V character consistency.
3.1 Why people use T2V character LoRAs in I2V
A T2V-trained character LoRA often works well inside I2V workflows.
That makes sense when your main goal is:
- "keep this same character across prompts"
- "make the face recognizable in many scenes"
- "reuse the same character LoRA in both T2V and I2V"
In that case, the LoRA is mostly teaching who the character is, not how a specific source frame should move.
3.2 When an I2V-specific character LoRA is worth it
Train an I2V-focused character LoRA when your actual problem is more specific:
- the face breaks during smiles
- profile turns are the failure point
- motion introduces morphing
- you care about the same person under changing camera angle and expression
That is where motion-aware identity data becomes more valuable.
3.3 Practical recommendation
If you need a simple rule:
- start with a character LoRA that captures the identity cleanly
- use it inside WAN 2.2 I2V with a strong reference image
- move to a more I2V-specific dataset only if motion-induced drift is still the main failure
4. Best dataset design for WAN 2.2 I2V character consistency
If your goal is same character in WAN 2.2 I2V, your dataset should teach identity under change, not identity under one frozen pose.
4.1 Prioritize face-rich clips
Use clips where:
- the face is large enough to matter
- expressions change
- the head turns
- the character moves naturally without extreme blur
If the face only occupies a tiny part of the frame, the model has fewer identity pixels to learn from.
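To make that concrete, here is a minimal pre-filter sketch using OpenCV's stock Haar face detector. The 2% area threshold is an illustrative guess, not a WAN or AI Toolkit requirement; tune it against your own footage.

```python
# Sketch: flag clips whose faces are too small to carry identity signal.
# Requires opencv-python; the threshold below is illustrative, not canonical.
import cv2

def face_area_ratio(video_path: str, sample_every: int = 10) -> float:
    """Return the largest detected face area as a fraction of frame area."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    cap = cv2.VideoCapture(video_path)
    best, idx = 0.0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:  # sampling every Nth frame keeps this fast
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            h, w = gray.shape
            for (_, _, fw, fh) in cascade.detectMultiScale(gray, 1.1, 5):
                best = max(best, (fw * fh) / (w * h))
        idx += 1
    cap.release()
    return best

if face_area_ratio("clip_001.mp4") < 0.02:  # face under ~2% of the frame
    print("face likely too small to teach identity")
```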
4.2 Build around the failure cases you actually care about
Do not collect random clips.
Collect clips that cover the failure cases you are trying to fix:
- smiling without face drift
- turning the head without identity loss
- pose changes without morphing
- consistent identity across close-up and medium shots
4.3 Keep the identity signal clean
Use the same person or character throughout the dataset.
Avoid:
- heavily merged checkpoints as your base
- low-quality compressed video
- clips where motion blur destroys the face
- too many clips where the face is tiny
4.4 Use a turnaround mindset
If you can, include:
- front view
- three-quarter view
- side view
- different expressions
- different lighting
The model should learn that this is still the same person when the view changes.
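A quick way to act on this turnaround mindset is to audit coverage before training. The sketch below assumes each clip has a sidecar caption file (clip_001.txt next to clip_001.mp4, a common LoRA-dataset convention) and that captions mention view and expression; the tag lists are illustrative.

```python
# Sketch: count how many clips cover each view/expression bucket.
from collections import Counter
from pathlib import Path

VIEWS = ["front view", "three-quarter view", "side view"]
EXPRESSIONS = ["neutral", "smiling", "talking"]

counts = Counter()
for caption_file in Path("dataset").glob("*.txt"):
    text = caption_file.read_text().lower()
    for tag in VIEWS + EXPRESSIONS:
        if tag in text:
            counts[tag] += 1

for tag in VIEWS + EXPRESSIONS:
    print(f"{tag:>20}: {counts[tag]} clips")
# Empty or near-empty buckets are exactly the angles and expressions
# that will drift at inference time.
```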
4.5 A helpful upstream trick
A useful upstream trick is to first generate a consistent multi-angle character sheet with an image-edit model such as Qwen Image Edit, then build the WAN character dataset from that cleaner identity source.
That can be a smart move when your raw source material is inconsistent but the identity matters a lot.
5. Best AI Toolkit recipe for WAN 2.2 character consistency LoRA training
Use the main Wan 2.2 I2V training guide for the complete walkthrough. For identity-focused work, these are the practical defaults to think about first.
Dataset and clip length
- start with short, face-readable clips
- use a conservative Num Frames first, such as 21 or 41
- keep resolution conservative until the run is stable
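For clip preparation, a small ffmpeg wrapper is usually enough. In this sketch, 41 frames, 16 fps, and 480p height are illustrative conservative defaults, not values mandated by AI Toolkit; match them to your actual training config.

```python
# Sketch: cut source footage into short, uniform training clips with ffmpeg.
import subprocess

def extract_clip(src: str, dst: str, start_sec: float, num_frames: int = 41) -> None:
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-ss", str(start_sec),          # seek to a face-readable moment
            "-i", src,
            "-frames:v", str(num_frames),   # fixed clip length in frames
            "-vf", "fps=16,scale=-2:480",   # fixed frame rate, conservative height
            "-an",                          # drop audio; unused for training
            dst,
        ],
        check=True,
    )

extract_clip("interview_raw.mp4", "dataset/clip_001.mp4", start_sec=12.5)
```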
Model behavior
Identity in WAN depends heavily on the lower-noise refinement stages, but you still need motion and composition from the higher-noise stage.
So for most character LoRAs:
- train both stages
- do not turn the motion side off completely
- keep the training balanced before you start biasing toward detail
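As a mental model only (the key names below are hypothetical, not verbatim AI Toolkit config fields), the stage setup for identity work looks like this:

```python
# Illustrative fragment: keep both WAN 2.2 stages enabled for character LoRAs.
stage_config = {
    "train_high_noise": True,  # motion and composition come from this stage
    "train_low_noise": True,   # identity detail is refined in this stage
}
assert all(stage_config.values()), "do not disable either stage for identity work"
```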
Trigger strategy
If you want a character LoRA you can keep using:
- use a unique trigger word
- keep captions simple and consistent
- describe what should vary (pose, expression, scene), not every facial feature
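A minimal caption generator along those lines, assuming the .txt-sidecar convention and using "ohwx_person" as an illustrative trigger token:

```python
# Sketch: write simple, consistent sidecar captions with a unique trigger word.
from pathlib import Path

TRIGGER = "ohwx_person"           # illustrative; pick your own rare token
TEMPLATE = "{trigger}, {varies}"  # caption the variation, not the face itself

clips = {
    "clip_001.mp4": "smiling and turning toward the camera",
    "clip_002.mp4": "talking in profile view, soft indoor lighting",
}

for clip, varies in clips.items():
    caption = TEMPLATE.format(trigger=TRIGGER, varies=varies)
    Path("dataset", clip).with_suffix(".txt").write_text(caption)
```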
Sampling strategy
Use the same:
- reference image
- prompt template
- seed
- preview cadence
across checkpoints, so you can actually judge whether consistency is improving.
If you keep changing everything during evaluation, you will not know whether the LoRA improved or your test changed.
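One way to enforce that discipline is to hard-code the evaluation inputs and only vary the checkpoint. The paths, prompt, and checkpoint names below are placeholders; the actual sampler call depends on your stack (ComfyUI workflow, diffusers pipeline, or AI Toolkit's built-in previews).

```python
# Sketch: a pinned evaluation spec so checkpoint comparisons stay fair.
EVAL = {
    "reference_image": "ref/front_neutral.png",
    "prompt": "ohwx_person smiles and turns her head to the left",
    "seed": 42,
}
CHECKPOINTS = [
    "lora_000500.safetensors",
    "lora_001000.safetensors",
    "lora_001500.safetensors",
]

for ckpt in CHECKPOINTS:
    # Only the checkpoint changes between runs; with image, prompt, and seed
    # fixed, any visible difference is attributable to training progress.
    print(f"sample {ckpt} with", EVAL)
```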
6. Why WAN 2.2 I2V identity drift and face morphing happen
6.1 The face is too small
This is one of the most common real causes.
If the face has too few pixels, the model cannot preserve what it cannot see clearly.
6.2 You are using merged or messy bases
Merged models often have worse face consistency than cleaner base setups.
If identity is the job, use the most stable base available first.
6.3 The dataset teaches only one expression
If all your data is neutral and front-facing, smiling or profile shots become out-of-distribution at inference time.
6.4 The prompt asks for change without explicitly protecting identity
Prompt hacks are not a substitute for a good LoRA, but they still help.
If you want strong preservation, say so clearly:
- preserve the same face
- keep facial features consistent
- maintain the character identity from the reference image
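If you iterate on prompts a lot, it can help to bake that protective language into a template so it is never dropped by accident. The phrasing and trigger token here are illustrative.

```python
# Sketch: append an identity guard to every I2V prompt automatically.
IDENTITY_GUARD = (
    "preserve the same face, keep facial features consistent, "
    "maintain the character identity from the reference image"
)

def build_prompt(action: str) -> str:
    return f"ohwx_person {action}. {IDENTITY_GUARD}."

print(build_prompt("smiles and looks over her shoulder"))
```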
6.5 You expected the reference image alone to solve everything
Reference images help, but they do not magically solve expression drift over time.
That is exactly why people search for WAN 2.2 I2V character LoRA workflows in the first place.
7. When RunComfy Cloud AI Toolkit is the better move
If you are doing WAN 2.2 character consistency LoRA training seriously, this is a good example of a task where cloud training can save time.
Use RunComfy Cloud AI Toolkit when:
- your local GPU struggles with video training
- you want to test multiple datasets quickly
- you want to keep preview clips and checkpoints organized in one persistent workspace
- your goal is a character LoRA that stays useful across scenes, not a hardware experiment
For WAN I2V work, the biggest cost is often not the training itself. It is the number of bad retries caused by unstable local constraints.
If your identity control problem is business-critical, moving the training run to a more stable environment is often the faster path to a usable result.
Open it here: RunComfy Cloud AI Toolkit
8. Bottom line
To get the most out of WAN 2.2 character consistency LoRA training, the most practical approach is usually:
- use a strong reference image
- add a character LoRA that keeps working across prompts
- train on clips that reflect the real failure cases: smiles, turns, and motion
- keep the evaluation consistent across checkpoints
If you care about the same character, better control, and fewer broken outputs, that is the right mental model.
You are not training a general model.
You are training a LoRA that keeps the same character recognizable during motion on top of WAN 2.2 I2V.
Ready to start training?

