LTX 2.3 MSR Workflow in ComfyUI | Multi-Subject Identity Video

LTX 2.3 MSR multi‑subject identity video workflow for ComfyUI

This workflow turns several character or object references into a single, consistent, story‑ready video using LTX 2.3 MSR. It preserves identity across multiple subjects while leveraging the LTX‑2.3 audio‑visual model for motion, cinematography, and synchronized sound. Creators can combine up to four subject images plus a background, then guide the scene with prompts for dialogue, group shots, and dynamic lifestyle sequences.

Built for storytellers, advertisers, and social creators, the graph assembles references into an MSR guide, injects identity via an image‑conditioned LoRA pass, and samples an audio‑visual latent that decodes to frames and optional audio. LTX 2.3 MSR is the anchor for identity fidelity; the rest of the pipeline handles composition, motion guidance, and export.

Key models in the ComfyUI LTX 2.3 MSR workflow

LTX‑2.3 22B distilled (1.1) by Lightricks. The base audio‑visual foundation model that generates motion, visuals, and synchronized audio. Weights are published on Hugging Face under LTX‑2.3 (Lightricks/LTX-2.3).
Gemma 3 12B Instruct text encoder (fp4 mixed). Used for prompt encoding in the LTX stack to translate text into conditioning signals for generation. Packaged with the LTX assets for ComfyUI (Comfy-Org/ltx-2).
LTX 2.3 MSR LoRA (Licon MSR V1). A Multi‑Subject Reference LoRA specialized for LTX‑2.3 that locks multiple identities at once, stabilizing faces, clothing, and object features across the whole clip (liconstudio/ComfyUI-Licon-MSR).
LTX‑2 Audio VAE. Provides the latent audio space and decoding used when generating or attaching synchronized sound with LTX‑2.x assets (Comfy-Org/ltx-2).

How to use the ComfyUI LTX 2.3 MSR workflow

This graph has three phases: build an MSR guide from references, condition the video latent with multi‑image guidance and prompts, then sample and decode to frames and audio.

Config
- Set your canvas width, height, total frames, and fps in the configuration nodes. These feed the empty video and audio latents and the export stage, keeping timing consistent from conditioning through final render.
- Choose aspect and duration that fit your story. Higher frame counts increase motion continuity but also VRAM and runtime.
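
As a rough planning aid for the configuration values above, the sketch below turns a target size and duration into a width, height, and frame count. The rounding rules (sizes that are multiples of 32, frame counts of the form 8n + 1) are assumptions carried over from earlier LTX-Video releases, not something this workflow states, so check the node tooltips before relying on them. `RenderConfig` and `make_config` are hypothetical helper names.

```python
from dataclasses import dataclass


def snap_down(value: int, multiple: int) -> int:
    """Round value down to the nearest positive multiple."""
    return max(multiple, (value // multiple) * multiple)


def snap_frames(frames: int, step: int = 8) -> int:
    """Round a frame count down to step * n + 1 (assumed LTX constraint)."""
    return max(1, ((frames - 1) // step) * step + 1)


@dataclass(frozen=True)
class RenderConfig:
    width: int
    height: int
    frames: int
    fps: float

    @property
    def duration_seconds(self) -> float:
        # Clip length implied by the frame count and frame rate.
        return self.frames / self.fps


def make_config(width: int, height: int, seconds: float, fps: float,
                size_multiple: int = 32) -> RenderConfig:
    """Build a config whose size and frame count respect the assumed multiples."""
    frames = snap_frames(round(seconds * fps))
    return RenderConfig(
        width=snap_down(width, size_multiple),
        height=snap_down(height, size_multiple),
        frames=frames,
        fps=fps,
    )


if __name__ == "__main__":
    cfg = make_config(1280, 720, seconds=5, fps=24)
    print(cfg, round(cfg.duration_seconds, 2))
```

Because the frame count is rounded down, the resulting clip can be slightly shorter than the requested duration.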
Reference loaders
- Load up to four subject images (img1, img2, img3, img4) and a background (bg). These map to refimg1..4 and refbg getters so you can quickly swap sources without rewiring.
- Use clear, well‑lit images with the subject centered and unobstructed. For clothing or props you want preserved, ensure they are visible in at least one reference.
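
The slot layout described above (one to four subjects plus a background) can be captured in a small helper that checks the inputs before you load them into the graph. This is only an illustrative sketch: `ReferenceSet` is a hypothetical class, the file paths are placeholders, and treating the background as required is an assumption. Only the slot names `img1`..`img4` and `bg` come from the workflow.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_SUBJECTS = 4  # the workflow exposes img1..img4


@dataclass
class ReferenceSet:
    subjects: List[str]   # paths to subject images, in slot order
    background: str       # path to the background image

    def __post_init__(self) -> None:
        if not 1 <= len(self.subjects) <= MAX_SUBJECTS:
            raise ValueError(
                f"expected 1 to {MAX_SUBJECTS} subject images, got {len(self.subjects)}"
            )
        if not self.background:
            raise ValueError("a background image path is required in this sketch")

    def as_slots(self) -> Dict[str, str]:
        """Map each image path to the loader slot it would occupy."""
        slots = {f"img{i}": path for i, path in enumerate(self.subjects, start=1)}
        slots["bg"] = self.background
        return slots


if __name__ == "__main__":
    refs = ReferenceSet(["hero.png", "friend.png"], "street.png")
    print(refs.as_slots())
```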
MSR composer
- LiconMSR (#28) assembles the subject references and background into a single MSR image output. This becomes the visual identity blueprint for LTX 2.3 MSR, aligning facial features, attire, and object details before sampling.
- A small VHS_VideoCombine (#66) creates a quick low‑FPS preview from the MSR output so you can sanity‑check composition before running the full render.
Multi‑guide conditioning
- LTXVAddGuideMulti (#108) ingests up to five images (your four subjects plus background) along with the positive and negative prompts to produce an initial video latent with spatial and appearance guidance.
- The positive prompt describes the scene, camera, and mood; the negative prompt lists artifacts and off‑style looks to avoid. LTXVConditioning (#7) attaches your fps so motion timing matches the exporter.
LoRA identity control
- The LTX 2.3 MSR LoRA is loaded into the model, and LTXAddVideoICLoRAGuide (#9) applies an image‑conditioned LoRA pass using the MSR image. This reinforces identity across frames without freezing motion.
- Use this stage to balance identity strength with freedom for natural movement and expressions.
Sampling
- The sampler stack uses CFGGuider (#37), KSamplerSelect (#13), ManualSigmas (#27), and RandomNoise (#15) feeding SamplerCustomAdvanced (#16). The result is a joint audio‑visual latent that reflects your references, prompts, and MSR constraints.
- If you need new variations, change the noise seed or sampler while keeping references and MSR settings fixed for consistency.
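
If you export the graph in ComfyUI's API format (a dictionary of node id to `class_type` and `inputs`), seed variations can be generated programmatically. The sketch below assumes the export keeps the node ids shown above (for example `"15"` for RandomNoise) and that the seed lives in the `noise_seed` input; confirm both against your own export. `seed_variations` is a hypothetical helper, and the two-node dictionary stands in for the full workflow.

```python
import copy
from typing import Any, Dict, Iterable, List

Workflow = Dict[str, Dict[str, Any]]


def seed_variations(workflow: Workflow, noise_node_id: str,
                    seeds: Iterable[int]) -> List[Workflow]:
    """Return one deep copy of the workflow per seed, changing only the noise seed."""
    variants: List[Workflow] = []
    for seed in seeds:
        variant = copy.deepcopy(workflow)
        node = variant[noise_node_id]
        if node.get("class_type") != "RandomNoise":
            raise ValueError(f"node {noise_node_id} is not a RandomNoise node")
        node["inputs"]["noise_seed"] = seed
        variants.append(variant)
    return variants


if __name__ == "__main__":
    # Tiny stand-in for an exported workflow; a real export has many more nodes.
    base: Workflow = {
        "15": {"class_type": "RandomNoise", "inputs": {"noise_seed": 1}},
        "13": {"class_type": "KSamplerSelect", "inputs": {"sampler_name": "euler"}},
    }
    for wf in seed_variations(base, "15", [101, 202, 303]):
        print(wf["15"]["inputs"]["noise_seed"])
```

Everything except the seed is copied unchanged, which mirrors the advice to keep references and MSR settings fixed between takes.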
Crop guidance and decode
- LTXVCropGuides (#17) adjusts the video latent to your target frame size, avoiding unwanted trims. The video and audio latents are then split by LTXVSeparateAVLatent (#24).
- VAEDecode (#38) converts video latents to frames; LTXVAudioVAEDecode (#25) reconstructs audio.
Export
- VHS_VideoCombine (#96) assembles frames and optional audio into H.264 MP4 at your chosen fps, using your filename_prefix. This is the final video produced by the LTX 2.3 MSR workflow.

Key nodes in the ComfyUI LTX 2.3 MSR workflow

`LiconMSR` (#28)

Assembles 1–4 subject references plus a background into a single MSR guide. Set width and height to match your target canvas so the composed guide and final frames align. If you see identity drift, revisit the input references or increase how prominently the key subjects appear in their source images.

`LTXVAddGuideMulti` (#108)

Combines multiple guidance images with your prompts to form the initial video latent. Use it to prioritize which references dominate the scene by slightly favoring hero subjects. Keep background guidance active for stable environments and fewer scene jumps.

`LTXAddVideoICLoRAGuide` (#9)

Injects the image‑conditioned MSR LoRA using the composed MSR image. Increase strength to tighten identity preservation for faces, attire, or props; reduce it if motion feels too constrained. Crop choices should reflect where subjects appear most often in the frame.

`CFGGuider` (#37)

Controls how strongly the sampler follows your prompts. Higher cfg improves adherence to textual intent but can reduce variety; moderate values keep a natural look while honoring the MSR guidance.
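
To make the effect of `cfg` concrete, here is the standard classifier‑free guidance blend written out on plain lists. It is a generic illustration of the formula, not the node's actual implementation, and how guidance is applied to the joint audio‑visual latent in LTX‑2.3 is not described here. `apply_cfg` is a hypothetical name.

```python
from typing import List, Sequence


def apply_cfg(cond: Sequence[float], uncond: Sequence[float], cfg: float) -> List[float]:
    """Blend predictions as uncond + cfg * (cond - uncond), element by element."""
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond = [1.0, 2.0]    # prediction with the positive prompt
    uncond = [0.0, 1.0]  # prediction with the negative (or empty) prompt
    for scale in (1.0, 3.0):
        print(scale, apply_cfg(cond, uncond, scale))
```

At `cfg = 1.0` the result equals the conditional prediction, so the negative prompt has no effect; larger values push the result further from the unconditional prediction, which is why high settings trade variety for prompt adherence.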

`SamplerCustomAdvanced` (#16)

Runs the denoising process using your chosen sampler, sigmas, and noise seed. Euler or DPM‑style samplers work well with LTX‑2.3; try different seeds for alternate takes while keeping the same references to retain identity.

`VHS_VideoCombine` (#96)

Builds the final MP4 with optional audio. Match frame_rate to the conditioning stage and set a clear filename_prefix for versioning. Use this node’s preview to review pacing and identity consistency before sharing.
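
A frame‑rate mismatch between conditioning and export is easy to miss, so a quick check over an API‑format export can help. The sketch below assumes both `LTXVConditioning` and `VHS_VideoCombine` expose an input named `frame_rate`, and that inputs wired from another node appear as a `[node_id, output_index]` list; verify both against your own export. `frame_rate_mismatches` is a hypothetical helper.

```python
from typing import Any, Dict

Workflow = Dict[str, Dict[str, Any]]

# Node types assumed to carry a literal frame_rate input.
FPS_NODES = {"LTXVConditioning", "VHS_VideoCombine"}


def frame_rate_mismatches(workflow: Workflow, expected_fps: float) -> Dict[str, float]:
    """Return {node_id: frame_rate} for fps-bearing nodes that differ from expected_fps."""
    mismatches: Dict[str, float] = {}
    for node_id, node in workflow.items():
        if node.get("class_type") not in FPS_NODES:
            continue
        value = node.get("inputs", {}).get("frame_rate")
        # Values wired from another node are lists such as ["12", 0]; skip those.
        if isinstance(value, (int, float)) and value != expected_fps:
            mismatches[node_id] = value
    return mismatches


if __name__ == "__main__":
    wf: Workflow = {
        "7": {"class_type": "LTXVConditioning", "inputs": {"frame_rate": 24.0}},
        "96": {"class_type": "VHS_VideoCombine", "inputs": {"frame_rate": 30}},
    }
    print(frame_rate_mismatches(wf, 24.0))
```

An empty result means every literal `frame_rate` the check could see agrees with the expected value; linked inputs still need a manual look.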

Optional extras

Prepare references with neutral, front‑facing angles and minimal occlusion; add a second angle for complex hairstyles or accessories.
Keep wardrobe and prop references large enough that textures and logos are visible; avoid heavy motion blur in source images.
When identity is perfect but motion is stiff, slightly lower the LoRA guide strength in the LTX 2.3 MSR stage and add prompt cues for movement.
For longer stories, increase the frame count and keep fps constant to preserve timing; for snappier edits, raise fps and reduce the frame count.
Use a background reference similar in lighting and perspective to your intended scene for fewer inconsistencies.

Acknowledgements

This workflow implements and builds upon the following works and resources. We gratefully acknowledge the LTX project for creating and maintaining the LTX 2.3 MSR (Multi-Subject Reference) workflow. For authoritative details, please refer to the original documentation and repositories linked below.

Resources

LTX/LTX 2.3 MSR Workflow Source
- Docs / Release Notes: RunningHub post

Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.
