LTX 2.3 Edit Anything in ComfyUI | Prompt-Based Video Editing

ComfyUI LTX 2.3 Edit Anything Workflow

LTX 2.3 Edit Anything: prompt‑driven video‑to‑video editing workflow for ComfyUI#

This workflow turns a plain‑English edit request into a temporally coherent video edit using LTX‑2.3 with the LTX 2.3 Edit Anything LoRA. Instead of regenerating a scene, it anchors on your input clip and applies localized or global changes while preserving motion, identities, and timing. Typical uses include object insertion or removal, background cleanup, targeted replacements, and creative restyling.

The graph bundles prompt normalization, guide‑frame conditioning, and a one‑pass generation path followed by optional frame interpolation and anti‑aliasing. You can export the edited clip and a side‑by‑side comparison with the source. LTX 2.3 Edit Anything is the center of this workflow: it provides broad, promptable edit control while keeping LTX‑2.3’s high‑fidelity look.

Key models in ComfyUI LTX 2.3 Edit Anything workflow#

LTX‑2.3 base video diffusion transformer by Lightricks. Core video generation backbone that predicts temporally consistent frames from text and guides.
LTX 2.3 Edit Anything LoRA. Edit‑specialized LoRA enabling add/remove/replace/style operations without losing scene structure.
LTX‑2.3 distilled LoRA 384. Distillation that enables shorter sampling schedules while preserving quality; useful when you want faster edits.
Gemma 3 12B Instruct text encoder + LTX‑2.3 text projection. Encodes the normalized caption into conditioning embeddings for LTX‑2.3.
LTX‑2.3 Video VAE and Audio VAE. Compress and decode video and audio latents used throughout the pipeline.
LTX‑2.3 Spatial and Temporal Upscalers (x2 each). Optional latent upscalers for sharper frames and steadier motion when you aim beyond the first‑pass resolution.
RIFE (Real‑Time Intermediate Flow Estimation). Frame‑interpolation model that doubles playback FPS for smoother motion in the final export.

How to use ComfyUI LTX 2.3 Edit Anything workflow#

At a high level, you load a video, describe the edit, and run generation. The workflow normalizes your request into a training‑style caption, conditions LTX‑2.3 with guide frames from the source clip, and samples an edited result. Optional post‑processing interpolates frames and applies adaptive anti‑aliasing before export.

Video Settings#

Use this group to define clip timing and output size. Set FPS and Duration (Seconds) to match your goal; the graph computes a frame count aligned for stable sampling. Choose Resolution (Longer dimension) for your target long side, then optionally set Downscale Video Factor if you want a faster, smaller first pass. If you plan to keep a single pass, prefer a factor of 1.0 for native‑size output.
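The frame-count alignment mentioned above can be sketched as follows. This is a minimal illustration, assuming the graph rounds to a count of the form 8n + 1, a constraint documented for earlier LTX-Video releases. The source only says the count is "aligned for stable sampling", so the multiple of 8 and the helper name `aligned_frame_count` are assumptions, not the graph's actual node logic.

```python
def aligned_frame_count(fps: float, duration_s: float, multiple: int = 8) -> int:
    """Round fps * duration to the nearest count of the form multiple * n + 1.

    The 8n + 1 shape is an assumption carried over from earlier LTX-Video
    releases; check the workflow's own math nodes for the real rule.
    """
    raw = max(1, round(fps * duration_s))          # requested frame count
    n = max(0, round((raw - 1) / multiple))        # nearest whole block count
    return multiple * n + 1


print(aligned_frame_count(24, 5))  # 121
print(aligned_frame_count(25, 4))  # 97
```

Under this assumption a 5 second clip at 24 FPS samples 121 frames rather than 120, so the rendered duration can differ from the request by a few frames.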

Inputs#

Load your source clip in VHS_LoadVideo and let the workflow handle resizing. Frames are resized to the chosen long side and optionally downscaled for speed, then passed through LTXVPreprocess to prepare for LTX‑2.3. The same input is stored as a “control video” that later anchors motion and content so the edit follows the original scene.
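The resize step can be approximated with a few lines of arithmetic. The sketch below is illustrative only: it assumes the downscale factor multiplies the target long side and that dimensions are snapped to a multiple of 32, which is a common requirement for LTX-family models but is not stated in the source. The function name `fit_long_side` is hypothetical.

```python
def fit_long_side(width: int, height: int, long_side: int,
                  downscale: float = 1.0, multiple: int = 32) -> tuple[int, int]:
    """Scale (width, height) so the longer edit matches long_side * downscale.

    Both dimensions are snapped to `multiple` (assumed to be 32), which can
    shift the aspect ratio slightly.
    """
    scale = long_side * downscale / max(width, height)

    def snap(value: int) -> int:
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)


print(fit_long_side(1920, 1080, 1280))        # (1280, 704)
print(fit_long_side(1920, 1080, 1280, 0.5))   # (640, 352)
```

Halving the factor cuts the pixel count per frame to roughly a quarter, which is why a lower factor is useful for quick proofs.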

Prompt#

Enter your request in Describe the task here (Prompt). A built‑in TextGenerate step rewrites it into a single, dataset‑style caption like “Remove the small red car in the background.” The normalized caption is displayed in “Final Prompt” and then encoded for conditioning. You can also type an exact caption in the “Manual Prompt” encoder if you prefer full control.

Model#

The loader initializes the LTX‑2.3 backbone and attaches LoRAs. Use the base model for general fidelity and add the LTX 2.3 Edit Anything LoRA for editability. Optionally include the distilled LoRA if you want shorter schedules while keeping coherence. Video and audio VAEs are prepared here for latent encode/decode.

Generate Low Resolution#

The workflow turns your caption into positive/negative conditioning and sets the video frame rate so temporal guidance matches your target. LTXVAddGuideMulti injects guide information from the control video, which helps preserve identities, layout, and motion as the edit is applied. A custom sampler then denoises from guided noise toward an edited audio‑video (AV) latent, balancing prompt adherence with structure preservation. After sampling, the video latent is decoded to produce the first‑pass edited frames.

Empty Latent#

This path prepares audio/video latents used by the sampler. By default an empty audio latent is concatenated so you can render even when you do not edit audio. To localize edits, SolidMask together with SetLatentNoiseMask can restrict where new noise is injected, which is useful for replacing a single object without touching the rest of the scene.
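Conceptually, a latent noise mask decides which positions receive fresh noise and which keep the source content. The sketch below is a simplified model of that idea on a flat list of values. It is not ComfyUI's internal implementation, and `masked_noise` is a hypothetical helper.

```python
import random


def masked_noise(latent: list[float], mask: list[float],
                 strength: float = 1.0, seed: int = 0) -> list[float]:
    """Blend Gaussian noise into `latent` only where `mask` is non-zero.

    mask = 1.0 replaces the value with noise (editable region),
    mask = 0.0 keeps the original value (protected region),
    values in between blend the two.
    """
    rng = random.Random(seed)
    out = []
    for value, m in zip(latent, mask):
        noise = rng.gauss(0.0, 1.0)
        weight = m * strength
        out.append((1.0 - weight) * value + weight * noise)
    return out


latent = [0.5, -1.0, 2.0, 0.0]
mask = [0.0, 1.0, 0.0, 1.0]
noised = masked_noise(latent, mask)
print(noised[0], noised[2])  # 0.5 2.0 (protected positions are unchanged)
```

Positions with a zero mask pass through untouched, which is the behavior that lets a single object be replaced without disturbing the rest of the scene.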

Audio#

If your source clip includes audio, it can be passed through unchanged; otherwise the graph creates a silent track for reliable export. You can also load or record custom audio and trim it to match your duration. For edits focused purely on visuals, you can remove audio from the final combine steps.
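If you need a silent track outside the graph, for example to pre-mux a clip that has no audio, the standard library is enough. This is a minimal sketch and not the workflow's own silent-track node. The sample rate, channel count, and the name `silent_wav` are assumptions chosen for illustration.

```python
import io
import wave


def silent_wav(duration_s: float, sample_rate: int = 44100, channels: int = 2) -> bytes:
    """Return the bytes of a 16-bit PCM WAV file containing only silence."""
    n_frames = int(round(duration_s * sample_rate))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)              # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * (n_frames * channels * 2))
    return buffer.getvalue()


data = silent_wav(5.0)
with open("silence.wav", "wb") as handle:
    handle.write(data)
```

Match `duration_s` to the clip duration you set in Video Settings so the audio and video streams end together.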

1 Pass Result#

This area previews the edited frames and assembles a side‑by‑side “before vs after” comparison using the control video. It is ideal for quickly checking whether the LTX 2.3 Edit Anything prompt targeted the right region, preserved motion, and respected the scene’s composition. You can export this comparison as a quick shareable artifact.
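The side-by-side comparison amounts to a horizontal concatenation of matching frames. The sketch below shows the idea on nested lists, where each frame is a list of pixel rows. It is a toy stand-in for the graph's image nodes, and `side_by_side` is a hypothetical name.

```python
def side_by_side(before: list, after: list) -> list:
    """Join each source frame with its edited frame, row by row.

    Both clips must have the same frame count and frame height.
    """
    if len(before) != len(after):
        raise ValueError("clips must have the same number of frames")
    combined = []
    for frame_before, frame_after in zip(before, after):
        if len(frame_before) != len(frame_after):
            raise ValueError("frames must have the same height")
        combined.append([row_b + row_a
                         for row_b, row_a in zip(frame_before, frame_after)])
    return combined


source = [[[1, 2], [3, 4]]]   # one 2x2 frame
edited = [[[5, 6], [7, 8]]]
print(side_by_side(source, edited))  # [[[1, 2, 5, 6], [3, 4, 7, 8]]]
```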

Post‑Processing#

If you want smoother motion, the RIFE VFI stage interpolates between frames to double the frame rate. VideoAdaptiveAA then applies lightweight anti‑aliasing to clean up edges before final encoding. The exporter writes the result at twice the original FPS, so playback speed and duration stay the same while motion looks smoother, and the extra frames add no diffusion sampling cost.
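To see how 2x interpolation changes the frame count, the sketch below inserts one midpoint frame between each pair of neighbors. It uses a plain linear blend as a stand-in. RIFE itself estimates optical flow with a neural network, so this only illustrates the bookkeeping, not the model. The name `interpolate_2x` is hypothetical.

```python
def interpolate_2x(frames: list[list[float]]) -> list[list[float]]:
    """Insert a linearly blended frame between every pair of neighbours.

    n input frames become 2n - 1 output frames in this sketch.
    """
    if not frames:
        return []
    out = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        midpoint = [(a + b) / 2 for a, b in zip(prev, nxt)]
        out.extend([midpoint, nxt])
    return out


clip = [[0.0], [2.0], [4.0]]          # three one-pixel frames
print(interpolate_2x(clip))           # [[0.0], [1.0], [2.0], [3.0], [4.0]]
```

Because the frame count roughly doubles, exporting at twice the sampling FPS keeps the clip at its original length.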

Key nodes in ComfyUI LTX 2.3 Edit Anything workflow#

`TextGenerate` (#178)#

Converts informal requests into a single training‑style caption that LTX‑2.3 understands well, improving edit precision and temporal stability. Use it when you want consistent phrasing across projects; if you need exact wording, enter it directly into “Manual Prompt.” Reference: the official LTX‑2.3 repository documents prompt handling and gives broader context for conditioning behavior.

`LTXVConditioning` (#51)#

Packages positive and negative conditioning with the intended frame rate so temporal tokens align to your clip. Keep the frame_rate consistent with your export to avoid drift; this helps LTX 2.3 Edit Anything preserve motion while applying the change. Heavy negative prompts are rarely needed; a concise negative is usually enough to suppress unwanted artifacts.
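A quick way to check for drift is to compare the export rate with the conditioning rate, taking any interpolation multiplier into account. The helper below is a hypothetical piece of arithmetic, not a node in the graph.

```python
def playback_speed(conditioned_fps: float, export_fps: float,
                   interpolation: int = 1) -> float:
    """Return how fast the export plays relative to the conditioned motion.

    1.0 means real time; above 1.0 the clip plays too fast, below 1.0 too slow.
    """
    return export_fps / (conditioned_fps * interpolation)


print(playback_speed(24, 48, interpolation=2))  # 1.0
print(playback_speed(24, 30))                   # 1.25
```

The first case mirrors the post-processing path, where frames are doubled and the export runs at twice the FPS. The second shows a mismatch that would make motion play 25% too fast.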

`LTXVAddGuideMulti` (#104)#

Attaches one or more guide frames from the control video to the latent so the edit tracks original structure and timing. Changing which frame you guide with can affect identity preservation and pose consistency. For localized edits, pair this with a mask so only the target region receives meaningful noise.

`SetLatentNoiseMask` (#75)#

Defines where the sampler is allowed to add or keep noise, effectively controlling edit regions. A full‑white mask edits the whole frame; soft masks are ideal to blend replacements into busy backgrounds. Replace SolidMask with a painted mask when you need precise spatial control.
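A soft mask can be as simple as a box with a linear falloff around it. The sketch below builds one as a nested list of floats, where 1.0 marks the editable region and 0.0 the protected region. The function name, the box convention (x0, y0, x1, y1 with exclusive upper bounds), and the linear ramp are assumptions for illustration; in the graph you would paint or load a mask instead.

```python
def feathered_box_mask(width: int, height: int,
                       box: tuple[int, int, int, int],
                       feather: int) -> list[list[float]]:
    """Return a height x width mask: 1.0 inside `box`, fading to 0.0 outside."""
    x0, y0, x1, y1 = box
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Distance in pixels outside the box on each axis (0 when inside).
            dx = max(x0 - x, 0, x - (x1 - 1))
            dy = max(y0 - y, 0, y - (y1 - 1))
            distance = max(dx, dy)
            if distance == 0:
                row.append(1.0)
            elif feather > 0:
                row.append(max(0.0, 1.0 - distance / (feather + 1)))
            else:
                row.append(0.0)
        mask.append(row)
    return mask


mask = feathered_box_mask(8, 8, box=(3, 3, 5, 5), feather=2)
print(mask[3][3], mask[0][0])  # 1.0 0.0
```

A larger `feather` gives a wider transition band, which tends to hide seams when a replacement sits against a busy background.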

`SamplerCustomAdvanced` (#38)#

Drives the denoising process using your chosen sampler and schedule. Shorter schedules are faster but work best with the distilled LoRA; longer ones can improve prompt adherence at the cost of time. If you want a different look or stability profile, try alternative samplers while keeping the same guide setup. Reference: the ComfyUI sampler docs explain how samplers and sigma schedules interact.

`RIFE VFI` (#205)#

Interpolates intermediate frames to increase smoothness without resampling the diffusion model. It is a post step that preserves content while improving motion cadence. Reference: RIFE model and ComfyUI integration.

Optional extras#

Local edits first: Use a painted mask with SetLatentNoiseMask to tightly constrain where LTX 2.3 Edit Anything applies changes, then widen the mask if edges look too sharp.
Faster iteration: Lower Downscale Video Factor for quick proofs, then return to 1.0 for the final render or add the spatial/temporal upscalers for extra sharpness.
Audio‑free sources: If the input has no audio, disable audio in the final combine to avoid muxing errors, or supply a silent track via the provided nodes.
Scheduling note: The “bong_tangent” schedule shown in the graph requires the RES4LYF node pack; if you select it, install the extension first.
Comparisons: Use the built‑in side‑by‑side export to verify that identities, lighting, and camera motion are preserved before committing to long renders.

This ComfyUI template pairs LTX‑2.3’s high‑fidelity backbone with the LTX 2.3 Edit Anything LoRA so you can add, remove, replace, or restyle elements in a clip while keeping the scene’s rhythm intact.

Acknowledgements#

This workflow implements and builds upon the following works and resources. We gratefully acknowledge LTX, the source of the LTX 2.3 Edit Anything workflow, for their contributions and maintenance. For authoritative details, please refer to the original documentation and repositories listed below.

Resources#

LTX/LTX 2.3 Edit Anything Workflow Source
- Docs / Release Notes @Benji’s AI Playground: LTX 2.3 Edit Anything Workflow Source

Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.
