
LatentSync | Lip Sync Model

Workflow Name: RunComfy/LatentSync
Workflow ID: 0000...1182
Updated 6/16/2025: ComfyUI has been updated to v0.3.40 for improved stability and compatibility, and LatentSync has been updated to v1.6. LatentSync redefines lip syncing with audio-conditioned latent diffusion models, bypassing intermediate motion representations for direct audio-visual alignment. Built on Stable Diffusion, it captures intricate audio-visual correlations, while its Temporal REPresentation Alignment (TREPA) module provides the temporal consistency that pixel-based approaches often lack, delivering accurate and realistic results.

LatentSync is a state-of-the-art end-to-end lip sync framework that harnesses the power of audio-conditioned latent diffusion models for realistic lip sync generation. What sets LatentSync apart is its ability to directly model the intricate correlations between audio and visual components without relying on any intermediate motion representation, revolutionizing the approach to lip sync synthesis.

At the core of LatentSync's pipeline is the integration of Stable Diffusion, a powerful generative model renowned for its exceptional ability to capture and generate high-quality images. By leveraging Stable Diffusion's capabilities, LatentSync can effectively learn and reproduce the complex dynamics between speech audio and corresponding lip movements, resulting in highly accurate and convincing lip sync animations.

One of the key challenges in diffusion-based lip sync methods is maintaining temporal consistency across generated frames, which is crucial for realistic results. LatentSync tackles this issue head-on with its groundbreaking Temporal REPresentation Alignment (TREPA) module, specifically designed to enhance the temporal coherence of lip sync animations. TREPA employs advanced techniques to extract temporal representations from the generated frames using large-scale self-supervised video models. By aligning these representations with the ground truth frames, LatentSync's framework ensures a high degree of temporal coherence, resulting in remarkably smooth and convincing lip sync animations that closely match the audio input.
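The TREPA idea described above can be sketched in a few lines. This is a toy illustration only: the real TREPA module extracts temporal representations with a large-scale self-supervised video model, whereas here the "temporal feature" is simply the delta between consecutive frames (scalars stand in for latent frames), and the loss measures how far the generated sequence's dynamics drift from the ground truth's.

```python
# Conceptual sketch of the TREPA objective (not the actual LatentSync code).
# Real TREPA uses features from a self-supervised video model; here a toy
# "temporal representation" is the frame-to-frame difference sequence.

def temporal_representation(frames):
    """Toy temporal feature: differences between consecutive frames."""
    return [b - a for a, b in zip(frames, frames[1:])]

def trepa_loss(generated, ground_truth):
    """Mean squared distance between the two temporal representations.

    A loss of 0 means the generated sequence changes over time exactly
    like the ground truth, even if absolute frame values differ.
    """
    g = temporal_representation(generated)
    t = temporal_representation(ground_truth)
    return sum((x - y) ** 2 for x, y in zip(g, t)) / len(g)
```

Note that a constant offset between sequences yields zero loss: only the motion (the change between frames) is aligned, which is exactly the temporal-coherence intuition behind TREPA.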

1.1 How to Use the LatentSync Workflow

Note: The LatentSync node has been updated to version 1.6 (latest version).


This is the LatentSync workflow: the nodes on the left are inputs for uploading your video, the middle contains the LatentSync processing nodes, and the right holds the output nodes.

  • Upload your video in the input node.
  • Upload your audio input (the dialogue).
  • Click Render.
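Under the hood, these three steps correspond to a small node graph. The sketch below builds a ComfyUI-style prompt dictionary for it; the node class names (`LoadVideo`, `LoadAudio`, `LatentSync`, `SaveVideo`) are illustrative placeholders, not the actual class names registered by the LatentSync custom nodes.

```python
# Hypothetical sketch of the graph behind the three steps above.
# ComfyUI prompt graphs are dicts of node-id -> {class_type, inputs},
# where an input like ["1", 0] references output slot 0 of node "1".
# Node class names here are placeholders, not the real node names.

def build_latentsync_graph(video_path, audio_path):
    """Return a minimal prompt-style graph: video in, audio in,
    LatentSync processing, video out."""
    return {
        "1": {"class_type": "LoadVideo", "inputs": {"path": video_path}},
        "2": {"class_type": "LoadAudio", "inputs": {"path": audio_path}},
        "3": {"class_type": "LatentSync",
              "inputs": {"video": ["1", 0], "audio": ["2", 0]}},
        "4": {"class_type": "SaveVideo", "inputs": {"frames": ["3", 0]}},
    }

graph = build_latentsync_graph("face.mp4", "speech.wav")
```

Clicking Render is what traverses such a graph: the output node pulls frames from LatentSync, which in turn pulls from the two loaders.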

1.2 Video Input


  • Click to upload your reference video, which must contain a face.

The video is automatically adjusted to 25 FPS so it syncs properly with the audio model.
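A rough sketch of what that adjustment means: each output frame on the 25 FPS timeline is drawn from the nearest source frame. This assumes simple nearest-frame resampling for illustration; the actual node may drop, duplicate, or interpolate frames differently.

```python
# Sketch of frame-rate conversion to 25 FPS via nearest-frame sampling.
# Output frame i sits at time i / dst_fps; we pick the closest source
# frame to that time. (Illustrative only; the real node may differ.)

def resample_indices(n_src_frames, src_fps, dst_fps=25):
    """Map output frame indices at dst_fps to source frame indices."""
    duration = n_src_frames / src_fps       # clip length in seconds
    n_out = int(duration * dst_fps)         # frames on the 25 FPS timeline
    return [min(round(i * src_fps / dst_fps), n_src_frames - 1)
            for i in range(n_out)]
```

For a one-second 30 FPS clip this yields 25 indices with every sixth source frame skipped (`[0, 1, 2, 4, 5, ...]`), while a clip already at 25 FPS maps to itself unchanged.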

1.3 Audio Input


  • Click to upload your audio here.
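Because the video runs at 25 FPS, each frame covers 40 ms of audio, so the length of the uploaded dialogue determines how many frames are generated. A quick back-of-the-envelope helper (this assumes 16 kHz mono samples, a common rate for speech models; the actual node may use a different sample rate):

```python
# How many 25 FPS video frames does a dialogue clip need?
# Assumes 16 kHz mono audio for illustration; the real pipeline's
# sample rate may differ.
import math

def frames_for_audio(n_samples, sample_rate=16000, fps=25):
    """Number of 25-FPS video frames spanning the audio clip."""
    duration_s = n_samples / sample_rate
    return math.ceil(duration_s * fps)
```

One second of audio (16,000 samples) thus needs 25 frames; half a second rounds up to 13 frames, since a partial frame still has to be generated.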

LatentSync sets a new benchmark for lip sync with its innovative approach to audio-visual generation. By combining precision, temporal consistency, and the power of Stable Diffusion, LatentSync transforms the way we create synchronized content. Redefine what's possible in lip sync with LatentSync.

