Sonic | Advanced Lip-Sync Portrait Animation Framework

ComfyUI Sonic redefines portrait animation by harnessing global audio perception for ultra-realistic facial movements and expressions. Unlike traditional methods, it captures the full context of speech—beyond phonemes—to generate fluid, emotionally rich animations. With cutting-edge AI technology, Sonic ensures seamless sync between voice and visuals, bringing characters to life with unmatched realism. Elevate your animations with Sonic and make every expression feel truly alive.

The ComfyUI Sonic nodes and related workflow were developed by smthemex. For more information, please visit smthemex's GitHub.

1.1 How to Use the Sonic Workflow?

The nodes on the left are the inputs for the audio and the avatar image. The node in the middle is the Sonic processing node. The node on the right is the video combine node, which outputs the video.

Follow these steps:

Load your avatar image. This is the face that will be animated to speak the dialogue in the audio.
Load your audio. This is the speech that drives the lip-sync and expressions of the loaded image.
Click Queue Prompt.

Done! Your rendered video will be stored in the Outputs folder.
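The same three steps can also be scripted against ComfyUI's HTTP API instead of the browser. The sketch below is a minimal illustration, not the workflow author's method. It assumes the Sonic workflow has been exported in API format, that the image and audio files are already in ComfyUI's input folder, that the workflow uses the core `LoadImage` and `LoadAudio` nodes (your copy may use different loader nodes), and that ComfyUI is listening on its default local address. The file names are hypothetical.

```python
import copy
import json
import urllib.request


def build_payload(workflow, image_name, audio_name, client_id="sonic-example"):
    """Return a /prompt payload with the image and audio inputs filled in.

    `workflow` is a dict in ComfyUI API format:
    {node_id: {"class_type": ..., "inputs": {...}}, ...}
    Assumption: the loaders are the core LoadImage and LoadAudio nodes.
    """
    patched = copy.deepcopy(workflow)  # leave the caller's dict untouched
    for node in patched.values():
        if node.get("class_type") == "LoadImage":
            node["inputs"]["image"] = image_name
        elif node.get("class_type") == "LoadAudio":
            node["inputs"]["audio"] = audio_name
    return {"prompt": patched, "client_id": client_id}


def queue_prompt(payload, server="http://127.0.0.1:8188"):
    """POST the payload to ComfyUI's /prompt endpoint (same as Queue Prompt)."""
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example usage (hypothetical file names, requires a running ComfyUI server):
#   with open("sonic_workflow_api.json") as f:
#       workflow = json.load(f)
#   payload = build_payload(workflow, "avatar.png", "speech.wav")
#   print(queue_prompt(payload))
```

Keeping `build_payload` separate from the network call makes it easy to inspect the patched workflow before anything is queued.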

Strengths and Weaknesses of Sonic:

Strengths:

Sonic generates highly realistic and expressive portrait animations driven by audio.
Sonic is built on SVD (Stable Video Diffusion), so frames stay temporally consistent with little to no flickering.
Consistency is better than in previously released audio-to-video models.

Weaknesses:

Because Sonic uses SVD, distant or full-body shots may not map the speech onto the face accurately.
Side-view faces and faces at complex angles may produce distorted results.

1.2 Sonic Audio and Image Input

Upload your audio (dialogue or vocals) in the Load Audio node.
Upload your image (a close-up or medium shot of a person) in the Load Image node.

1.3 Sonic Processing Node

ComfyUI Sonic uses the SVD (Stable Video Diffusion) model under the hood, so its settings and results follow that model's behavior. The defaults are already tuned, and you normally do not need to change them.

If you see artifacts such as morphing or distorted hands, keep the min resolution setting at about 768 or lower.
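If you prefer to pre-scale the portrait yourself before loading it, the arithmetic is simple. The helper below is an illustrative sketch, not part of the Sonic nodes. It shrinks the dimensions so the shorter side is at most the chosen target and never upscales. Snapping both sides to a multiple of 64 is an assumption made here because diffusion models commonly expect such sizes. The function name and parameters are hypothetical.

```python
def fit_shorter_side(width, height, target=768, multiple=64):
    """Scale (width, height) down so the shorter side is at most `target`.

    Both sides are snapped to the nearest multiple of `multiple`
    (an assumption for this sketch, not a documented Sonic requirement).
    Images already at or below the target are not upscaled.
    """
    scale = min(1.0, target / min(width, height))

    def snap(value):
        # Round the scaled side to the nearest multiple, never below one step.
        return max(multiple, int(round(value * scale / multiple)) * multiple)

    return snap(width), snap(height)


# Example: a 1536x2048 portrait is halved so its shorter side becomes 768.
print(fit_shorter_side(1536, 2048))
```

The resulting pair can be passed to any image editor or library to resize the portrait before uploading it.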

Sonic transforms portrait animation by focusing on global audio perception for seamless, lifelike expressions. By capturing the full depth of speech, it creates animations that feel natural, emotive, and engaging. Whether for storytelling, virtual avatars, or content creation, Sonic delivers unmatched realism. Step into the future of animation with Sonic—where every word comes to life.
