| Parameter | Type | Default/Range | Description |
|---|---|---|---|
| video_url | string (video_uri) | Required | Publicly accessible URL to source footage; must be 15s or shorter. |
| audio_url | string (audio_uri) | Required | Publicly accessible URL to the driving audio; must be 15s or shorter. |
| emotion | string (choice) | neutral | One of: happy, angry, sad, neutral, disgusted, surprised. Drives the emotional performance. |
| model_mode | string (choice) | face | Edit scope control: lips (mouth-only), face (full-face), head (includes head motion). |
| lipsync_mode | string (choice) | bounce | Handling when audio/video durations differ: cut_off, loop, bounce, silence, remap (see the sketch after this table). |
| temperature | float | 0.5 | Controls expressiveness; higher values yield more animated delivery. |
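The duration-mismatch options for lipsync_mode are easiest to see in code. The sketch below is one plausible interpretation of how each mode could reconcile tracks of different lengths; the actual behavior is defined by the model, so treat these semantics (truncate, repeat, ping-pong, pad, retime) as assumptions rather than documented behavior.

```python
# Illustrative interpretation of the lipsync_mode options when the driving
# audio and the source video have different lengths. The real behavior is
# defined by the React-1 model; the semantics below are assumptions.

def plan_audio_timeline(audio_len: float, video_len: float, mode: str):
    """Return (output_len, segments), where each segment is an
    (audio_start, audio_end, reversed) chunk laid back-to-back."""
    if mode == "cut_off":
        # Stop at whichever track ends first; the rest is dropped.
        out = min(audio_len, video_len)
        return out, [(0.0, out, False)]
    if mode == "silence":
        # Keep the full video; any uncovered tail is driven by silence.
        return video_len, [(0.0, min(audio_len, video_len), False)]
    if mode == "remap":
        # Time-stretch the audio so both tracks end together.
        return video_len, [(0.0, audio_len, False)]
    # loop and bounce repeat the audio until the video is covered.
    segments, covered, forward = [], 0.0, True
    while covered < video_len:
        chunk = min(audio_len, video_len - covered)
        if mode == "loop":
            segments.append((0.0, chunk, False))                      # restart from 0
        elif mode == "bounce":
            if forward:
                segments.append((0.0, chunk, False))                  # play forward
            else:
                segments.append((audio_len - chunk, audio_len, True))  # play backward
            forward = not forward
        else:
            raise ValueError(f"unknown lipsync_mode: {mode}")
        covered += chunk
    return video_len, segments

# Example: 5 s of audio driving a 12 s clip in bounce mode.
print(plan_audio_timeline(5.0, 12.0, "bounce"))
```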
Developers can integrate React-1 lipsync video generation into their own pipelines through the RunComfy API, submitting the source video, driving audio, and control parameters with standard HTTP requests.
Note: see the RunComfy API docs for the lipsync video endpoint.
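As a rough illustration, such a request could look like the Python sketch below using the `requests` library. The endpoint URL, authentication header, and response handling are placeholders rather than the documented RunComfy contract; only the parameter names and example values mirror the tables above, so check the API docs before adapting it.

```python
# Minimal sketch of submitting a React-1 lipsync job over HTTP.
# The endpoint URL, auth header, and response shape are assumptions;
# copy the real values from the RunComfy API docs.
import requests

API_KEY = "YOUR_RUNCOMFY_API_KEY"                             # from your RunComfy account
ENDPOINT = "https://<runcomfy-api-host>/<react-1-endpoint>"   # placeholder; see the API docs

payload = {
    "video_url": "https://example.com/source-clip.mp4",    # public URL, 15s or shorter
    "audio_url": "https://example.com/driving-audio.wav",  # public URL, 15s or shorter
    "emotion": "happy",        # happy | angry | sad | neutral | disgusted | surprised
    "model_mode": "face",      # lips | face | head
    "lipsync_mode": "bounce",  # cut_off | loop | bounce | silence | remap
    "temperature": 0.5,        # higher values give a more animated delivery
}

response = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},  # assumed bearer-token auth
    timeout=60,
)
response.raise_for_status()
print(response.json())  # job metadata / output reference, per the API docs
```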
If you only need straightforward lip alignment without emotion-guided re-animation, consider Sync Lipsync-2-Pro (audio-to-video), which emphasizes accurate mouth shapes and high-fidelity detail without altering broader facial performance. For generating entirely new scenes instead of editing existing footage, use a text-to-video model designed for content creation rather than performance editing.

Explore More Lipsync Playgrounds Here
Turn static visuals into smooth motion with Hailuo 2.3 for rapid, realistic video creation.
Transform still images and voice tracks into lifelike talking avatars with precise motion control.
Convert photos into expressive talking avatars with precise motion and HD detail.
Turns static visuals into cinematic motion with synced audio and natural camera flow.
Cinematic motion model for fluid scene creation and adaptive visual editing.
Transforms input clips into synced animated characters with precise motion replication.
React-1 transforms a lipsync video through advanced video-to-video and audio-to-video processing. It not only syncs lip motion but also regenerates the full facial performance, including micro-expressions and head and eye movements, based on emotional cues. Compared to the earlier Lipsync-2 and Lipsync-2-Pro models, it excels in emotional realism and identity preservation.
Yes. React-1 is specifically designed for editing existing footage via video-to-video reanimation. Upload a source lipsync video along with a separate audio track (the audio-to-video input) and select one of the six emotion presets; the model then regenerates the facial performance while preserving the original identity and scene style.
The current research preview of React-1 supports output up to 4K resolution but works most efficiently at 1080p. The maximum aspect ratio is 16:9, and it typically accepts one video-to-video input and one audio-to-video input per generation. Additional control sources such as ControlNet or IP-Adapter are not yet supported.
Yes. When producing a long lipsync video via the React-1 API or RunComfy playground, you may encounter memory limits beyond approximately 90–120 seconds of footage, depending on resolution. Exceeding this duration may result in partial frame drops during audio-to-video alignment.
After refining your lipsync video workflow in the RunComfy Playground, you can export the configuration parameters (emotion, audio, video-to-video reference) and replicate them using RunComfy’s public API. The API mirrors the playground’s model settings and supports scripted generation for automated audio-to-video pipelines. Consult the API docs or contact hi@runcomfy.com for integration keys.
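As a loose sketch of what such an automated pipeline could look like, the Python snippet below submits playground-tuned settings for a batch of clips and polls for results. The endpoint paths, response fields, and job states are hypothetical placeholders; only the parameter names come from the tables above.

```python
# Loose sketch of automating playground-tuned settings through the API.
# Endpoint paths, response fields, and job states below are hypothetical;
# the real request/response contract is described in the RunComfy API docs.
import time
import requests

API_KEY = "YOUR_RUNCOMFY_API_KEY"
BASE = "https://<runcomfy-api-host>"  # placeholder host; see the API docs

# Settings copied from a tuned playground run.
PRESET = {
    "emotion": "surprised",
    "model_mode": "head",
    "lipsync_mode": "remap",
    "temperature": 0.7,
}

def submit_job(video_url: str, audio_url: str) -> str:
    """Submit one generation and return its job id (assumed response field)."""
    r = requests.post(
        f"{BASE}/<submit-endpoint>",  # placeholder path
        json={"video_url": video_url, "audio_url": audio_url, **PRESET},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=60,
    )
    r.raise_for_status()
    return r.json()["job_id"]  # assumed field name

def wait_for_result(job_id: str, interval: float = 10.0) -> str:
    """Poll until the job finishes and return the output video URL (assumed)."""
    while True:
        r = requests.get(
            f"{BASE}/<status-endpoint>/{job_id}",  # placeholder path
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        r.raise_for_status()
        status = r.json()
        if status.get("state") == "completed":  # assumed state values
            return status["output_url"]         # assumed field name
        if status.get("state") == "failed":
            raise RuntimeError(f"job {job_id} failed: {status}")
        time.sleep(interval)

# Re-run the tuned preset across a batch of clips.
clips = [("https://example.com/clip1.mp4", "https://example.com/take1.wav")]
for video_url, audio_url in clips:
    print(wait_for_result(submit_job(video_url, audio_url)))
```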
React-1 employs fine-grained diffusion-based video-to-video transformation, aligning frame-level facial geometry with the given emotional prompt. During audio-to-video synthesis, emotional consistency in lip tension, eye focus, and head motion is maintained, ensuring realism in the reanimated performance.
React-1 performs best in dialogue-driven scenes and front-facing footage where facial details are visible, which enables precise audio-to-video mapping and expressive reanimation. It may struggle with heavy occlusion, extreme lighting, or sharp profile angles that disrupt the video-to-video pipeline.
While most models generate entirely new content, React-1 enhances existing lipsync video performances through targeted video-to-video emotion editing. It focuses on identity retention, accurate timing, and emotional fidelity, whereas other generators prioritize scene diversity or stylization.
Commercial usage of any Lipsync or React model, including lipsync video outputs generated via video-to-video or audio-to-video processing, depends on Sync Labs’ official license terms. Always verify usage and attribution requirements on sync.so before deploying results in paid or public productions.
Traditional lipsync video models synchronize mouth movements via audio-to-video alignment only. React-1, however, applies emotion-driven video-to-video transformation, modifying subtle facial aspects such as brows and gaze to reflect emotional tone, offering a richer and more natural performance.