Lipsync Video: Emotion-Aware Face Reanimation from Audio on playground and API | RunComfy

sync/react-1

Sync React-1 reanimates facial performance from audio or emotion prompts with identity-preserving lipsync video-to-video and audio-to-video edits up to 4K for filmmakers, localization, and post-production teams.


Introduction to Lipsync Video Creation

Sync Labs' React-1 reanimates the full facial performance in existing footage from new audio and emotion prompts at $0.16 per second (a maximum-length 15-second clip costs $2.40), delivering identity-preserving video-to-video and audio-to-video edits at up to 4K. By replacing reshoots, manual frame-by-frame cleanup, and basic lip-dub passes with controllable, emotion-aware performance editing, it removes much of the masking and ADR overhead for filmmakers, post-production and localization teams, and marketers producing lipsync video. For developers, lipsync video on RunComfy can be used both in the browser and via an HTTP API, so you don’t need to host or scale the model yourself.
Ideal for: Emotion-Faithful Dubbing Localization | Post-Production Performance Fixes | Multi-Emotion Cut Creation

Examples of Lipsync Video Outputs

Example output videos are available in the playground gallery.

Model Overview


  • Provider: Sync Labs (sync.)
  • Task: video-to-video, audio-to-video
  • Max Resolution/Duration: Up to 4K, inputs ≤ 15s
  • Summary: React-1 reanimates full facial performance from new audio and emotion prompts while preserving identity and style, producing high-fidelity lipsync video edits. It controls mouth articulation, micro-expressions, head and eye motion, and timing for emotionally aligned dubs and performance fixes. Ideal for teams who need consistent identity and motion continuity at high resolution.

Key Capabilities


Emotion-driven performance reanimation

  • Edits beyond the mouth region to regenerate expressions, eye gaze, head motion, and timing from six emotion presets (happy, angry, sad, neutral, disgusted, surprised).
  • Produces natural, emotionally coherent lipsync video that matches delivery and feel, not just phoneme accuracy.

Identity, style, and timing preservation

  • Maintains speaker identity, speaking rhythm, accent, and stylistic traits across frames and cuts.
  • Ensures the resulting lipsync video aligns with the original take’s look and cadence while fixing or reinterpreting the performance.

Zero-shot, cross-format edit fidelity

  • Works on live-action, animation, and AI-generated footage without per-actor fine-tuning.
  • Provides controllable edit scope (lips/face/head) for targeted or full-face lipsync video updates at up to 4K.

Input Parameters


Core Inputs


  • video_url (string, video_uri): Required. Publicly accessible URL to the source footage; must be 15 seconds or shorter.
  • audio_url (string, audio_uri): Required. Publicly accessible URL to the driving audio; must be 15 seconds or shorter.
  • emotion (string, choice): Default neutral. One of happy, angry, sad, neutral, disgusted, surprised; drives the emotional performance.

Motion & Sync Controls


  • model_mode (string, choice): Default face. Edit scope control: lips (mouth only), face (full face), head (includes head motion).
  • lipsync_mode (string, choice): Default bounce. Handling when audio and video durations differ: cut_off, loop, bounce, silence, remap.
  • temperature (float): Default 0.5. Controls expressiveness; higher values yield more animated delivery.
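
For reference, the parameters above map onto a request payload like the following Python sketch. The field names mirror the playground controls, but the exact request schema should be confirmed in the RunComfy API Docs.

```python
# Sketch of a lipsync video parameter payload; field names mirror the
# playground controls above, but confirm the exact schema in the API Docs.
payload = {
    "video_url": "https://example.com/source-take.mp4",  # publicly accessible, <= 15 s
    "audio_url": "https://example.com/dub-line.wav",      # publicly accessible, <= 15 s
    "emotion": "happy",        # happy | angry | sad | neutral | disgusted | surprised
    "model_mode": "face",      # lips | face | head
    "lipsync_mode": "bounce",  # cut_off | loop | bounce | silence | remap
    "temperature": 0.5,        # higher values yield more animated delivery
}
```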

How lipsync video compares to other models


  • Vs Lipsync-2-Pro: React-1 delivers full facial reanimation (expressions, eye and head motion, timing) rather than primarily mouth alignment, while preserving identity fidelity at up to 4K. Key improvements include emotion control via presets and more nuanced performance rewriting. Ideal use case: choose React-1 when you need emotional direction and holistic performance editing.
  • Vs general video generators (Seedance, Wan, Kling): Unlike text-to-video and image-to-video systems that synthesize new scenes, React-1 makes precise edits to existing footage with identity preservation and emotional fidelity. Key improvements include controllable performance, timing adherence, and dub alignment. Ideal use case: choose React-1 for post-production re-acting, localization, and performance fixes on real footage.
  • Vs Lipsync-2: React-1 delivers emotion-guided performance, micro-expressions, and controllable head and eye movement, not just synchronized lip motion. Key improvements include performance realism, timing nuance, and cross-format consistency. Ideal use case: choose React-1 when you need expressive, emotionally consistent re-animation.

API Integration


Developers can integrate lipsync video via the RunComfy API using standard HTTP requests for streamlined ingestion of source video, audio, and control parameters.


Note: the exact API endpoint and request schema for lipsync video are listed in the RunComfy API Docs.
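
As an illustration, a minimal submission over HTTP could look like the sketch below. The endpoint URL, authentication header, and response shape are placeholders rather than the documented API; take the real values from the API Docs and your API Keys page.

```python
import requests

# Minimal sketch of submitting a sync/react-1 job over HTTP.
# ENDPOINT and the Authorization header are placeholders; consult the
# RunComfy API Docs for the actual endpoint, auth, and response schema.
API_KEY = "YOUR_RUNCOMFY_API_KEY"
ENDPOINT = "https://api.runcomfy.example/v1/models/sync/react-1"  # placeholder URL

payload = {
    "video_url": "https://example.com/source-take.mp4",  # <= 15 s
    "audio_url": "https://example.com/dub-line.wav",      # <= 15 s
    "emotion": "neutral",
    "model_mode": "face",
    "lipsync_mode": "bounce",
    "temperature": 0.5,
}

response = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
response.raise_for_status()
print(response.json())  # typically a job or result reference to poll or download
```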


Official resources and licensing


  • Official Website/Paper: https://sync.so/react-1
  • License: Proprietary research preview by Sync Labs. Commercial use may require a separate agreement with the provider.

Explore Related Capabilities


If you only need straightforward lip alignment without emotion-guided re-animation, consider Sync Lipsync-2-Pro (audio-to-video), which emphasizes accurate mouth shapes and high-fidelity detail without altering broader facial performance. For generating entirely new scenes instead of editing existing footage, use a text-to-video model designed for content creation rather than performance editing. Explore More Lipsync Playgrounds Here

Related Playgrounds

hailuo-2-3/fast/standard/image-to-video

Turn static visuals into smooth motion with Hailuo 2.3 for rapid, realistic video creation.

aurora

Transform still images and voice tracks into lifelike talking avatars with precise motion control.

ai-avatar/v2/standard

Convert photos into expressive talking avatars with precise motion and HD detail

kling-2-6/pro/image-to-video

Turns static visuals into cinematic motion with synced audio and natural camera flow

kling-2-6/motion-control-pro

Cinematic motion model for fluid scene creation and adaptive visual editing.

wan-2-2/animate/video-to-video

Transforms input clips into synced animated characters with precise motion replication.

Frequently Asked Questions

What is React-1, and how does it improve the quality of a lipsync video compared to earlier Lipsync models?

React-1 transforms a lipsync video through advanced video-to-video and audio-to-video processing. It not only syncs lip motion but also regenerates full facial performances—including micro-expressions, head, and eye movements—based on emotional cues. Compared to previous Lipsync-2 or Lipsync-2-Pro models, it excels in emotional realism and identity preservation.

Can I edit existing footage using React-1 to make a more expressive lipsync video?

Yes. React-1 is specifically designed for editing existing footage via video-to-video reanimation. You can upload a source lipsync video along with a separate audio track (audio-to-video input), select one of the six emotional presets, and the model regenerates facial performance while maintaining the original identity and scene style.

What are the technical limitations when generating a lipsync video with React-1?

The current research preview of React-1 supports output up to 4K resolution but works most efficiently at 1080p. Maximum aspect ratio is 16:9, and it typically accepts one main video-to-video and one audio-to-video input per generation. Additional control sources such as ControlNet or IP-Adapter are not yet supported.

Are there token or parameter size limits that might affect generating long lipsync video sequences?

Yes. Each input clip is capped at 15 seconds, so longer lipsync video sequences are typically produced in segments via the React-1 API or RunComfy playground. Depending on resolution, you may also encounter memory limits beyond approximately 90–120 seconds of combined footage, and exceeding that range may result in partial frame drops during audio-to-video alignment.
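
Because each input clip is capped at 15 seconds, one practical (unofficial) approach to longer sequences is to split the source into 15-second segments, process each segment, and reassemble the results; the ffmpeg-based sketch below shows only the splitting step, with segment count and reassembly left to your pipeline.

```python
import subprocess

# Sketch: cut a long source clip into <=15-second segments with ffmpeg so each
# piece fits the React-1 input limit. Codecs and reassembly are up to you;
# re-encode instead of "-c copy" if the cuts look rough at segment boundaries.
def split_into_segments(src: str, segment_seconds: int = 15, count: int = 4) -> list[str]:
    parts = []
    for i in range(count):
        out = f"segment_{i:02d}.mp4"
        subprocess.run(
            [
                "ffmpeg", "-y",
                "-ss", str(i * segment_seconds),  # start offset of this segment
                "-i", src,
                "-t", str(segment_seconds),       # segment duration
                "-c", "copy",                     # stream copy for speed
                out,
            ],
            check=True,
        )
        parts.append(out)
    return parts

segments = split_into_segments("long_interview.mp4")
```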

How do developers move from testing a lipsync video in the RunComfy Playground to deploying it in production via API?

After refining your lipsync video workflow in the RunComfy Playground, you can export configuration parameters (emotion, audio, video-to-video reference) and replicate them using RunComfy’s public API. The API mirrors the playground’s model settings and supports scripted generation for automated audio-to-video pipelines. Consult API docs or contact hi@runcomfy.com for integration keys.
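
As a sketch of such an automated pipeline (assuming the same placeholder endpoint as the API Integration example and a hypothetical submit_job helper), a multi-emotion batch run might look like this:

```python
import requests

ENDPOINT = "https://api.runcomfy.example/v1/models/sync/react-1"  # placeholder URL
API_KEY = "YOUR_RUNCOMFY_API_KEY"

def submit_job(config: dict) -> dict:
    # Hypothetical helper wrapping the HTTP call from the API Integration sketch.
    resp = requests.post(
        ENDPOINT,
        json=config,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

# One exported playground configuration, reused across all six emotion presets
# to produce a multi-emotion cut of the same take.
base_config = {
    "video_url": "https://example.com/source-take.mp4",
    "audio_url": "https://example.com/dub-line.wav",
    "model_mode": "face",
    "lipsync_mode": "bounce",
    "temperature": 0.5,
}

EMOTIONS = ["happy", "angry", "sad", "neutral", "disgusted", "surprised"]
jobs = {emotion: submit_job({**base_config, "emotion": emotion}) for emotion in EMOTIONS}
```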

How does React-1 maintain emotional fidelity in a lipsync video?

React-1 employs fine-grained diffusion-based video-to-video transformation, aligning frame-level facial geometry with the given emotional prompt. During audio-to-video synthesis, emotional consistency in lip tension, eye focus, and head motion is maintained, ensuring realism in the reanimated performance.

In what scenarios does React-1 produce the most realistic lipsync video results?

React-1 performs best in dialogue-driven scenes and front-facing footage where facial details are visible. This enables precise audio-to-video mapping and expressive reanimation. It may struggle in cases with heavy occlusion, extreme lighting, or extreme profile angles affecting the video-to-video pipeline.

What differentiates React-1 from general video generation models when producing a lipsync video?

While most models generate entirely new content, React-1 enhances existing lipsync video performances through targeted video-to-video emotion editing. It focuses on identity retention, accurate timing, and emotional fidelity, whereas other generators prioritize scene diversity or stylization.

Can I use React-1 lipsync video outputs commercially?

Commercial usage of any Lipsync or React model, including lipsync video outputs generated via video-to-video or audio-to-video processing, depends on Sync Labs’ official license terms. Always verify usage and attribution requirements on sync.so before deploying results in paid or public productions.

What is the difference between a typical lipsync video model and React-1’s emotional reanimation approach?

Traditional lipsync video models synchronize mouth movements via audio-to-video alignment only. React-1, however, applies emotion-driven video-to-video transformation, modifying subtle facial aspects such as brows and gaze to reflect emotional tone, offering a richer and more natural performance.
