Specialized node for analyzing video content to detect and time mouth movements for text-to-speech synchronization.
The MouthMovementAnalyzer node analyzes video content to detect and time mouth movements, which is crucial for synchronizing text-to-speech (TTS) output with on-screen speakers. It uses computer vision techniques to identify when and for how long a speaker's mouth is moving, so that generated audio can be aligned with the visual cues in the video. The node supports multiple computer vision providers, giving it flexibility across different hardware and software environments. By producing detailed timing data, the MouthMovementAnalyzer improves the realism and effectiveness of TTS pipelines, making it a valuable tool for AI artists and developers working on projects that require precise audio-visual synchronization.
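To make the approach concrete, here is a minimal, hypothetical Python sketch of how per-frame mouth openness can be measured with MediaPipe Face Mesh landmarks. This is not the node's actual implementation; the landmark indices and the openness ratio are standard MediaPipe conventions used purely for illustration.

```python
# Illustrative sketch only; the node's real detection pipeline may differ.
import cv2
import mediapipe as mp

def mouth_openness_per_frame(video_path: str):
    """Return (frame_index, openness_ratio) pairs for each frame with a detected face."""
    face_mesh = mp.solutions.face_mesh.FaceMesh(
        static_image_mode=False, max_num_faces=1, refine_landmarks=True)
    cap = cv2.VideoCapture(video_path)
    ratios, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_face_landmarks:
            lm = results.multi_face_landmarks[0].landmark
            # Inner-lip landmarks 13 (upper) and 14 (lower); mouth corners 61 and 291.
            vertical = abs(lm[13].y - lm[14].y)
            horizontal = max(abs(lm[61].x - lm[291].x), 1e-6)
            ratios.append((frame_idx, vertical / horizontal))
        frame_idx += 1
    cap.release()
    face_mesh.close()
    return ratios
```

A frame whose openness ratio rises above a threshold (derived from the sensitivity setting described below) would then be treated as a mouth-movement frame.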
The video parameter is the input video file that the node will analyze to detect mouth movements. This parameter is essential as it provides the visual data necessary for the analysis process. The video should be in a compatible format and of sufficient quality to ensure accurate detection of mouth movements.
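Because the timing outputs are expressed relative to the video's frame rate, it can help to confirm that the file opens and to note its basic properties before analysis. The helper below uses OpenCV and is an illustrative sketch, not part of the node.

```python
import cv2

def probe_video(video_path: str) -> dict:
    """Sanity-check that the input video opens and report basic properties."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise ValueError(f"Could not open video: {video_path}")
    info = {
        "fps": cap.get(cv2.CAP_PROP_FPS),
        "frames": int(cap.get(cv2.CAP_PROP_FRAME_COUNT)),
        "size": (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
                 int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))),
    }
    cap.release()
    return info
```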
The provider parameter specifies the computer vision provider used for mouth movement detection. It offers several options, including MediaPipe, OpenSeeFace, and dlib. MediaPipe is preferred for its speed and accuracy, although it is incompatible with Python 3.13. OpenSeeFace is an alternative for newer Python versions but may be less accurate. dlib is a lightweight option that does not rely on machine learning dependencies and is expected to be available soon. The default provider is set based on the user's environment, and the choice of provider can significantly impact the performance and accuracy of the analysis.
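The node's actual auto-selection logic is internal, but a hypothetical fallback along the lines described above might look like the sketch below; the function name and the order of checks are assumptions made for illustration.

```python
import sys

def pick_provider(requested: str = "auto") -> str:
    """Hypothetical provider auto-selection mirroring the notes above."""
    if requested != "auto":
        return requested
    if sys.version_info < (3, 13):
        try:
            import mediapipe  # noqa: F401  # preferred: fastest and most accurate
            return "MediaPipe"
        except ImportError:
            pass
    # On Python 3.13+ (or without MediaPipe installed), fall back to OpenSeeFace;
    # dlib would become a further lightweight fallback once supported.
    return "OpenSeeFace"
```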
The sensitivity parameter controls the detection sensitivity of mouth movements, with a range from 0.05 to 1.0. This parameter uses exponential scaling to provide fine control, especially at higher values. Lower values (0.05-0.2) detect only obvious movements, while higher values (0.9-1.0) capture ultra-sensitive movements, including whispers and micro-movements. The default value is 1.0, and users are encouraged to start with 0.5 and fine-tune in 0.01 increments to achieve the desired balance between sensitivity and accuracy.
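As an illustration of what exponential scaling can mean here, the sketch below maps a sensitivity value to a hypothetical mouth-openness threshold, where higher sensitivity lowers the threshold exponentially so that small movements are still detected. The constants and function name are illustrative, not the node's actual values.

```python
import math

def sensitivity_to_threshold(sensitivity: float,
                             base_threshold: float = 0.12,
                             min_threshold: float = 0.005) -> float:
    """Map sensitivity (0.05-1.0) to an openness threshold, exponentially.

    base_threshold applies at the least sensitive setting (0.05) and
    min_threshold at full sensitivity (1.0); both are illustrative numbers.
    """
    sensitivity = max(0.05, min(1.0, sensitivity))
    t = (sensitivity - 0.05) / 0.95  # normalize to [0, 1]
    # Exponential interpolation: t=0 -> base_threshold, t=1 -> min_threshold.
    return base_threshold * math.exp(t * math.log(min_threshold / base_threshold))
```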
The timing_data output provides detailed information about the timing of detected mouth movements within the video. This data is crucial for synchronizing TTS audio with the visual content, ensuring that speech aligns with the speaker's mouth movements.
The movement_frames output lists the specific frames in the video where mouth movements were detected. This information helps in pinpointing the exact moments of speech, allowing for precise editing and synchronization tasks.
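To show how the timing data and movement frames relate, the sketch below groups consecutive detected frame indices into (start, end) segments in seconds using the video's frame rate. The actual structure of timing_data produced by the node may differ; this is only a plausible reconstruction.

```python
def frames_to_segments(movement_frames, fps):
    """Group consecutive movement frame indices into (start_sec, end_sec) segments."""
    segments, start, prev = [], None, None
    for f in sorted(movement_frames):
        if start is None:
            start = prev = f
        elif f == prev + 1:
            prev = f
        else:
            segments.append((start / fps, (prev + 1) / fps))
            start = prev = f
    if start is not None:
        segments.append((start / fps, (prev + 1) / fps))
    return segments

# Example: frames 10-14 and 30-31 at 25 fps -> [(0.4, 0.6), (1.2, 1.28)]
print(frames_to_segments([10, 11, 12, 13, 14, 30, 31], fps=25))
```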
The confidence_scores output offers a measure of the confidence level for each detected mouth movement. These scores help users assess the reliability of the detection results and make informed decisions about further processing or adjustments.
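A straightforward use of the confidence scores is to drop low-confidence detections before synchronization; the 0.6 cutoff below is an illustrative value, not a recommendation from the node's documentation.

```python
def filter_by_confidence(movement_frames, confidence_scores, min_confidence=0.6):
    """Keep only the detections whose confidence meets the chosen cutoff."""
    return [frame for frame, score in zip(movement_frames, confidence_scores)
            if score >= min_confidence]
```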
The preview_path output provides a path to a preview of the analyzed video, highlighting the detected mouth movements. This visual representation aids in verifying the accuracy of the analysis and making any necessary adjustments.