RunComfy

ComfyUI Node: 🔧 Viseme Mouth Shape Options

Class Name

VisemeDetectionOptionsNode

Category
TTS Audio Suite/🎬 Video Analysis

Author
diogod (Account age: 667 days)

Extension
TTS Audio Suite

Latest Updated
2025-12-13

Github Stars
0.46K

Table of Content

Description
VisemeDetectionOptionsNode:
VisemeDetectionOptionsNode Input Parameters:
VisemeDetectionOptionsNode Output Parameters:
VisemeDetectionOptionsNode Usage Tips:
VisemeDetectionOptionsNode Common Errors and Solutions:
Related Nodes

How to Install TTS Audio Suite

Install this extension via the ComfyUI Manager by searching for TTS Audio Suite:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter TTS Audio Suite in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

🔧 Viseme Mouth Shape Options Description

Specialized node for advanced viseme detection, enhancing lip-sync accuracy in TTS and video analysis applications.

🔧 Viseme Mouth Shape Options:

The VisemeDetectionOptionsNode is a configuration node that supplies advanced viseme detection settings for mouth movement analysis. It is intended for applications that need precise lip-sync, such as text-to-speech (TTS) systems and video analysis. With vowel classification enabled, the analysis goes beyond simple mouth open/close detection and examines the geometric patterns of mouth shapes to identify vowel sounds such as A, E, I, O, and U. This yields more accurate phoneme sequences, which are needed to synchronize audio with visual elements. The node also offers options for consonant detection and temporal analysis, which further refine the mouth movement analysis.

🔧 Viseme Mouth Shape Options Input Parameters:

enable_viseme_detection

This parameter is a boolean that enables or disables the viseme detection feature. When set to True, it activates vowel classification, which analyzes mouth shape geometry to detect vowel patterns, adding approximately 20% more processing time. This feature is essential for precise lip-sync and provides phoneme sequences for better TTS synchronization. The default value is True.

viseme_sensitivity

This float parameter controls the sensitivity of the viseme detection process, affecting how rigorously the system searches for vowel shapes. It ranges from 0.1 to 2.0, with a default value of 1.0. Lower values (0.1-0.5) result in very strict detection, identifying only obvious vowel shapes, while higher values (1.5-2.0) are more lenient, detecting subtle variations but potentially increasing false positives. A balanced setting (0.8-1.2) is recommended for most applications.
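
The sensitivity bands described above can be sketched as a small helper. This is only an illustration: classify_sensitivity is a hypothetical function, not part of TTS Audio Suite, and the "unlabeled" band simply covers the values (0.5-0.8 and 1.2-1.5) that the description does not name.

```python
def classify_sensitivity(value: float) -> tuple[float, str]:
    """Clamp a sensitivity value to the documented 0.1-2.0 range and name its band.

    Hypothetical helper for illustration; the band names follow the parameter
    description, not the node's source code.
    """
    clamped = max(0.1, min(2.0, value))
    if clamped <= 0.5:
        band = "strict"      # only obvious vowel shapes are detected
    elif 0.8 <= clamped <= 1.2:
        band = "balanced"    # recommended for most applications
    elif clamped >= 1.5:
        band = "lenient"     # subtle variations, more false positives
    else:
        band = "unlabeled"   # between the documented bands
    return clamped, band


if __name__ == "__main__":
    for candidate in (0.3, 1.0, 1.8, 5.0):
        print(candidate, classify_sensitivity(candidate))
```

Clamping out-of-range input is a design choice of this sketch; the node's own widget may instead reject such values.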

viseme_confidence_threshold

This float parameter sets the confidence threshold for viseme detection, determining the minimum confidence level required for a detection to be considered valid. A higher threshold results in fewer detections but increases accuracy, while a lower threshold allows more detections, potentially including false positives. The default value is 0.04.
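
To make the effect of the threshold concrete, the sketch below filters a list of candidate detections by confidence. The VisemeDetection record and filter_by_confidence function are hypothetical names invented for this example, and the inclusive comparison (>=) is an assumption about how "minimum confidence" is applied.

```python
from typing import NamedTuple


class VisemeDetection(NamedTuple):
    """Hypothetical record for one detected mouth shape."""
    frame: int
    viseme: str
    confidence: float


def filter_by_confidence(detections, threshold=0.04):
    """Keep only detections whose confidence meets the minimum threshold."""
    return [d for d in detections if d.confidence >= threshold]


if __name__ == "__main__":
    candidates = [
        VisemeDetection(0, "A", 0.30),
        VisemeDetection(1, "O", 0.03),
        VisemeDetection(2, "E", 0.04),
    ]
    # Raising the threshold keeps fewer, more confident detections.
    print(len(filter_by_confidence(candidates, 0.04)))
    print(len(filter_by_confidence(candidates, 0.10)))
```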

viseme_smoothing

This float parameter controls the smoothing of viseme detection results, affecting the stability and consistency of the detected viseme sequences. Smoothing helps reduce jitter in the detection output, providing a more coherent sequence of mouth shapes. The default value is 0.3.
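
The page does not say which smoothing method the node uses. As one plausible illustration of how a smoothing factor reduces jitter, the sketch below applies exponential smoothing to a sequence of per-frame scores. The function name smooth_scores and the assumption that the factor lies in the range 0 to 1 (exclusive of 1) are both hypothetical.

```python
def smooth_scores(scores, smoothing=0.3):
    """Exponentially smooth a sequence of per-frame scores.

    Each output blends the previous smoothed value (weighted by `smoothing`)
    with the current raw value. A factor of 0.0 leaves the input unchanged.
    This is an illustrative technique, not the node's confirmed algorithm.
    """
    if not 0.0 <= smoothing < 1.0:
        raise ValueError("smoothing must be in the range [0.0, 1.0)")
    smoothed = []
    previous = None
    for score in scores:
        if previous is None:
            previous = score  # first frame has nothing to blend with
        else:
            previous = smoothing * previous + (1.0 - smoothing) * score
        smoothed.append(previous)
    return smoothed


if __name__ == "__main__":
    jittery = [0.0, 1.0, 0.0, 1.0]
    print(smooth_scores(jittery, 0.3))
```

Higher factors give steadier sequences at the cost of slower response to genuine changes in mouth shape.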

enable_consonant_detection

This boolean parameter enables the detection of consonant sounds, complementing the vowel detection process. When enabled, it automatically activates temporal analysis to improve accuracy. This feature is crucial for capturing the full range of mouth movements associated with speech. The default value is False.

enable_temporal_analysis

This boolean parameter, when enabled, allows the system to consider the temporal aspect of mouth movements, enhancing the accuracy of both vowel and consonant detection. Temporal analysis is particularly useful when consonant detection is enabled, as it provides a more comprehensive understanding of speech dynamics. The default value is False.

enable_word_prediction

This boolean parameter enables word prediction capabilities, which can enhance the accuracy of viseme detection by providing contextual information about expected mouth movements. This feature is beneficial for applications where predicting the next word can improve synchronization. The default value is False.

🔧 Viseme Mouth Shape Options Output Parameters:

viseme_options

The output is a dictionary containing all the configured viseme detection settings. This dictionary includes the status of viseme detection, sensitivity, confidence threshold, smoothing, and the enabling of consonant detection, temporal analysis, and word prediction. This output is crucial for passing the configured settings to other nodes or systems that perform mouth movement analysis, ensuring that the analysis is conducted with the desired parameters.
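
A minimal sketch of how such a settings dictionary might be assembled is shown below. It is not the node's actual implementation: build_viseme_options is a hypothetical function, the dictionary keys simply reuse the input parameter names as an assumption, and the sensitivity default of 1.0 is chosen here only because it sits inside the recommended balanced band.

```python
import logging

logger = logging.getLogger("viseme_options_sketch")


def build_viseme_options(
    enable_viseme_detection=True,
    viseme_sensitivity=1.0,            # assumed; inside the balanced 0.8-1.2 band
    viseme_confidence_threshold=0.04,
    viseme_smoothing=0.3,
    enable_consonant_detection=False,
    enable_temporal_analysis=False,
    enable_word_prediction=False,
):
    """Collect the settings into one dictionary (key names are assumptions)."""
    # Consonant detection relies on temporal analysis, so switch it on if needed.
    if enable_consonant_detection and not enable_temporal_analysis:
        enable_temporal_analysis = True
        logger.info("Auto-enabled temporal analysis for consonant detection")
    return {
        "enable_viseme_detection": enable_viseme_detection,
        "viseme_sensitivity": viseme_sensitivity,
        "viseme_confidence_threshold": viseme_confidence_threshold,
        "viseme_smoothing": viseme_smoothing,
        "enable_consonant_detection": enable_consonant_detection,
        "enable_temporal_analysis": enable_temporal_analysis,
        "enable_word_prediction": enable_word_prediction,
    }


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(build_viseme_options(enable_consonant_detection=True))
```

Returning a plain dictionary keeps the settings easy to pass between nodes and to inspect in logs.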

🔧 Viseme Mouth Shape Options Usage Tips:

For optimal lip-sync accuracy, enable both viseme detection and consonant detection, as this combination provides a comprehensive analysis of mouth movements.
Adjust the viseme sensitivity based on the specific requirements of your project. A balanced setting (0.8-1.2) is generally recommended, but you can increase sensitivity to pick up subtler mouth shapes or decrease it for stricter detection with fewer false positives.
Utilize the word prediction feature if your application involves predictable speech patterns, as it can enhance the synchronization of audio and visual elements.

🔧 Viseme Mouth Shape Options Common Errors and Solutions:

"Auto-enabled temporal analysis for consonant detection"

Explanation: This message indicates that temporal analysis was automatically enabled because consonant detection was activated. Temporal analysis is necessary for accurate consonant detection.
Solution: No action is required as this is an informational message. However, if you do not want temporal analysis, you should disable consonant detection.

"Viseme options created: enabled=False, consonants=False, temporal=False, words=False, sensitivity=1.0, confidence=0.04, smoothing=0.3"

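Explanation: This log message shows the current configuration of viseme options. In this example, viseme detection, consonant detection, temporal analysis, and word prediction are all disabled, while sensitivity, confidence threshold, and smoothing are at their default values.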
Solution: Review and adjust the input parameters to enable the desired features and set appropriate values for your specific use case.
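
When reviewing logs, it can help to turn this message into a dictionary so the enabled features can be checked programmatically. The sketch below assumes the message keeps exactly the key=value layout quoted above; parse_viseme_log is a hypothetical helper, not something provided by the extension.

```python
import re

LOG_PREFIX = "Viseme options created:"


def parse_viseme_log(line: str) -> dict:
    """Parse a 'Viseme options created: ...' message into a settings dictionary."""
    line = line.strip()
    if not line.startswith(LOG_PREFIX):
        raise ValueError("not a viseme options log message")
    settings = {}
    # Each setting appears as key=value, separated by commas.
    for key, raw in re.findall(r"(\w+)=([^,\s]+)", line[len(LOG_PREFIX):]):
        if raw in ("True", "False"):
            settings[key] = raw == "True"
        else:
            settings[key] = float(raw)
    return settings


if __name__ == "__main__":
    message = (
        "Viseme options created: enabled=False, consonants=False, "
        "temporal=False, words=False, sensitivity=1.0, confidence=0.04, "
        "smoothing=0.3"
    )
    parsed = parse_viseme_log(message)
    disabled = [name for name, value in parsed.items() if value is False]
    print("Disabled features:", disabled)
```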

🔧 Viseme Mouth Shape Options Related Nodes

Go back to the extension to check out more related nodes.

TTS Audio Suite
