RunComfy

ComfyUI Node: Phoneme To Mouth Shapes

Class Name: PhonemeToMouthShapes
Category: Trent/LipSync
Author: TrentHunter82 (Account age: 0 days)
Extension: TrentNodes
Latest Updated: 2026-03-20
Github Stars: 0.03K

Table of Contents

Description
PhonemeToMouthShapes:
PhonemeToMouthShapes Input Parameters:
PhonemeToMouthShapes Output Parameters:
PhonemeToMouthShapes Usage Tips:
PhonemeToMouthShapes Common Errors and Solutions:
Related Nodes

How to Install TrentNodes

Install this extension via the ComfyUI Manager by searching for TrentNodes:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter TrentNodes in the search bar.

After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

Phoneme To Mouth Shapes Description

Transforms phoneme timing data into mouth shape indices for realistic lip-sync animations.

Phoneme To Mouth Shapes:

The PhonemeToMouthShapes node converts phoneme timing data into a per-frame sequence of mouth shape indices for lip-sync animation. It is aimed at animators and AI artists who need to synchronize mouth movements with an audio track. A mapping system translates each phoneme into a standard mouth shape, so the output stays compatible with animation mouth charts. This makes the node useful for non-human character animation and other creative applications where lip movements should look natural.

Phoneme To Mouth Shapes Input Parameters:

phoneme_data

This parameter represents the phoneme timing data, which is typically obtained from an audio-to-phoneme conversion process. It is a list of dictionaries containing information about the timing and type of each phoneme detected in the audio. This data is crucial for determining the sequence of mouth shapes that will be used in the animation.
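As an illustration of the "list of dictionaries" shape described above, here is a minimal sketch. The key names (`phoneme`, `start`, `end`) and the helper `is_well_formed` are assumptions made for this example; the node's actual schema is not documented on this page and may differ.

```python
# Hypothetical phoneme timing data. The key names ("phoneme", "start", "end")
# are assumptions for illustration, not the node's confirmed schema.
phoneme_data = [
    {"phoneme": "HH", "start": 0.00, "end": 0.08},
    {"phoneme": "AH", "start": 0.08, "end": 0.20},
    {"phoneme": "L", "start": 0.20, "end": 0.29},
    {"phoneme": "OW", "start": 0.29, "end": 0.50},
]


def is_well_formed(data):
    """Return True if data is a list of dicts with the assumed keys and ordered times."""
    if not isinstance(data, list):
        return False
    for entry in data:
        if not isinstance(entry, dict):
            return False
        if not {"phoneme", "start", "end"} <= entry.keys():
            return False
        if not (0 <= entry["start"] <= entry["end"]):
            return False
    return True
```

A check like this can be run on upstream output before wiring it into the node, which helps catch format problems early.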

duration

The duration parameter specifies the total length of the audio in seconds. It is a floating-point value with a default of 1.0, a minimum of 0.1, and a maximum of 3600.0. This parameter is important for calculating the timing of mouth shape transitions, ensuring they align with the audio's duration.

fps

The fps parameter stands for frames per second and determines the frame rate of the resulting animation. It is a floating-point value with a default of 24.0, a minimum of 1.0, and a maximum of 120.0. This parameter affects the smoothness and timing of the mouth shape transitions, with higher values resulting in smoother animations.
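The relationship between duration, fps, and the number of frames can be sketched as follows. The rounding rule (round up, with a small tolerance for floating-point error) and the function names are assumptions; the node may round differently.

```python
import math


def frame_count(duration: float, fps: float) -> int:
    """Number of frames needed to cover `duration` seconds at `fps`.

    Assumption: round up so a trailing partial frame is still covered.
    The small epsilon keeps float noise (0.1 * 30 is slightly above 3.0)
    from adding a spurious extra frame.
    """
    return math.ceil(duration * fps - 1e-9)


def frame_at(time_s: float, fps: float) -> int:
    """Index of the frame that contains the timestamp `time_s`."""
    return int(time_s * fps)
```

With the defaults (1.0 second at 24.0 fps) this gives 24 frames, and a phoneme starting at 0.5 seconds lands on frame 12.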

mapping_type

This parameter defines the phoneme-to-viseme mapping type used to convert phonemes into mouth shapes. The options are "arpabet", "ipa", and "simplified", with "arpabet" as the default. The choice of mapping type can influence the accuracy and style of the generated mouth shapes, so pick the one that matches the phoneme notation of your input data.
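A phoneme-to-viseme mapping is essentially a lookup table. The sketch below shows the idea for a handful of ARPAbet symbols. The viseme indices, the table contents, and the `to_viseme` helper are hypothetical; the node's real chart and numbering are not documented here.

```python
# Hypothetical viseme indices; the node's real chart and numbering may differ.
REST, CLOSED, OPEN, ROUND, TEETH = 0, 1, 2, 3, 4

# Partial, illustrative ARPAbet table (not the node's actual mapping).
ARPABET_TO_VISEME = {
    "M": CLOSED, "B": CLOSED, "P": CLOSED,
    "AA": OPEN, "AE": OPEN, "AH": OPEN,
    "OW": ROUND, "UW": ROUND, "W": ROUND,
    "F": TEETH, "V": TEETH,
}


def to_viseme(phoneme: str) -> int:
    """Map one ARPAbet symbol to a viseme index, defaulting to REST."""
    # ARPAbet vowels often carry a stress digit (e.g. "AH0"); strip it first.
    key = phoneme.upper().rstrip("012")
    return ARPABET_TO_VISEME.get(key, REST)
```

Falling back to a rest shape for unknown symbols is one possible design choice; it keeps the sequence valid even when the input contains phonemes the table does not cover.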

hold_frames

The hold_frames parameter specifies the minimum number of frames each mouth shape should be held for during the animation. It is an integer value with a default of 2, a minimum of 1, and a maximum of 10. This parameter helps control the pacing of mouth shape changes, preventing rapid flickering and ensuring smoother transitions.
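One way a minimum hold could be enforced is to ignore a shape change until the current shape has been shown for at least `hold_frames` frames. This is a sketch of that idea under that assumption, not the node's confirmed algorithm; `enforce_hold` is a hypothetical helper.

```python
def enforce_hold(sequence, hold_frames=2):
    """Delay shape changes until the current shape has lasted `hold_frames` frames."""
    if not sequence:
        return []
    out = [sequence[0]]
    held = 1  # frames the current shape has been shown so far
    for shape in sequence[1:]:
        if shape != out[-1] and held >= hold_frames:
            out.append(shape)  # change accepted, restart the counter
            held = 1
        else:
            out.append(out[-1])  # keep holding the current shape
            held += 1
    return out
```

For example, with `hold_frames=2` the input `[0, 1, 2, 2, 2]` becomes `[0, 0, 2, 2, 2]`: the one-frame shape `1` is dropped because shape `0` had not yet been held long enough.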

smoothing

The smoothing parameter is a boolean that determines whether smoothing should be applied to the mouth shape sequence. It has a default value of True. Smoothing helps reduce flickering and abrupt changes between mouth shapes, resulting in more natural and visually appealing animations.
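The page does not say how smoothing is implemented. A simple approach that matches the description is to remove single-frame "blips" whose two neighbors agree; the sketch below assumes that approach, and `smooth` is a hypothetical helper.

```python
def smooth(sequence):
    """Replace any single frame that differs from two matching neighbors."""
    out = list(sequence)  # work on a copy so the input is not modified
    for i in range(1, len(out) - 1):
        if out[i - 1] == out[i + 1] != out[i]:
            out[i] = out[i - 1]
    return out
```

Only isolated outliers are changed, so genuine transitions such as `[1, 2, 3]` pass through untouched.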

Phoneme To Mouth Shapes Output Parameters:

mouth_sequence

The mouth_sequence output is a list of integers representing the sequence of mouth shape indices for each frame of the animation. These indices correspond to specific mouth shapes defined in standard animation mouth charts, allowing for precise synchronization with the audio.

frame_count

The frame_count output is an integer representing the total number of frames in the generated mouth shape sequence. This value is important for understanding the length of the animation and ensuring it matches the duration of the audio.
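To show how the two outputs relate, here is a minimal sketch that fills a per-frame list from timed phonemes and returns it together with its length. The dictionary keys, the rounding rule, the rest index, and the `build_sequence` function are all assumptions for illustration, not the node's actual code.

```python
import math


def build_sequence(phoneme_data, duration, fps, viseme_of, rest=0):
    """Return (mouth_sequence, frame_count) for the given timed phonemes.

    `viseme_of` is any callable mapping a phoneme symbol to a shape index.
    Frames not covered by a phoneme keep the assumed `rest` shape.
    """
    total = math.ceil(duration * fps - 1e-9)  # assumed rounding rule
    sequence = [rest] * total
    for entry in phoneme_data:
        first = int(entry["start"] * fps)
        last = min(total, math.ceil(entry["end"] * fps - 1e-9))
        shape = viseme_of(entry["phoneme"])
        for frame in range(first, last):
            sequence[frame] = shape
    return sequence, total


# Usage with a tiny hypothetical mapping and two phonemes at 8 fps.
demo_map = {"M": 1, "AA": 2}
demo_data = [
    {"phoneme": "M", "start": 0.0, "end": 0.25},
    {"phoneme": "AA", "start": 0.25, "end": 0.5},
]
mouth_sequence, count = build_sequence(
    demo_data, 1.0, 8.0, lambda p: demo_map.get(p, 0)
)
```

In this sketch `frame_count` always equals `len(mouth_sequence)`, which is the property downstream nodes typically rely on.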

Phoneme To Mouth Shapes Usage Tips:

To achieve the most natural-looking lip-sync animations, experiment with different mapping_type options to find the one that best suits your audio and animation style.
Adjust the hold_frames parameter to control the pacing of mouth shape changes. Increasing this value can help reduce flickering and create smoother transitions between shapes.
Enable smoothing to enhance the visual quality of your animations by minimizing abrupt changes between mouth shapes.

Phoneme To Mouth Shapes Common Errors and Solutions:

"Invalid phoneme data format"

Explanation: This error occurs when the phoneme_data input is not in the expected list of dictionaries format.
Solution: Ensure that the phoneme_data is correctly formatted as a list of dictionaries, each containing timing and phoneme type information.

"Duration must be between 0.1 and 3600.0 seconds"

Explanation: The duration parameter is set outside the allowed range.
Solution: Adjust the duration value to be within the specified range of 0.1 to 3600.0 seconds.

"FPS value out of range"

Explanation: The fps parameter is set below 1.0 or above 120.0.
Solution: Set the fps value within the valid range to ensure proper frame rate for the animation.

"Unsupported mapping type"

Explanation: The mapping_type provided is not one of the supported options.
Solution: Choose a valid mapping_type from the available options: "arpabet", "ipa", or "simplified".
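The four checks above can be reproduced before a workflow runs. The sketch below uses the ranges and messages listed on this page; the `validate_inputs` function itself and its use of `ValueError` are assumptions, not the node's actual implementation.

```python
VALID_MAPPINGS = ("arpabet", "ipa", "simplified")


def validate_inputs(phoneme_data, duration, fps, mapping_type):
    """Raise ValueError with the documented message if an input is invalid."""
    if not isinstance(phoneme_data, list) or not all(
        isinstance(entry, dict) for entry in phoneme_data
    ):
        raise ValueError("Invalid phoneme data format")
    if not 0.1 <= duration <= 3600.0:
        raise ValueError("Duration must be between 0.1 and 3600.0 seconds")
    if not 1.0 <= fps <= 120.0:
        raise ValueError("FPS value out of range")
    if mapping_type not in VALID_MAPPINGS:
        raise ValueError("Unsupported mapping type")
```

Calling `validate_inputs([], 1.0, 24.0, "arpabet")` returns without error, while a duration of `0.0` or an fps of `240.0` raises the corresponding message.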

Phoneme To Mouth Shapes Related Nodes

Go back to the extension to check out more related nodes.

TrentNodes
