Sophisticated text-to-speech node with advanced customization for nuanced audio synthesis.
The IndexTTS2Advanced node is designed for advanced text-to-speech synthesis, offering a range of customizable features to enhance audio output. This node is part of the ComfyUI-IndexTTS2 suite and is tailored for users who need fine-grained control over the synthesis process, such as AI artists creating nuanced and expressive audio content. It exposes advanced parameters for manipulating voice characteristics, including emotion and style, allowing for a more personalized and dynamic audio experience. The node's primary function is to convert text into speech while providing options to adjust emotional tone and style, making it a powerful asset for engaging audio narratives or artistic projects. By using this node, you can achieve high-quality audio outputs that are both expressive and tailored to specific creative needs.
The spk_audio_prompt parameter specifies the path to an audio file that serves as a prompt for the speaker's voice characteristics. This input lets the node mimic the voice style and tone of the provided audio, allowing for more personalized speech synthesis. There are no specific minimum or maximum values, but the file should be in a supported audio format.
The text parameter is the core input for the node, representing the text that you want to convert into speech. This parameter directly influences the content of the audio output. There are no specific constraints on the text length, but longer texts may be split into segments for processing.
The emo_audio_prompt parameter allows you to provide an audio file that contains the desired emotional tone for the speech synthesis. This input helps in adjusting the emotional expression of the generated speech, making it more aligned with the intended mood or feeling. Like spk_audio_prompt, it should be a valid audio file.
The emo_alpha parameter controls the intensity of the emotional expression in the synthesized speech. It is a float value ranging from 0.0 to 1.0, where 0.0 means no emotional influence and 1.0 means full emotional influence from the emo_audio_prompt. The default value is typically set to 0.5 for balanced emotional expression.
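The interplay between emo_alpha and the emotion prompt can be pictured as a linear blend between a neutral voice representation and the emotional one. The sketch below is an assumption for illustration, not the node's confirmed internals; the function name `blend_emotion` and the embedding representation are hypothetical.

```python
import numpy as np

def blend_emotion(neutral_embedding, emotion_embedding, emo_alpha=0.5):
    """Linearly interpolate between a neutral voice embedding and the
    embedding derived from emo_audio_prompt. emo_alpha=0.0 keeps the
    neutral voice; 1.0 applies the full emotional influence."""
    emo_alpha = min(max(emo_alpha, 0.0), 1.0)  # clamp to the valid 0.0-1.0 range
    return (1.0 - emo_alpha) * neutral_embedding + emo_alpha * emotion_embedding
```

With the default of 0.5, the result sits halfway between the two embeddings, which matches the "balanced emotional expression" described above.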
The emo_vector parameter is used to provide a specific emotional vector that influences the emotional tone of the speech. This parameter allows for precise control over the emotional characteristics of the output, though specific values or formats are not detailed in the context.
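Since the exact format is not documented here, the following is purely an illustrative assumption: an emotion vector as a list of per-category weights, with a simple sum-to-one normalization so that overly large raw weights do not overdrive the synthesis. The 8-entry layout and the helper `normalize_emo_vector` are hypothetical.

```python
def normalize_emo_vector(weights):
    """Scale a raw list of per-emotion weights so they sum to 1.0.
    A zero vector is returned unchanged (no emotional bias)."""
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else list(weights)

# Hypothetical 8-way emotion layout; the real category order and
# vector length depend on the IndexTTS2 model.
emo_vector = normalize_emo_vector([0.0, 0.0, 0.7, 0.0, 0.0, 0.0, 0.1, 0.2])
```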
The use_random_style parameter is a boolean that determines whether to apply a random style to the speech synthesis. When set to True, it introduces variability in the speech style, which can be useful for generating diverse audio outputs. The default value is False.
The interval_silence parameter specifies the duration of silence between segments of text when the input text is split. It is measured in milliseconds, and the default value is typically set to 200 ms. Adjusting this value can affect the pacing and naturalness of the speech.
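Conceptually, inserting silence between segments amounts to concatenating each synthesized waveform with a zero-valued gap of the requested duration. This is a minimal sketch of that idea; the sample rate and the function name `join_segments` are assumptions, not the node's actual implementation.

```python
import numpy as np

SAMPLE_RATE = 22050  # assumed output sample rate; the real value depends on the model

def join_segments(segments, interval_silence_ms=200, sample_rate=SAMPLE_RATE):
    """Concatenate synthesized waveform segments, inserting
    interval_silence_ms milliseconds of silence between each pair."""
    gap = np.zeros(int(sample_rate * interval_silence_ms / 1000.0), dtype=np.float32)
    out = []
    for i, seg in enumerate(segments):
        if i > 0:
            out.append(gap)  # silence goes between segments, not before the first
        out.append(np.asarray(seg, dtype=np.float32))
    return np.concatenate(out) if out else np.zeros(0, dtype=np.float32)
```

Raising interval_silence_ms lengthens the pauses between segments, which slows the perceived pacing of the speech.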
The max_text_tokens_per_segment parameter defines the maximum number of text tokens allowed per segment when the input text is split. This helps manage the processing of longer texts by breaking them into manageable parts. Specific default values are not provided, but it should be set according to the desired segment length.
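The segmentation described above can be sketched as a greedy packing of tokens into fixed-size chunks. The whitespace tokenization here is a stand-in for the model's real tokenizer, and `split_text` is a hypothetical helper, not the node's API.

```python
def split_text(text, max_text_tokens_per_segment=120):
    """Greedy split: pack whitespace tokens into segments holding at
    most max_text_tokens_per_segment tokens each."""
    words = text.split()
    segments, current = [], []
    for word in words:
        if len(current) >= max_text_tokens_per_segment:
            segments.append(" ".join(current))
            current = []
        current.append(word)
    if current:
        segments.append(" ".join(current))
    return segments
```

A smaller limit produces more, shorter segments (and therefore more interval_silence gaps); a larger limit keeps long passages intact at the cost of bigger per-segment workloads.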
The generation_kwargs parameter allows for additional keyword arguments to be passed to the synthesis process, providing further customization options. The specific options and their effects are not detailed in the context, but they offer advanced users the ability to fine-tune the synthesis process.
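Because the accepted keys are not documented here, the dictionary below is only an illustration of the kind of sampling options such a parameter commonly forwards to an autoregressive decoder; none of these keys are confirmed for IndexTTS2.

```python
# Hypothetical example: HuggingFace-style sampling options, shown
# purely as an illustration of what generation_kwargs might carry.
generation_kwargs = {
    "do_sample": True,     # sample instead of greedy decoding
    "top_k": 30,           # restrict sampling to the 30 most likely tokens
    "top_p": 0.8,          # nucleus sampling threshold
    "temperature": 1.0,    # softmax temperature
}
```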
The AUDIO output parameter represents the synthesized speech audio generated by the node. This output is the primary result of the text-to-speech conversion process, providing a high-quality audio file that reflects the input text and any specified emotional or stylistic adjustments. The audio is typically in a format suitable for playback or further processing.
The STRING output parameter provides additional information or metadata about the synthesis process. This could include details such as processing time, applied settings, or any warnings encountered during synthesis. It serves as a useful reference for understanding the context of the generated audio.
- Experiment with different emo_audio_prompt files and adjust the emo_alpha parameter to find the right balance of emotional expression.
- Use the spk_audio_prompt parameter to mimic specific voice characteristics, which can be particularly useful for creating consistent voiceovers or character voices in artistic projects.
- Adjust the interval_silence parameter to control the pacing of the speech, especially when dealing with longer texts that are split into segments.
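Putting the inputs together, a typical configuration might look like the following. The file paths and values are invented for illustration, and this dictionary mirrors the node's input names, not a confirmed programmatic API.

```python
# Hypothetical input set for IndexTTS2Advanced; paths and values are
# illustrative assumptions.
inputs = {
    "spk_audio_prompt": "voices/narrator.wav",       # voice to mimic
    "text": "Welcome back. Tonight's story begins in a quiet harbor town.",
    "emo_audio_prompt": "emotions/warm_calm.wav",    # emotional reference
    "emo_alpha": 0.6,                                 # slightly above the balanced default
    "use_random_style": False,                        # keep output deterministic in style
    "interval_silence": 200,                          # ms of silence between segments
    "max_text_tokens_per_segment": 120,
}
```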