RunComfy

ComfyUI Node: 📺 F5-TTS SRT Voice Generation

Class Name: ChatterBoxF5TTSSRTVoice
Category: F5-TTS Voice
Author: diodiogod (Account age: 768 days)
Extension: ComfyUI_ChatterBox_SRT_Voice
Latest Updated: 2026-03-21
Github Stars: 0.08K


Table of Contents

Description
ChatterBoxF5TTSSRTVoice:
ChatterBoxF5TTSSRTVoice Input Parameters:
ChatterBoxF5TTSSRTVoice Output Parameters:
ChatterBoxF5TTSSRTVoice Usage Tips:
ChatterBoxF5TTSSRTVoice Common Errors and Solutions:
Related Nodes

How to Install ComfyUI_ChatterBox_SRT_Voice

Install this extension via the ComfyUI Manager by searching for ComfyUI_ChatterBox_SRT_Voice

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI_ChatterBox_SRT_Voice in the search bar and install the extension from the results.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

📺 F5-TTS SRT Voice Generation Description

ChatterBoxF5TTSSRTVoice generates synchronized voiceovers from text with subtitle alignment.

📺 F5-TTS SRT Voice Generation:

ChatterBoxF5TTSSRTVoice generates speech from text and is built for voiceovers that must stay synchronized with subtitle (SRT) files. It uses text-to-speech (TTS) to produce audio that follows the timing and content of the subtitles, which suits multimedia projects that need tight audio-visual alignment. The node supports multiple languages and exposes exaggeration and temperature settings for adjusting the expressiveness and tone of the generated speech. Long texts can be split into chunks, and a crash protection template pads very short segments so that generation is not interrupted. Generated audio can also be cached, which speeds up repeated runs with the same inputs.

📺 F5-TTS SRT Voice Generation Input Parameters:

t

This parameter represents the text input that you want to convert into speech. It is the primary content that the node will process to generate audio output. The text can be of any length, but longer texts may be automatically chunked into smaller segments for processing.

language

This parameter specifies the language of the text input. It ensures that the generated speech matches the linguistic characteristics of the input text, providing accurate pronunciation and intonation. The default language is English, but other languages are supported.

device

This parameter determines the computational device used for processing, such as a CPU or GPU. Selecting the appropriate device can impact the speed and efficiency of the TTS generation process.

exaggeration

This parameter controls the expressiveness of the generated speech. A higher exaggeration value results in more dramatic and expressive speech, while a lower value produces a more neutral tone. This page does not specify the range or default value.

temperature

This parameter influences the variability of the speech output. A higher temperature value allows for more variation and spontaneity in the speech, while a lower value results in more predictable and consistent output. This page does not specify the range or default value.

cfg_weight

This parameter adjusts the balance between the input text and any reference audio or prompts used in the generation process. It helps fine-tune the influence of external audio cues on the final speech output. This page does not specify the range or default value.

seed

This parameter sets the random seed for the generation process, ensuring reproducibility of results. By using the same seed, you can generate consistent audio outputs for the same input text.
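As a rough illustration of why a fixed seed gives repeatable results, the sketch below draws a few random values from a seeded generator. The function name sample_noise is hypothetical and only stands in for whatever random sampling the model performs internally; it is not the node's code.

```python
import random


def sample_noise(seed: int, n: int = 4) -> list[float]:
    """Draw n Gaussian samples from a generator initialised with `seed`."""
    # A private Random instance keeps the result independent of global state.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# The same seed reproduces the same values; a different seed does not.
same = sample_noise(42) == sample_noise(42)
different = sample_noise(42) != sample_noise(43)
print(same, different)
```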

reference_audio

This optional parameter allows you to provide a reference audio file to guide the TTS generation. It can be used to match the style or tone of existing audio content. If not provided, the node will rely solely on the text input.

audio_prompt_path

This parameter specifies the file path to an audio prompt that can be used to influence the TTS output. It serves as an additional guide for the speech generation process.

enable_chunking

This boolean parameter determines whether long text inputs should be divided into smaller chunks for processing. Enabling chunking can improve performance and prevent issues with processing very long texts. The default value is True.

max_chars_per_chunk

This parameter sets the maximum number of characters allowed in each chunk when chunking is enabled. It helps manage the size of text segments for efficient processing. The default value is 400 characters.
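One way to picture chunking is a greedy split at sentence boundaries that keeps every chunk within the character limit. The sketch below is an assumption about how such a splitter could work, not the node's actual algorithm; split_into_chunks is a hypothetical helper, and only the 400-character default comes from this page.

```python
import re


def split_into_chunks(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit, so close the current chunk first.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("One. Two. Three.", max_chars=10))
```

Splitting at sentence boundaries rather than at a fixed offset keeps each chunk a natural unit of speech, which tends to matter for intonation.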

chunk_combination_method

This parameter specifies the method used to combine audio chunks after processing. The "auto" option automatically selects the best method based on the input and settings.

silence_between_chunks_ms

This parameter defines the duration of silence, in milliseconds, inserted between audio chunks. It ensures smooth transitions between segments and can be adjusted to suit the pacing of the speech. The default value is 100 milliseconds.
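The arithmetic behind the silence gap is simple: the number of zero samples is the sample rate multiplied by the duration in seconds. The sketch below joins chunks represented as plain lists of float samples; join_with_silence and the 24000 Hz rate in the example are illustrative assumptions, and the node's real combination methods may differ.

```python
def join_with_silence(
    chunks: list[list[float]], sample_rate: int, silence_ms: int = 100
) -> list[float]:
    """Concatenate audio chunks, inserting silence_ms of zeros between them."""
    # Milliseconds to samples: sample_rate * ms / 1000, rounded down.
    gap = [0.0] * (sample_rate * silence_ms // 1000)
    joined: list[float] = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            joined.extend(gap)  # silence goes between chunks only
        joined.extend(chunk)
    return joined


# At an assumed 24000 Hz, 100 ms of silence is 2400 samples.
audio = join_with_silence([[0.1] * 10, [0.2] * 10], sample_rate=24000)
print(len(audio))
```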

crash_protection_template

This parameter provides a template for padding short text segments to prevent crashes during sequential generation. It is particularly useful for very short texts that may not meet the minimum length requirements for processing.
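A minimal sketch of the padding idea follows. The "{seg}" placeholder, the example template text, and the 20-character threshold are all assumptions made for illustration; this page does not state the template syntax or the minimum length the node uses.

```python
def pad_short_segment(
    text: str,
    template: str = "hmm ,, {seg} hmm ,,",  # assumed template format
    min_chars: int = 20,  # assumed threshold, not from the source
) -> str:
    """Wrap text in the template when it is shorter than min_chars."""
    stripped = text.strip()
    if len(stripped) >= min_chars:
        return stripped  # long enough, leave unchanged
    # Substitute the short segment into the padding template.
    return template.replace("{seg}", stripped)


print(pad_short_segment("Hi."))
```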

enable_audio_cache

This boolean parameter enables caching of generated audio results, allowing for faster processing of repeated or similar inputs. The default value is True.
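Caching of this kind is typically keyed on everything that affects the output, so a repeated request with identical text and settings can skip generation. The sketch below shows that pattern with an in-memory dictionary; AudioCache, cache_key, and the generate callback are hypothetical names, and the node's actual cache may be organised differently.

```python
import hashlib
import json
from typing import Callable


def cache_key(text: str, **params) -> str:
    """Build a stable key from the text and every generation parameter."""
    # sort_keys makes the key independent of keyword argument order.
    payload = json.dumps({"text": text, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AudioCache:
    """In-memory cache that calls the generator only on a miss."""

    def __init__(self) -> None:
        self._store: dict[str, list[float]] = {}

    def get_or_generate(
        self, text: str, generate: Callable[..., list[float]], **params
    ) -> list[float]:
        key = cache_key(text, **params)
        if key not in self._store:
            self._store[key] = generate(text, **params)
        return self._store[key]


calls = []


def fake_generate(text: str, **params) -> list[float]:
    # Stand-in for a TTS call; records each invocation.
    calls.append(text)
    return [0.0] * len(text)


cache = AudioCache()
cache.get_or_generate("hello", fake_generate, seed=1)
cache.get_or_generate("hello", fake_generate, seed=1)  # served from cache
cache.get_or_generate("hello", fake_generate, seed=2)  # new seed, new entry
print(len(calls))
```

Including the seed and other settings in the key matters: two requests with the same text but different parameters should not share a cached result.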

📺 F5-TTS SRT Voice Generation Output Parameters:

Audio Output

The primary output of the ChatterBoxF5TTSSRTVoice node is the generated audio file, which contains the synthesized speech corresponding to the input text. This audio output is synchronized with the subtitle timing, making it suitable for use in multimedia projects that require precise audio-visual alignment. The output format and quality depend on the settings and parameters used during the generation process.
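To make the subtitle timing concrete, the sketch below parses SRT text into cues with start and end times in seconds, which is the information a generator would need in order to place each spoken segment. It assumes the standard SRT layout (index line, "start --> end" line, then text lines); Cue, to_seconds, and parse_srt are hypothetical helpers, not part of the node.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float  # seconds
    text: str


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert an 'HH:MM:SS,mmm' timestamp to seconds."""
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into a list of cues; blocks without a timing line are skipped."""
    cues: list[Cue] = []
    normalized = content.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.strip().splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue
        start, end = lines[1].split("-->")
        text = " ".join(line.strip() for line in lines[2:])
        cues.append(Cue(int(lines[0]), to_seconds(start), to_seconds(end), text))
    return cues


SAMPLE = """1
00:00:01,500 --> 00:00:04,000
Hello there.

2
00:00:05,000 --> 00:00:07,250
This is the second line.
"""

for cue in parse_srt(SAMPLE):
    print(cue.index, cue.end - cue.start, cue.text)
```

The difference between end and start gives the time window each segment of speech has to fit into.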

📺 F5-TTS SRT Voice Generation Usage Tips:

To achieve the best results, ensure that the input text is well-structured and free of errors, as this will directly impact the quality of the generated speech.
Experiment with the exaggeration and temperature parameters to find the right balance of expressiveness and consistency for your project.
Use the reference_audio and audio_prompt_path parameters to match the style and tone of existing audio content, creating a cohesive audio experience.
Enable chunking for long texts to improve processing efficiency and prevent potential issues with lengthy inputs.

📺 F5-TTS SRT Voice Generation Common Errors and Solutions:

"Text input too short for processing"

Explanation: The input text is too short to be processed effectively, which may lead to crashes or suboptimal audio output.
Solution: Use the crash_protection_template parameter to pad short text segments, ensuring they meet the minimum length requirements for processing.

"Unsupported language specified"

Explanation: The language parameter is set to a language that is not supported by the TTS model.
Solution: Verify that the specified language is supported and adjust the language parameter accordingly.

"Device not available for processing"

Explanation: The specified device for processing (e.g., GPU) is not available or not properly configured.
Solution: Check the device configuration and ensure that the necessary hardware and drivers are installed and accessible. Consider switching to a different device if the issue persists.
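A common way to avoid this error is to fall back to the CPU when the requested GPU is not usable. The sketch below assumes PyTorch as the backend and uses torch.cuda.is_available() when the package is installed; resolve_device and the "auto" value are hypothetical and not taken from the node's options.

```python
def resolve_device(requested: str = "auto") -> str:
    """Return 'cuda' only when it is requested and actually available."""
    try:
        import torch  # optional dependency, treated as absent if missing

        cuda_ok = torch.cuda.is_available()
    except ImportError:
        cuda_ok = False
    if requested in ("auto", "cuda"):
        # Fall back to the CPU instead of failing on a missing GPU.
        return "cuda" if cuda_ok else "cpu"
    return "cpu"


print(resolve_device("cuda"))
```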

📺 F5-TTS SRT Voice Generation Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI_ChatterBox_SRT_Voice

RunComfy

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.
