
ComfyUI Node: 📺 F5-TTS SRT Voice Generation

Class Name: ChatterBoxF5TTSSRTVoice
Category: F5-TTS Voice
Author: diodiogod (Account age: 768 days)
Extension: ComfyUI_ChatterBox_SRT_Voice
Last Updated: 2026-03-21
GitHub Stars: 0.08K

How to Install ComfyUI_ChatterBox_SRT_Voice

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI_ChatterBox_SRT_Voice in the search bar.
After installation, click the Restart button to restart ComfyUI. Then refresh your browser manually to clear the cache and load the updated list of nodes.


📺 F5-TTS SRT Voice Generation Description

ChatterBoxF5TTSSRTVoice generates synchronized voiceovers from text with subtitle alignment.

📺 F5-TTS SRT Voice Generation:

ChatterBoxF5TTSSRTVoice generates speech from text and aligns it with the timing of a subtitle (SRT) file, which makes it suited to projects that need audio synchronized with video. It supports multiple languages and exposes exaggeration and temperature settings for adjusting how expressive and how varied the speech sounds. Long texts can be split into chunks, a crash protection template pads segments that are too short to process reliably, and generated audio can be cached so that repeated runs with the same inputs finish faster.
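
To make the subtitle timing concrete, here is a minimal sketch of reading SRT cues into start and end times in seconds. It assumes the common SRT layout (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then text). The names `Cue` and `parse_srt` are illustrative and are not taken from the extension's code.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


# Matches HH:MM:SS,mmm (a dot is tolerated as the millisecond separator).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    m = _TIME.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    h, mi, s, ms = (int(g) for g in m.groups())
    return h * 3600 + mi * 60 + s + ms / 1000.0


def parse_srt(srt: str) -> List[Cue]:
    """Parse SRT text into cues; blocks with fewer than 3 lines are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        start, _, end = lines[1].partition("-->")
        cues.append(
            Cue(int(lines[0]), _to_seconds(start), _to_seconds(end), "\n".join(lines[2:]))
        )
    return cues
```

Each cue's `start` and `end` give the time slot that the generated speech for that line would need to fit.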

📺 F5-TTS SRT Voice Generation Input Parameters:

t

This parameter represents the text input that you want to convert into speech. It is the primary content that the node will process to generate audio output. The text can be of any length, but longer texts may be automatically chunked into smaller segments for processing.

language

This parameter specifies the language of the text input. It ensures that the generated speech matches the linguistic characteristics of the input text, providing accurate pronunciation and intonation. The default language is English, but other languages are supported.

device

This parameter determines the computational device used for processing, such as a CPU or GPU. Selecting the appropriate device can impact the speed and efficiency of the TTS generation process.

exaggeration

This parameter controls how expressive the generated speech is. Higher values produce a more dramatic delivery; lower values produce a more neutral tone. The range and default value are not documented here.

temperature

This parameter controls how much the speech output varies between generations. Higher values allow more variation and spontaneity; lower values give more predictable, consistent output. The range and default value are not documented here.

cfg_weight

This parameter adjusts the balance between the input text and any reference audio or prompt used during generation, so you can tune how strongly external audio cues influence the final speech. The range and default value are not documented here.

seed

This parameter sets the random seed for the generation process, ensuring reproducibility of results. By using the same seed, you can generate consistent audio outputs for the same input text.
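
As a rough illustration of why a fixed seed gives repeatable results, the sketch below uses Python's standard `random` module as a stand-in for the model's sampler. The function `sample_noise` is hypothetical and does not reflect how the node seeds its model internally.

```python
import random
from typing import List


def sample_noise(seed: int, n: int = 4) -> List[float]:
    """Draw n Gaussian samples from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# The same seed reproduces the same sequence; a different seed does not.
same = sample_noise(42) == sample_noise(42)
different = sample_noise(42) != sample_noise(43)
```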

reference_audio

This optional parameter allows you to provide a reference audio file to guide the TTS generation. It can be used to match the style or tone of existing audio content. If not provided, the node will rely solely on the text input.

audio_prompt_path

This parameter specifies the file path to an audio prompt that can be used to influence the TTS output. It serves as an additional guide for the speech generation process.

enable_chunking

This boolean parameter determines whether long text inputs should be divided into smaller chunks for processing. Enabling chunking can improve performance and prevent issues with processing very long texts. The default value is True.

max_chars_per_chunk

This parameter sets the maximum number of characters allowed in each chunk when chunking is enabled. It helps manage the size of text segments for efficient processing. The default value is 400 characters.
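
One plausible way to apply a character limit like this is to pack whole sentences into chunks and hard-split only sentences that exceed the limit on their own. The sketch below shows that approach under the assumption that sentences end in `.`, `!`, or `?`. The function `chunk_text` is illustrative, and the node's actual splitting rules may differ.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 400) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence breaks."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Packing by sentence keeps each chunk a natural unit of speech, which tends to matter more for prosody than filling every chunk to the limit.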

chunk_combination_method

This parameter specifies the method used to combine audio chunks after processing. The "auto" option selects a method based on the input and settings.

silence_between_chunks_ms

This parameter defines the duration of silence, in milliseconds, inserted between audio chunks. It ensures smooth transitions between segments and can be adjusted to suit the pacing of the speech. The default value is 100 milliseconds.
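
The effect of this setting can be sketched as plain concatenation with zero-valued samples between chunks. The example assumes mono audio held as lists of float samples and a known sample rate. The function `join_with_silence` is hypothetical, and the node itself may use other combination methods.

```python
from typing import List, Sequence


def join_with_silence(
    chunks: Sequence[Sequence[float]], sample_rate: int, silence_ms: int = 100
) -> List[float]:
    """Concatenate chunks, inserting silence_ms of zeros between neighbours."""
    gap = [0.0] * (sample_rate * silence_ms // 1000)
    out: List[float] = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            out.extend(gap)  # silence goes between chunks, not before the first
        out.extend(chunk)
    return out
```

At a sample rate of 24,000 Hz, the default of 100 ms corresponds to 2,400 silent samples per gap.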

crash_protection_template

This parameter provides a template for padding short text segments to prevent crashes during sequential generation. It is particularly useful for very short texts that may not meet the minimum length requirements for processing.
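
The padding idea can be sketched as wrapping a short segment in filler text. Everything specific here is an assumption made for illustration: the `{seg}` placeholder, the filler wording, and the 20-character threshold are not confirmed by this page, and `pad_short_text` is a hypothetical helper.

```python
def pad_short_text(
    text: str,
    template: str = "hmm ,, {seg} hmm ,,",  # assumed template format
    min_chars: int = 20,                    # assumed minimum length
) -> str:
    """Wrap text in the template when it is shorter than min_chars."""
    stripped = text.strip()
    if len(stripped) >= min_chars or "{seg}" not in template:
        return stripped  # long enough, or the template has no insertion point
    return template.replace("{seg}", stripped)
```

With these assumed defaults, `pad_short_text("Hi")` returns `"hmm ,, Hi hmm ,,"`, while longer text passes through unchanged.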

enable_audio_cache

This boolean parameter enables caching of generated audio results, allowing for faster processing of repeated or similar inputs. The default value is True.
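
A cache like this is commonly keyed on a hash of the text plus every setting that affects the audio. The sketch below shows that pattern with an in-memory dictionary. The names `cache_key` and `generate_cached`, and the `synth` callable, are hypothetical, and the extension's real cache may be organized differently.

```python
import hashlib
import json
from typing import Callable, Dict, List

_cache: Dict[str, List[float]] = {}


def cache_key(text: str, **params) -> str:
    """Hash the text and parameters; sort_keys makes the key order-independent."""
    payload = json.dumps({"text": text, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def generate_cached(text: str, synth: Callable[..., List[float]], **params) -> List[float]:
    """Return cached audio for (text, params), calling synth only on a miss."""
    key = cache_key(text, **params)
    if key not in _cache:
        _cache[key] = synth(text, **params)
    return _cache[key]
```

Because the seed and other settings are part of the key, changing any of them produces a cache miss and a fresh generation.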

📺 F5-TTS SRT Voice Generation Output Parameters:

Audio Output

The primary output of the ChatterBoxF5TTSSRTVoice node is the generated audio, which contains the synthesized speech for the input text, timed to match the subtitles. The output format and quality depend on the parameters used during generation.
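
One simple way to picture subtitle-aligned output is to place each generated segment on a silent timeline at its cue's start time. The sketch below does that for mono float samples. It is only an illustration of the idea: `place_on_timeline` is a hypothetical helper, and the node's real timing logic is not described on this page.

```python
from typing import List, Sequence, Tuple


def place_on_timeline(
    segments: Sequence[Tuple[float, Sequence[float]]], sample_rate: int
) -> List[float]:
    """Mix (start_seconds, samples) segments into one track starting at t=0."""
    if not segments:
        return []
    total = max(int(round(start * sample_rate)) + len(s) for start, s in segments)
    out = [0.0] * total  # silence everywhere no segment is playing
    for start, samples in segments:
        offset = int(round(start * sample_rate))
        for i, value in enumerate(samples):
            out[offset + i] += value  # overlapping segments are summed
    return out
```

A segment that runs longer than its subtitle slot will overlap the next one in this sketch, so a real pipeline would need some policy for that case, such as time-stretching or shifting later segments.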

📺 F5-TTS SRT Voice Generation Usage Tips:

  • To achieve the best results, ensure that the input text is well-structured and free of errors, as this will directly impact the quality of the generated speech.
  • Experiment with the exaggeration and temperature parameters to find the right balance of expressiveness and consistency for your project.
  • Use the reference_audio and audio_prompt_path parameters to match the style and tone of existing audio content, creating a cohesive audio experience.
  • Enable chunking for long texts to improve processing efficiency and prevent potential issues with lengthy inputs.

📺 F5-TTS SRT Voice Generation Common Errors and Solutions:

"Text input too short for processing"

  • Explanation: The input text is too short to be processed effectively, which may lead to crashes or suboptimal audio output.
  • Solution: Use the crash_protection_template parameter to pad short text segments, ensuring they meet the minimum length requirements for processing.

"Unsupported language specified"

  • Explanation: The language parameter is set to a language that is not supported by the TTS model.
  • Solution: Verify that the specified language is supported and adjust the language parameter accordingly.

"Device not available for processing"

  • Explanation: The specified device for processing (e.g., GPU) is not available or not properly configured.
  • Solution: Check the device configuration and ensure that the necessary hardware and drivers are installed and accessible. Consider switching to a different device if the issue persists.
