RunComfy

ComfyUI Node: 🎤 F5-TTS Voice Generation

Class Name: ChatterBoxF5TTSVoice
Category: F5-TTS Voice
Author: diodiogod (Account age: 768 days)
Extension: ComfyUI_ChatterBox_SRT_Voice
Last Updated: 2026-03-21
GitHub Stars: 0.08K

Table of Contents

Description
ChatterBoxF5TTSVoice:
ChatterBoxF5TTSVoice Input Parameters:
ChatterBoxF5TTSVoice Output Parameters:
ChatterBoxF5TTSVoice Usage Tips:
ChatterBoxF5TTSVoice Common Errors and Solutions:
Related Nodes

How to Install ComfyUI_ChatterBox_SRT_Voice

Install this extension via the ComfyUI Manager by searching for ComfyUI_ChatterBox_SRT_Voice:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI_ChatterBox_SRT_Voice in the search bar, then install the extension from the results.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

🎤 F5-TTS Voice Generation Description

ChatterBoxF5TTSVoice converts text to natural-sounding speech with customizable voices.

🎤 F5-TTS Voice Generation:

ChatterBoxF5TTSVoice converts text into speech using the F5-TTS engine, which is part of the ComfyUI ChatterBox suite. It supports language selection, voice customization, and chunking of text so that longer inputs can be processed efficiently. The node can also manage interruptions and apply pause tags to make the speech flow more naturally. It is aimed at AI artists and developers who need expressive, natural-sounding audio generated from text.

🎤 F5-TTS Voice Generation Input Parameters:

text

The text parameter is the primary input for the node: the text that will be converted into speech. No explicit minimum or maximum length is documented, but if chunking is not enabled, keep the text short enough to be processed in a single pass.

language

The language parameter specifies the language in which the text will be spoken. This is important for ensuring that the pronunciation and intonation are appropriate for the given language. The node supports multiple languages, allowing for versatile applications.

device

The device parameter determines the hardware on which the text-to-speech processing will occur. This can impact the speed and efficiency of the audio generation, with options typically including CPU or GPU.

exaggeration

The exaggeration parameter adjusts the expressiveness of the generated speech. Higher values may result in more dramatic intonation, which can be useful for certain artistic or narrative purposes.

temperature

The temperature parameter influences the variability and creativity of the speech synthesis. A higher temperature can lead to more varied and less predictable speech patterns, while a lower temperature results in more consistent output.

cfg_weight

The cfg_weight parameter controls the balance between the input text and any reference audio or prompts. This can affect how closely the generated speech matches the desired style or tone.

seed

The seed parameter initializes the random number generator used by the text-to-speech process. Keeping the seed fixed makes results reproducible: the same input with the same settings produces the same output across different runs.
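The effect of a fixed seed can be illustrated with a minimal sketch. The function `fake_noise` below is a hypothetical stand-in for any stochastic step in synthesis; it is not the F5-TTS sampler and only shows why a seeded generator gives repeatable draws.

```python
import random


def fake_noise(seed: int, n: int = 5) -> list[float]:
    """Hypothetical stand-in for a random synthesis step (not the F5-TTS sampler)."""
    # A private generator keeps the global random state untouched.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


same_a = fake_noise(seed=42)
same_b = fake_noise(seed=42)
other = fake_noise(seed=43)

print(same_a == same_b)  # identical seeds give identical draws
print(same_a == other)   # a different seed gives different draws
```

In a workflow, this is the reason to note the seed of a take you like: rerunning with that seed and unchanged settings should recreate it.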

reference_audio

The reference_audio parameter allows you to provide an audio sample that the node can use as a style guide for the generated speech. This can help in achieving a specific voice or tone.

audio_prompt_path

The audio_prompt_path parameter specifies the file path to an audio prompt that can guide the speech synthesis process. This is useful for maintaining consistency with existing audio content.

enable_chunking

The enable_chunking parameter is a boolean that determines whether long text inputs should be split into smaller chunks for processing. This helps in managing memory and processing resources effectively.

max_chars_per_chunk

The max_chars_per_chunk parameter sets the maximum number of characters allowed in each chunk when chunking is enabled. This ensures that each segment is manageable and can be processed without issues.
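As a rough illustration of what a character limit per chunk means, the sketch below packs whole sentences into chunks that never exceed the limit. The function name `split_into_chunks` and the sentence-splitting rule are assumptions made for this example; the node's actual splitting logic may differ.

```python
import re


def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split by character count.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("One. Two three. Four five six.", max_chars=15))
```

Splitting on sentence boundaries where possible keeps each chunk a natural unit of speech, which tends to matter more for prosody than hitting the character limit exactly.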

chunk_combination_method

The chunk_combination_method parameter defines how the audio chunks are combined after processing. Options may include automatic methods or specific user-defined strategies.

silence_between_chunks_ms

The silence_between_chunks_ms parameter specifies the duration of silence to be inserted between audio chunks. This can help in creating natural pauses in the speech output.
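To make the millisecond value concrete, here is a minimal sketch of joining mono audio chunks with a gap of silence. It uses plain Python lists of samples rather than the node's real audio tensors, and the helper name `join_with_silence` and the 24000 Hz sample rate in the usage line are assumptions for illustration.

```python
def join_with_silence(chunks: list[list[float]], sample_rate: int, silence_ms: int) -> list[float]:
    """Concatenate mono sample lists, inserting silence_ms of zeros between them."""
    # Convert the pause length from milliseconds to a number of samples.
    gap = [0.0] * (sample_rate * silence_ms // 1000)
    combined: list[float] = []
    for index, chunk in enumerate(chunks):
        if index > 0:
            combined.extend(gap)  # silence goes between chunks, not before the first
        combined.extend(chunk)
    return combined


# Assumed sample rate of 24000 Hz: a 100 ms pause is 2400 zero samples.
audio = join_with_silence([[0.1, 0.2], [0.3]], sample_rate=24000, silence_ms=100)
print(len(audio))  # 2 + 2400 + 1 samples
```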

crash_protection_template

The crash_protection_template parameter provides a template for padding short text segments to prevent crashes during sequential generation. This is particularly useful for ensuring stability in the synthesis process.
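The idea of padding short segments with a template can be sketched as follows. The `{seg}` placeholder, the `min_chars` threshold, and the function name `pad_short_segment` are all hypothetical choices for this example; check the node's own default template for the placeholder syntax it actually expects.

```python
def pad_short_segment(segment: str, template: str, min_chars: int = 10) -> str:
    """Wrap segments shorter than min_chars in a padding template.

    The "{seg}" placeholder and the min_chars threshold are assumptions
    made for this sketch, not documented behavior of the node.
    """
    stripped = segment.strip()
    if len(stripped) >= min_chars or "{seg}" not in template:
        return segment  # long enough, or the template has nowhere to put the text
    return template.replace("{seg}", stripped)


print(pad_short_segment("Hi", "hmm, {seg} hmm,"))
print(pad_short_segment("This is long enough.", "hmm, {seg} hmm,"))
```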

enable_audio_cache

The enable_audio_cache parameter is a boolean that determines whether the generated audio should be cached for future use. This can improve efficiency by avoiding redundant processing.
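One common way to implement this kind of cache, shown here only as a sketch, is to hash the text together with every setting that affects the output and reuse the stored audio when the hash matches. The names `cache_key` and `synthesize_cached` are hypothetical, and the node's real cache may key on different fields.

```python
import hashlib
import json

_cache: dict[str, list[float]] = {}


def cache_key(text: str, **params) -> str:
    """Build a stable key from the text and every setting that affects the audio."""
    payload = json.dumps({"text": text, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def synthesize_cached(text: str, synthesize, **params) -> list[float]:
    """Call synthesize(text, **params) only when this exact request is not cached."""
    key = cache_key(text, **params)
    if key not in _cache:
        _cache[key] = synthesize(text, **params)
    return _cache[key]


def fake_synthesize(text: str, **params) -> list[float]:
    """Hypothetical stand-in for the expensive synthesis call."""
    print(f"synthesizing {text!r}")
    return [0.0] * len(text)


synthesize_cached("Hello", fake_synthesize, seed=1)  # runs synthesis
synthesize_cached("Hello", fake_synthesize, seed=1)  # served from the cache
synthesize_cached("Hello", fake_synthesize, seed=2)  # new seed, runs again
```

Because the key covers the parameters as well as the text, changing any setting such as the seed produces a cache miss rather than stale audio.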

🎤 F5-TTS Voice Generation Output Parameters:

wav

The wav output parameter represents the generated audio waveform. This is the primary output of the node, providing the synthesized speech in a format that can be played back or further processed. The length of the audio is determined by the input text and the processing parameters.

info

The info output parameter provides metadata about the generated audio, including details such as the duration of the audio and the model used for synthesis. This information can be useful for logging and debugging purposes.

🎤 F5-TTS Voice Generation Usage Tips:

To achieve the best results, ensure that your input text is well-structured and free of unnecessary tags or formatting that might confuse the synthesis process.
Experiment with the temperature and exaggeration parameters to find the right balance for your project's needs, especially if you require a specific tone or expressiveness.
Utilize the enable_chunking feature for longer texts to prevent memory issues and ensure smooth processing.

🎤 F5-TTS Voice Generation Common Errors and Solutions:

"Text input too long"

Explanation: This error occurs when the input text exceeds the processing capacity of the node without chunking enabled.
Solution: Enable the enable_chunking parameter and set an appropriate max_chars_per_chunk value to split the text into manageable segments.

"Invalid language code"

Explanation: The specified language code is not supported by the node.
Solution: Verify that the language code is correct and supported by the F5-TTS engine. Refer to the documentation for a list of valid language codes.

"Audio prompt path not found"

Explanation: The file path provided for the audio prompt does not exist or is incorrect.
Solution: Double-check the audio_prompt_path to ensure it points to a valid audio file on your system.
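A quick way to rule out this error before running the workflow is to check the path yourself. The sketch below uses only the Python standard library; the helper name `check_audio_prompt_path` and the list of accepted extensions are assumptions for illustration, not the node's own validation.

```python
from pathlib import Path

# Assumed set of audio extensions for this sketch; the node may accept others.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac", ".ogg"}


def check_audio_prompt_path(path_str: str) -> Path:
    """Return the resolved path if it points to an existing audio file, else raise."""
    path = Path(path_str).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Audio prompt path not found: {path}")
    if path.suffix.lower() not in AUDIO_SUFFIXES:
        raise ValueError(f"Unexpected audio file extension: {path.suffix!r}")
    return path.resolve()


try:
    check_audio_prompt_path("voices/does_not_exist.wav")
except FileNotFoundError as error:
    print(error)
```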

"Seed value not set"

Explanation: The seed parameter is missing, leading to non-reproducible results.
Solution: Provide a valid seed value to ensure consistent output across different runs.

🎤 F5-TTS Voice Generation Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI_ChatterBox_SRT_Voice

Support

Resources

Legal

RunComfy

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.
