Fish Audio TTS

Converts text to speech using Fish Audio TTS, with customizable voice and audio settings.

Fish Audio TTS:

The MontagenFishAudioTTSNode is a ComfyUI node that converts text into speech using the Fish Audio Text-to-Speech (TTS) service. It is part of the Montagen suite (ComfyUI-Montagen). The node lets you select a voice, trim and offset the generated audio, and adjust processing parameters such as normalize, top_p, and temperature. The resulting audio files can be synchronized with other media elements, which makes the node useful for adding voiceovers to a project.

Fish Audio TTS Input Parameters:

text

This parameter accepts a list of strings, each a piece of text to convert into speech. Multiline text is supported, so long narratives can be processed. The text must not be empty: empty input produces the error "Input text cannot be empty".

trim

This float parameter specifies how much time, in seconds, to trim from the start of the audio output. It ranges from 0.0 to 2.0 seconds, with a default value of 0.0. Trimming is useful for removing unwanted silence or noise at the beginning of the audio file.
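To make the effect of trim concrete, here is a minimal sketch of trimming the start of an audio buffer. It is an illustration only, not the node's implementation: the function name trim_start, the list-of-samples representation, and the clamping behavior are assumptions.

```python
def trim_start(samples, sample_rate, trim_seconds):
    """Drop the first `trim_seconds` of audio from a list of samples.

    Hypothetical helper: the node's real trimming code is not shown in
    this document.
    """
    # Clamp to the documented 0.0-2.0 second range (clamping is an assumption).
    trim_seconds = max(0.0, min(2.0, trim_seconds))
    # Convert seconds to a whole number of samples.
    n = int(round(trim_seconds * sample_rate))
    return samples[n:]


# One second of placeholder audio at 8 kHz.
audio = [0.0] * 8000
trimmed = trim_start(audio, 8000, 0.25)
print(len(trimmed))  # 6000
```

A trim of 0.25 seconds at 8,000 samples per second removes 2,000 samples, leaving 6,000.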

voice

The voice parameter is a required string that selects the voice used for the TTS conversion. Choose a voice that matches the tone and style of your project. If no voice is provided, the node fails with "Voice is required for Fish Audio TTS."

offset

This float parameter defines the offset, in seconds, to apply to the audio. The default value is 0.0, and the offset is adjustable in increments of 0.1 seconds. Use it to synchronize the speech with visuals or other audio.
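As a rough illustration of how an offset can be used for synchronization, the sketch below shifts a clip's time range by an offset snapped to 0.1-second steps. This is an assumption about how an offset might be applied on a timeline, not the node's confirmed behavior; apply_offset and the (start, end) tuple are hypothetical.

```python
def apply_offset(start, end, offset):
    """Shift a (start, end) time range, in seconds, by `offset`.

    Hypothetical helper: how the node applies the offset internally is
    not described in this document.
    """
    # Snap the offset to the documented 0.1-second increment.
    offset = round(offset * 10) / 10
    # Round the results to avoid floating-point noise in the output.
    return (round(start + offset, 3), round(end + offset, 3))


print(apply_offset(1.0, 3.5, 0.5))  # (1.5, 4.0)
```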

unique_id

An identifier that keeps each TTS task distinct so it can be tracked or referenced independently within the Montagen system. The name matches ComfyUI's hidden UNIQUE_ID input, which ComfyUI fills in with the node's ID in the current workflow. If the node declares it that way, you do not set this value yourself.

prompt

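The prompt parameter is described as a string input that provides additional context or instructions for the TTS process. Note that prompt is also the name of ComfyUI's hidden PROMPT input, which carries the complete workflow prompt sent to the server. If the node declares it that way, ComfyUI supplies the value automatically and it does not steer tone, emphasis, or pacing.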

extra_pnginfo

This parameter carries additional metadata. The name matches ComfyUI's hidden EXTRA_PNGINFO input, a dictionary of metadata that ComfyUI copies into saved PNG files. If the node declares it that way, ComfyUI supplies the value automatically and it is unlikely to affect the speech output.

apiKey

An optional string parameter used to authenticate with the Fish Audio TTS service. The input itself is optional: if it is not provided, the system attempts to retrieve a key automatically. A key must still be available from one of these sources, otherwise the node fails with "API Key is required for Fish Audio TTS."

timeRangeList

This optional parameter is a list of dictionaries that define specific time ranges for the TTS process. It allows for precise control over when certain text segments are converted to speech, facilitating complex audio timelines and synchronization.

action

An optional string parameter that specifies the action to be taken during the TTS process. It can influence how the text is processed and converted into speech, providing flexibility in handling different audio production scenarios.

normalize

A boolean parameter with a default value of True, normalize determines whether the audio output should be normalized. Normalization adjusts the audio levels to ensure consistent volume and quality across different segments.

top_p

This float parameter, ranging from 0.0 to 1.0 with a default value of 0.7, controls the randomness of the TTS output. A higher value results in more diverse and creative speech variations, while a lower value produces more predictable and stable results.

temperature

Similar to top_p, this float parameter ranges from 0.0 to 1.0 and has a default value of 0.7. It influences the creativity and variability of the TTS output, with higher values leading to more dynamic and expressive speech.
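The sketch below illustrates how temperature and top_p are commonly applied when a generative model samples its next output. It shows the general technique (temperature-scaled softmax followed by nucleus filtering) on toy numbers. Whether Fish Audio applies these parameters in exactly this way is an assumption, and both function names are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    # Guard against division by zero when temperature is 0.0.
    t = max(temperature, 1e-6)
    scaled = [x / t for x in logits]
    # Subtract the maximum for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Return indices of the smallest set of candidates whose cumulative
    probability reaches `top_p`, most likely candidate first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


# Toy scores for four candidate outputs (made-up values for illustration).
logits = [2.0, 1.0, 0.5, -1.0]
for temperature in (0.3, 0.7, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    kept = top_p_filter(probs, top_p=0.7)
    print(temperature, [round(p, 3) for p in probs], kept)
```

In this sketch, lowering temperature concentrates probability on the top candidate, and lowering top_p shrinks the set of candidates that can be chosen. Both changes make the output more predictable.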

Fish Audio TTS Output Parameters:

promptList

This output is a list of strings representing the processed prompts used in the TTS conversion. It provides a record of the input prompts, allowing you to verify and review the text that was converted into speech.

timeRangeList

The timeRangeList output is a list of time ranges that were applied during the TTS process. It reflects the timing and synchronization of the audio segments, ensuring that the speech aligns with other media elements as intended.

action

This output indicates the action that was executed during the TTS process. It provides insight into the processing decisions made by the node, helping you understand how the text was converted into speech.

resourceList

The resourceList is a list of file paths to the generated audio files. These files contain the speech output and can be used in various multimedia projects. The list allows you to easily access and manage the audio resources produced by the node.

Fish Audio TTS Usage Tips:

Ensure that the voice parameter is set to a suitable option that matches the style and tone of your project for optimal results.
Use the trim and offset parameters to fine-tune the audio output, removing unwanted silence and synchronizing the speech with other media elements.
Experiment with the top_p and temperature parameters to achieve the desired level of creativity and variability in the speech output.

Fish Audio TTS Common Errors and Solutions:

Voice is required for Fish Audio TTS.

Explanation: This error occurs when the voice parameter, which the TTS process requires, is not provided.
Solution: Ensure that you specify a valid voice option in the voice parameter before executing the node.

API Key is required for Fish Audio TTS.

Explanation: This error indicates that the apiKey parameter is missing, preventing access to the Fish Audio TTS service.
Solution: Provide a valid API key in the apiKey parameter or ensure that the system can retrieve it automatically.

Input text cannot be empty

Explanation: This error arises when the text parameter is empty, as the TTS process requires text input to generate speech.
Solution: Make sure to input non-empty text in the text parameter to proceed with the TTS conversion.
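The three errors above can be checked before running the node. The sketch below is a hypothetical pre-flight check that reuses the error messages listed in this section. The function name validate_inputs, the order of the checks, and the rule that a whitespace-only string counts as empty are assumptions, not the node's actual code.

```python
def validate_inputs(text, voice, api_key):
    """Raise ValueError with the documented message if an input is missing.

    `text` is a list of strings; `api_key` is the key after any automatic
    retrieval. Hypothetical helper for illustration only.
    """
    if not voice:
        raise ValueError("Voice is required for Fish Audio TTS.")
    if not api_key:
        raise ValueError("API Key is required for Fish Audio TTS.")
    # Assumption: an empty list or any blank entry counts as empty text.
    if not text or any(not t.strip() for t in text):
        raise ValueError("Input text cannot be empty")


try:
    validate_inputs(["Hello"], voice="", api_key="key")
except ValueError as err:
    print(err)  # Voice is required for Fish Audio TTS.
```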
