ComfyUI Node: 🎙️ Gemini Text-to-Speech

Class Name: GeminiTTS
Category: Gemini TTS
Author: ShmuelRonen
Extension: ComfyUI-Gemini_TTS
Last Updated: 2025-05-23
GitHub Stars: 0.02K

How to Install ComfyUI-Gemini_TTS

Install this extension via the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-Gemini_TTS in the search bar and install the extension.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

🎙️ Gemini Text-to-Speech Description

Text-to-speech node that converts written text into spoken audio using Gemini TTS models.

🎙️ Gemini Text-to-Speech:

GeminiTTS is a text-to-speech (TTS) node that converts written text into spoken audio using Gemini TTS models. It is aimed at AI artists and developers who want to add realistic, expressive voice synthesis to their projects. You can choose the voice and adjust characteristics such as tone and style, including through voice acting instructions. The node supports both the free and paid tiers of the service, so usage can follow project requirements and budget. It can also fall back automatically to an alternative model when the primary one is unavailable, and it retries failed requests after a configurable delay.

🎙️ Gemini Text-to-Speech Input Parameters:

prompt

The prompt parameter is a string input that contains the text you want to convert into speech. It serves as the primary content for the TTS process. The default value is "Say: Hello, this is a test of Gemini text-to-speech." This parameter can be multiline, allowing for more complex and lengthy text inputs. The prompt directly influences the generated audio, as it dictates the words and phrases that will be spoken.

tts_model

The tts_model parameter specifies the TTS model used to generate speech. The available options are "gemini-2.5-pro-preview-tts" and "gemini-2.5-flash-preview-tts"; the default is "gemini-2.5-pro-preview-tts". The choice affects the quality and characteristics of the generated audio, since the two models may differ in capability and performance.

voice

The voice parameter allows you to select the voice used for speech synthesis. It offers a range of voices, with the default being "[M] Puck." This parameter is crucial for tailoring the audio output to match the desired vocal characteristics, such as gender, accent, and tone, enhancing the personalization of the generated speech.

temperature

The temperature parameter is a float that controls the creativity and variability of the speech output. It ranges from 0.0 to 2.0, with a default value of 1.0. A lower temperature results in more consistent and predictable speech, while a higher temperature introduces more variation and expressiveness, allowing for creative and dynamic audio outputs.
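
To make the relationship between prompt, voice, and temperature concrete, here is a minimal sketch of how these inputs might be assembled into a request body for the Gemini generateContent REST endpoint. The helper name build_tts_request is hypothetical, and stripping the "[M] " prefix from the voice label is an assumption; the node's actual request code may differ. The tts_model value would normally go in the request URL rather than in the body.

```python
import json

# Hypothetical helper, not taken from the node's source code.
def build_tts_request(prompt, voice_label="[M] Puck", temperature=1.0):
    # Assumption: "[M] " is a display tag and the API expects the bare voice name.
    voice_name = voice_label.split("] ", 1)[-1]
    # Clamp to the documented range of 0.0 to 2.0.
    temperature = max(0.0, min(2.0, float(temperature)))
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "temperature": temperature,
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice_name}}
            },
        },
    }

body = build_tts_request("Say: Hello, this is a test of Gemini text-to-speech.")
print(json.dumps(body, indent=2))
```

Clamping in the helper mirrors the range the node exposes, so an out-of-range value is corrected before it reaches the service.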

api_key

The api_key parameter is a string used to authenticate and authorize access to the Gemini TTS service. By default, it is an empty string. Providing a valid API key is essential for utilizing the TTS capabilities, especially when accessing paid features or higher usage limits.

auto_fallback_to_flash

The auto_fallback_to_flash parameter is a boolean that determines whether the node should automatically switch to the "flash" model if the primary model is unavailable. The default value is True. This feature ensures continuity in speech generation by using alternative models when necessary, minimizing disruptions in the workflow.
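
The fallback behavior described above could look roughly like the following sketch. The function generate_with_fallback and the synthesize callable are hypothetical stand-ins for whatever the node does internally, and the use of RuntimeError to signal an unavailable model is an assumption.

```python
PRO = "gemini-2.5-pro-preview-tts"
FLASH = "gemini-2.5-flash-preview-tts"

# Hypothetical sketch: `synthesize` is any callable that takes a model name
# and returns audio bytes, raising RuntimeError when the model is unavailable.
def generate_with_fallback(synthesize, tts_model=PRO, auto_fallback_to_flash=True):
    """Try tts_model first; optionally retry once with the flash model."""
    try:
        return synthesize(tts_model), tts_model
    except RuntimeError:
        # Re-raise if fallback is disabled or we were already on flash.
        if not auto_fallback_to_flash or tts_model == FLASH:
            raise
        return synthesize(FLASH), FLASH

# Stand-in for a real API call: the pro model always fails here.
def fake_synthesize(model):
    if model == PRO:
        raise RuntimeError("model unavailable")
    return b"\x00\x00"

audio, used = generate_with_fallback(fake_synthesize)
print(used)  # gemini-2.5-flash-preview-tts
```

Returning the model name alongside the audio lets the caller report which model actually produced the result.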

retry_delay

The retry_delay parameter is an integer that sets the delay time, in seconds, before retrying a failed TTS request. It ranges from 10 to 120 seconds, with a default of 30 seconds. This parameter helps manage rate limits and temporary service unavailability by spacing out retry attempts, improving the chances of successful speech generation.
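
As an illustration of spacing out retries, the sketch below waits retry_delay seconds between attempts. The names call_with_retry, RateLimited, and max_attempts are hypothetical; the node's real retry policy is not documented here. The sleep function is injectable so the example runs without actually waiting.

```python
import time

class RateLimited(Exception):
    """Hypothetical error type standing in for a rate-limit response."""

def call_with_retry(request, retry_delay=30, max_attempts=3, sleep=time.sleep):
    # Clamp to the documented range of 10 to 120 seconds.
    retry_delay = max(10, min(120, int(retry_delay)))
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts:
                raise  # out of attempts, surface the error
            sleep(retry_delay)

# Demo: fail twice, then succeed; record the waits instead of sleeping.
waits = []
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "ok"

result = call_with_retry(flaky, retry_delay=5, sleep=waits.append)
print(result, waits)  # ok [10, 10]
```

Note that the requested delay of 5 is raised to the minimum of 10, matching the range the node exposes.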

use_paid_tier

The use_paid_tier parameter is a boolean that indicates whether to use the paid tier of the Gemini TTS service. The default value is False. Enabling this option allows access to premium features and higher quality models, which may incur additional costs but provide enhanced audio outputs.

billing_project_id

The billing_project_id parameter is a string used to specify the billing project for paid tier usage. By default, it is an empty string. This parameter is necessary for tracking and managing billing information when using the paid features of the Gemini TTS service.

aggressive_retry

The aggressive_retry parameter is a boolean that controls whether to aggressively retry failed TTS requests. The default value is False. Enabling this option increases the frequency of retry attempts, which can be useful in scenarios where immediate speech generation is critical, but it may also lead to higher API usage.

show_voice_info

The show_voice_info parameter is a boolean that determines whether to display additional information about the selected voice. The default value is False. When enabled, this feature provides insights into the voice's characteristics, such as tone and style, helping users make informed decisions about voice selection.

🎙️ Gemini Text-to-Speech Output Parameters:

audio

The audio output parameter is a dictionary containing the generated audio data. It includes the waveform, which is a tensor representation of the audio signal, and the sample rate, typically set at 24000 Hz. This output is crucial as it provides the actual speech audio that can be used in various applications, such as multimedia projects, voiceovers, and interactive systems.
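
For readers who want to handle the raw audio outside ComfyUI, the following sketch converts 16-bit mono PCM bytes into float samples and into a WAV container at 24000 Hz, using only the Python standard library. It assumes the service returns little-endian 16-bit mono PCM; the function names are hypothetical, and inside ComfyUI the float samples would be wrapped in a tensor rather than a list.

```python
import io
import struct
import wave

SAMPLE_RATE = 24000  # sample rate stated for the node's audio output

def pcm16_to_floats(pcm_bytes):
    """Decode little-endian 16-bit PCM into floats in [-1.0, 1.0)."""
    count = len(pcm_bytes) // 2
    ints = struct.unpack("<%dh" % count, pcm_bytes[: count * 2])
    return [s / 32768.0 for s in ints]

def pcm16_to_wav_bytes(pcm_bytes, sample_rate=SAMPLE_RATE):
    """Wrap raw mono PCM in a WAV header and return the file contents."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono (assumption)
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()

pcm = struct.pack("<3h", 0, 16384, -32768)
print(pcm16_to_floats(pcm))  # [0.0, 0.5, -1.0]
wav_bytes = pcm16_to_wav_bytes(pcm)
```

The resulting wav_bytes can be written to disk with a .wav extension and opened in any audio player.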

status

The status output parameter is a string that provides feedback on the TTS process. It includes messages indicating the success or failure of the speech generation, along with details about the model and voice used. This output is important for understanding the outcome of the TTS request and diagnosing any issues that may have occurred during the process.

🎙️ Gemini Text-to-Speech Usage Tips:

  • To achieve more expressive and dynamic speech, experiment with the temperature parameter by setting it to higher values, but keep in mind that this may introduce more variability in the output.
  • Utilize the auto_fallback_to_flash feature to ensure continuous speech generation, especially in scenarios where the primary model may be temporarily unavailable.
  • Consider using the use_paid_tier option for projects that require higher quality audio outputs, as the paid models often provide superior performance and features.
  • Regularly check the status output to monitor the success of your TTS requests and adjust parameters like retry_delay and aggressive_retry to optimize for reliability.

🎙️ Gemini Text-to-Speech Common Errors and Solutions:

❌ TTS failed: <tts_model>

  • Explanation: This error indicates that the TTS request was unsuccessful, possibly due to model unavailability or incorrect parameters.
  • Solution: Verify that the selected tts_model is available and correctly specified. Check your API key and ensure it is valid. Consider enabling auto_fallback_to_flash to use alternative models.

⏰ Rate limited - try again in <retry_delay> seconds

  • Explanation: This message suggests that the TTS service has rate-limited your requests due to high usage.
  • Solution: Wait for the specified retry_delay before attempting another request. You may also consider increasing the retry_delay value to reduce the frequency of requests.

No audio data found in REST response

  • Explanation: This error occurs when the TTS service does not return any audio data, possibly due to a service issue or incorrect request parameters.
  • Solution: Double-check the input parameters, especially the prompt and tts_model. Ensure that the API key is valid and that the service is operational.
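
To help diagnose this error, the sketch below shows one way audio could be pulled out of a generateContent-style JSON response, raising the same message when nothing is found. The function name extract_audio is hypothetical, and the field layout (candidates, content, parts, inlineData.data as base64) is an assumption about the REST response shape, not a copy of the node's code.

```python
import base64

# Hypothetical helper: walks the response and returns the first audio payload.
def extract_audio(response):
    for candidate in response.get("candidates", []):
        parts = candidate.get("content", {}).get("parts", [])
        for part in parts:
            data = part.get("inlineData", {}).get("data")
            if data:
                return base64.b64decode(data)
    raise ValueError("No audio data found in REST response")

# A response carrying two bytes of audio, base64-encoded.
good = {
    "candidates": [
        {"content": {"parts": [{"inlineData": {"data": base64.b64encode(b"\x01\x02").decode()}}]}}
    ]
}
print(extract_audio(good))  # b'\x01\x02'

# A response with no candidates triggers the error.
try:
    extract_audio({"candidates": []})
except ValueError as err:
    print(err)
```

Printing the raw response when this error appears is often the quickest way to see whether the service returned an error object or text instead of audio.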
