Sophisticated text-to-speech node using advanced Gemini models for realistic voice synthesis in projects.
GeminiTTS is a text-to-speech (TTS) node that converts written text into spoken audio using Gemini TTS models. It is aimed at AI artists and developers who want realistic, expressive voice synthesis in their projects. Users can generate audio with customizable voice characteristics, such as tone and style, to suit various creative needs. The node supports both free and paid tiers, so usage can follow project requirements and budget constraints. Automatic fallback to an alternative model and support for detailed voice acting instructions help keep speech generation reliable and versatile.
The prompt parameter is a string input that contains the text you want to convert into speech. It serves as the primary content for the TTS process. The default value is "Say: Hello, this is a test of Gemini text-to-speech." This parameter can be multiline, allowing for more complex and lengthy text inputs. The prompt directly influences the generated audio, as it dictates the words and phrases that will be spoken.
The tts_model parameter specifies the TTS model used to generate speech. Available options include "gemini-2.5-pro-preview-tts" and "gemini-2.5-flash-preview-tts", with the default being "gemini-2.5-pro-preview-tts". This choice affects the quality and characteristics of the generated audio, as different models may have varying capabilities and performance levels.
The voice parameter selects the voice used for speech synthesis. It offers a range of voices, with the default being "[M] Puck". Use it to match the audio output to the desired vocal characteristics, such as gender, accent, and tone.
The temperature parameter is a float that controls the creativity and variability of the speech output. It ranges from 0.0 to 2.0, with a default value of 1.0. A lower temperature results in more consistent and predictable speech, while a higher temperature introduces more variation and expressiveness, allowing for creative and dynamic audio outputs.
The api_key parameter is a string used to authenticate and authorize access to the Gemini TTS service. By default, it is an empty string. Providing a valid API key is essential for utilizing the TTS capabilities, especially when accessing paid features or higher usage limits.
The auto_fallback_to_flash parameter is a boolean that determines whether the node should automatically switch to the "flash" model if the primary model is unavailable. The default value is True. This feature ensures continuity in speech generation by using alternative models when necessary, minimizing disruptions in the workflow.
The retry_delay parameter is an integer that sets the delay time, in seconds, before retrying a failed TTS request. It ranges from 10 to 120 seconds, with a default of 30 seconds. This parameter helps manage rate limits and temporary service unavailability by spacing out retry attempts, improving the chances of successful speech generation.
The use_paid_tier parameter is a boolean that indicates whether to use the paid tier of the Gemini TTS service. The default value is False. Enabling this option allows access to premium features and higher quality models, which may incur additional costs but provide enhanced audio outputs.
The billing_project_id parameter is a string used to specify the billing project for paid tier usage. By default, it is an empty string. This parameter is necessary for tracking and managing billing information when using the paid features of the Gemini TTS service.
The aggressive_retry parameter is a boolean that controls whether to aggressively retry failed TTS requests. The default value is False. Enabling this option increases the frequency of retry attempts, which can be useful in scenarios where immediate speech generation is critical, but it may also lead to higher API usage.
The show_voice_info parameter is a boolean that determines whether to display additional information about the selected voice. The default value is False. When enabled, this feature provides insights into the voice's characteristics, such as tone and style, helping users make informed decisions about voice selection.
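The documented inputs, defaults, and ranges can be collected in one place. The following is a minimal sketch, not the node's actual code: the class name GeminiTTSInputs and its validate method are hypothetical, while the field names, defaults, and ranges are taken from the parameter descriptions above.

```python
from dataclasses import dataclass

# Model identifiers as listed in the tts_model description.
TTS_MODELS = ("gemini-2.5-pro-preview-tts", "gemini-2.5-flash-preview-tts")


@dataclass
class GeminiTTSInputs:
    """Hypothetical container mirroring the documented node inputs."""

    prompt: str = "Say: Hello, this is a test of Gemini text-to-speech."
    tts_model: str = "gemini-2.5-pro-preview-tts"
    voice: str = "[M] Puck"
    temperature: float = 1.0
    api_key: str = ""
    auto_fallback_to_flash: bool = True
    retry_delay: int = 30
    use_paid_tier: bool = False
    billing_project_id: str = ""
    aggressive_retry: bool = False
    show_voice_info: bool = False

    def validate(self) -> list:
        """Return a list of human-readable problems; an empty list means OK."""
        problems = []
        if not self.prompt.strip():
            problems.append("prompt is empty")
        if self.tts_model not in TTS_MODELS:
            problems.append(f"unknown tts_model: {self.tts_model}")
        if not 0.0 <= self.temperature <= 2.0:
            problems.append("temperature must be between 0.0 and 2.0")
        if not 10 <= self.retry_delay <= 120:
            problems.append("retry_delay must be between 10 and 120 seconds")
        return problems
```

Calling GeminiTTSInputs().validate() on the defaults returns an empty list, while an out-of-range temperature or retry_delay is reported before any request is made.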
The audio output parameter is a dictionary containing the generated audio data. It includes the waveform, which is a tensor representation of the audio signal, and the sample rate, typically set at 24000 Hz. This output is crucial as it provides the actual speech audio that can be used in various applications, such as multimedia projects, voiceovers, and interactive systems.
The status output parameter is a string that provides feedback on the TTS process. It includes messages indicating the success or failure of the speech generation, along with details about the model and voice used. This output is important for understanding the outcome of the TTS request and diagnosing any issues that may have occurred during the process.
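To use the audio output outside ComfyUI, the waveform can be written to a WAV file. The sketch below uses only the Python standard library and assumes the samples have already been extracted from the waveform tensor as a flat sequence of floats in the range -1.0 to 1.0. The function name save_mono_wav is hypothetical.

```python
import struct
import wave


def save_mono_wav(samples, sample_rate, path):
    """Write float samples in [-1.0, 1.0] as 16-bit mono PCM.

    Returns the duration of the written audio in seconds.
    """
    frames = bytearray()
    for s in samples:
        # Clip to the valid range before scaling to signed 16-bit integers.
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return len(samples) / sample_rate
```

With an audio dictionary from the node, a call might look like save_mono_wav(audio["waveform"][0, 0].tolist(), audio["sample_rate"], "speech.wav"). The [0, 0] indexing assumes a tensor laid out as batch, channels, samples, which should be checked against the actual output.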
Setting the temperature parameter to higher values can make the speech more expressive, but keep in mind that this may introduce more variability in the output.
Enable the auto_fallback_to_flash feature to ensure continuous speech generation, especially in scenarios where the primary model may be temporarily unavailable.
Consider the use_paid_tier option for projects that require higher quality audio outputs, as the paid models often provide superior performance and features.
Check the status output to monitor the success of your TTS requests, and adjust parameters like retry_delay and aggressive_retry to optimize for reliability.
If an error message names the failing model (<tts_model>): ensure that the tts_model is available and correctly specified. Check your API key and ensure it is valid. Consider enabling auto_fallback_to_flash to use alternative models.
If an error message reports a wait of <retry_delay> seconds: wait for the retry_delay before attempting another request. You may also consider increasing the retry_delay value to reduce the frequency of requests.
For other failures: check the input parameters, such as prompt and tts_model. Ensure that the API key is valid and that the service is operational.
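The interplay of retry_delay, aggressive_retry, and auto_fallback_to_flash can be pictured with a small sketch. This is not the node's implementation: the request callable, the attempt counts (2 normally, 4 when aggressive), and the status strings are all assumptions made for illustration. Only the parameter names, model names, and defaults come from the descriptions above.

```python
import time
from typing import Callable

PRO = "gemini-2.5-pro-preview-tts"
FLASH = "gemini-2.5-flash-preview-tts"


def synthesize_with_retry(
    request: Callable[[str], bytes],
    tts_model: str = PRO,
    auto_fallback_to_flash: bool = True,
    retry_delay: int = 30,
    aggressive_retry: bool = False,
    sleep: Callable[[float], None] = time.sleep,
):
    """Try the chosen model, then optionally the flash model.

    `request` is a hypothetical callable that sends one TTS request for the
    given model name and raises an exception on failure.
    Returns an (audio, status) pair; audio is None if every attempt failed.
    """
    models = [tts_model]
    if auto_fallback_to_flash and tts_model != FLASH:
        models.append(FLASH)
    # Assumption: "aggressive" is modeled as more attempts per model.
    attempts = 4 if aggressive_retry else 2
    last_error = None
    for model in models:
        for attempt in range(attempts):
            try:
                return request(model), f"Success with {model}"
            except Exception as exc:
                last_error = exc
                # Wait between attempts on the same model, not after the last.
                if attempt < attempts - 1:
                    sleep(retry_delay)
    return None, f"Failed after trying {', '.join(models)}: {last_error}"
```

Passing sleep as a parameter keeps the sketch testable without real delays. In this model, a longer retry_delay spaces requests out, while aggressive_retry trades extra API calls for a better chance of success.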