Maya1 TTS (AIO):
The Maya1TTS_Combined node streamlines text-to-speech (TTS) generation by integrating model loading and speech synthesis into a single node. It is particularly useful for AI artists and developers who want to create expressive, natural-sounding speech from text inputs. Key features include:
- Model caching for efficient repeated generation.
- Voice design through natural language descriptions.
- More than 20 emotion tags that can be applied through clickable buttons.
- Real-time progress tracking and VRAM management, so resources are used efficiently.
- Native ComfyUI cancel support, allowing users to interrupt generation if needed.
Overall, the node provides a comprehensive solution for generating high-quality speech with customizable emotional expression, making it well suited to creative projects involving voice synthesis.
Maya1 TTS (AIO) Input Parameters:
model_name
This parameter specifies the name of the TTS model to be used for speech generation. The choice of model can significantly impact the quality and characteristics of the generated speech. Users should select a model that best fits their needs in terms of voice quality and language support.
dtype
The dtype parameter determines the data type used for computations during the TTS process. It can affect the performance and precision of the model. Common options include float32 and float16, with float16 often providing faster performance at the cost of some precision.
attention_mechanism
This parameter defines the type of attention mechanism to be used by the model. Attention mechanisms are crucial for focusing on different parts of the input text during speech generation, and the choice can influence the naturalness and clarity of the output.
device
The device parameter specifies the hardware device on which the model will run, such as cpu or cuda for GPU acceleration. Using a GPU can significantly speed up the generation process, especially for large models.
voice_description
This parameter allows users to describe the desired voice characteristics using natural language. It provides a flexible way to customize the voice output, enabling users to specify attributes like pitch, tone, and style.
text
The text parameter is the input text that will be converted into speech. It is the primary content that the node processes to generate audio output.
keep_model_in_vram
This boolean parameter determines whether the model should be kept in VRAM after generation. Keeping the model in VRAM can speed up subsequent generations but may consume more memory resources.
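The caching trade-off can be sketched as a simple load-on-first-use cache. The function and cache names below are illustrative, not the node's actual internals: keeping the model resident corresponds to leaving it in the cache, while `keep_model_in_vram=False` corresponds to evicting it after generation.

```python
# Minimal sketch of keep_model_in_vram-style caching (illustrative names).
_model_cache = {}

def get_model(model_name, loader):
    """Return a cached model, invoking the loader only on first use."""
    if model_name not in _model_cache:
        _model_cache[model_name] = loader(model_name)
    return _model_cache[model_name]

def release_model(model_name):
    """Evict the model so its VRAM can be reclaimed; roughly what
    keep_model_in_vram=False does after generation finishes."""
    _model_cache.pop(model_name, None)

# Demonstrate that the second request is served from the cache.
load_calls = []
def fake_loader(name):
    load_calls.append(name)
    return f"model:{name}"

get_model("maya1", fake_loader)
get_model("maya1", fake_loader)   # cache hit: fake_loader is not called again
print(len(load_calls))            # 1
```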
temperature
The temperature parameter controls the randomness of the speech generation process. Lower values result in more deterministic outputs, while higher values introduce more variability and creativity in the speech.
top_p
This parameter controls nucleus sampling: it sets the cumulative probability threshold for the pool of candidate tokens considered at each step. It helps balance diversity and coherence in the generated speech.
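How temperature and top_p interact can be made concrete with a small sampling sketch (a generic illustration of temperature-scaled nucleus sampling, not the node's internal sampler):

```python
import math
import random

def sample_next_token(logits, temperature=0.4, top_p=0.9):
    """Temperature-scaled nucleus (top-p) sampling over a list of logits."""
    # Scale logits by temperature: lower values sharpen the distribution
    # (more deterministic), higher values flatten it (more variable).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of top tokens whose cumulative probability
    # reaches top_p -- the "nucleus".
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cum = [], 0.0
    for i in order:
        nucleus.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Sample only from the nucleus, renormalising its probability mass.
    mass = sum(probs[i] for i in nucleus)
    r = random.random() * mass
    for i in nucleus:
        r -= probs[i]
        if r <= 0:
            return i
    return nucleus[-1]

# At low temperature the top logit dominates and the nucleus collapses
# to a single token, so the output is effectively deterministic.
print(sample_next_token([2.0, 1.0, 0.2, -1.0], temperature=0.4, top_p=0.9))  # 0
```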
max_new_tokens
This parameter specifies the maximum number of new tokens to generate. It limits the length of the generated speech, ensuring that the output does not exceed a certain duration.
repetition_penalty
The repetition_penalty parameter discourages the model from repeating the same tokens, promoting more varied and interesting speech outputs.
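A common way such a penalty is applied (the CTRL-style rule used by many samplers; whether Maya1 uses exactly this rule is an assumption) is to down-weight the logits of tokens that have already been generated:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.5):
    """Down-weight logits of previously generated tokens.
    CTRL-style rule: divide positive logits by the penalty,
    multiply negative logits by it (penalty > 1 discourages repeats)."""
    out = list(logits)
    for i in set(generated_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

# Tokens 0 and 1 were already generated, so their logits are penalised;
# token 2 is untouched.
print(apply_repetition_penalty([3.0, -1.0, 2.0], generated_ids=[0, 1]))
# [2.0, -1.5, 2.0]
```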
seed
The seed parameter sets the random seed for the generation process, allowing for reproducibility of results. By using the same seed, users can generate the same speech output for the same input text.
chunk_longform
This boolean parameter indicates whether long input texts should be chunked into smaller segments for processing. Chunking can improve performance and manageability for lengthy texts.
emotion_tag_insert
This parameter allows users to insert specific emotion tags into the speech generation process, enhancing the expressiveness of the output. The default value is (none), meaning no emotion tag is applied.
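Conceptually, inserting an emotion tag just means placing a marker in the text that the model interprets during synthesis. The angle-bracket tag format below is an assumption for illustration; in the node, the clickable buttons insert the tag for you:

```python
def insert_emotion_tag(text, tag="(none)"):
    """Prepend an emotion tag to the text.
    The <tag> syntax here is assumed for illustration; the node's
    buttons handle the actual tag format and placement."""
    if not tag or tag == "(none)":
        return text  # default: no emotion tag applied
    return f"<{tag}> {text}"

print(insert_emotion_tag("That was unexpected!", "laugh"))
# <laugh> That was unexpected!
```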
chunk_index
The chunk_index parameter is used when processing long texts in chunks, indicating the current chunk being processed. It helps manage the sequence of chunked text processing.
total_chunks
This parameter specifies the total number of chunks when processing long texts, providing context for the chunking process and ensuring all parts of the text are covered.
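The relationship between chunk_longform, chunk_index, and total_chunks can be sketched with a simple sentence-boundary splitter. This is an illustrative chunker under assumed rules (split at sentence ends, cap each chunk's character length); the node's own chunking logic may differ:

```python
import re

def chunk_text(text, max_chars=300):
    """Split text at sentence boundaries into chunks of at most
    max_chars characters each (illustrative, not the node's algorithm)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the cap.
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

long_text = "This is one sentence of a long script. " * 20
chunks = chunk_text(long_text, max_chars=120)
total_chunks = len(chunks)
for chunk_index, chunk in enumerate(chunks):
    # chunk_index / total_chunks track progress through the sequence.
    print(f"chunk {chunk_index + 1}/{total_chunks}: {len(chunk)} chars")
```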
Maya1 TTS (AIO) Output Parameters:
audio_output
The audio_output parameter contains the generated speech in the form of an audio waveform. It includes the waveform data and the sample rate, which is typically set to 24000 Hz. This output is the final result of the TTS process, providing users with a ready-to-use audio file that can be integrated into various applications.
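If you need the waveform outside ComfyUI, a mono float signal at 24000 Hz can be written to a standard WAV file with Python's built-in wave module. The sine tone below stands in for real TTS output; the helper name is ours, not part of the node:

```python
import math
import struct
import wave

def save_waveform(path, samples, sample_rate=24000):
    """Write a mono float waveform (values in [-1, 1]) as 16-bit PCM WAV.
    24000 Hz matches the sample rate of the node's audio_output."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 16-bit samples
        wf.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", max(-32768, min(32767, int(s * 32767))))
            for s in samples
        )
        wf.writeframes(frames)

# 0.5 s of a 440 Hz test tone in place of real generated speech.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(12000)]
save_waveform("test_tone.wav", tone)
```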
Maya1 TTS (AIO) Usage Tips:
- To achieve the best performance, consider using a GPU by setting the device parameter to cuda, especially for large models.
- Experiment with the temperature and top_p parameters to find the right balance between creativity and coherence in the generated speech.
- Use the voice_description parameter to fine-tune the voice characteristics and match the desired style and tone for your project.
- If you are working with long texts, enable the chunk_longform parameter to manage the text in smaller, more manageable segments.
Maya1 TTS (AIO) Common Errors and Solutions:
Failed to load Maya1 model
- Explanation: This error occurs when the specified model cannot be loaded, possibly due to an incorrect model name or path.
- Solution: Verify that the model_name is correct and that the model files are accessible. Ensure that the model path is correctly specified and that all necessary files are present.
Generation failed: <error_message>
- Explanation: This error indicates that an unexpected issue occurred during the speech generation process.
- Solution: Check the error message for specific details and ensure that all input parameters are correctly set. If the problem persists, consider restarting the application or checking for updates to the node or model.
