
ComfyUI Node: Dia text to speech

Class Name: Dia text to speech
Category: audio/dia
Author: nobrainX2 (account age: 2326 days)
Extension: ComfyUI Custom Dia
Last Updated: 2025-05-29
GitHub Stars: 0.01K

How to Install ComfyUI Custom Dia

Install this extension via the ComfyUI Manager by searching for ComfyUI Custom Dia:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI Custom Dia in the search bar.
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Dia text to speech Description

Convert written text to spoken dialogue with customizable voices and expressions for dynamic audio content creation.

Dia text to speech:

The Dia text to speech node converts written text into spoken dialogue using an open-weights model, giving you full control over scripts and voices. It is aimed at AI artists and developers who want to integrate realistic, customizable speech synthesis into their projects. The script can include nonverbal expression tags, such as laughter or sighs, to make the dialogue sound more natural and expressive. The resulting audio can be used for artistic, educational, or entertainment content.

Dia text to speech Input Parameters:

model_path

This parameter specifies the file path to the pre-trained model used for text-to-speech conversion. The default path is set to models/Dia/dia-v0_1.pth. It is crucial for loading the correct model weights necessary for generating speech.

seed

The seed parameter is an integer that initializes the random number generator so that the audio output is reproducible. It has a default value of 12345, with a minimum of 0 and a maximum defined by MAX_SEED. Reusing a seed with otherwise identical inputs should reproduce the same speech, while changing it produces a different variation.
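The effect of a fixed seed can be illustrated with a minimal sketch. It uses Python's standard random module as a stand-in for the model's sampler; the function name sample_tokens and the vocabulary size are hypothetical and do not come from the node's code.

```python
import random


def sample_tokens(seed: int, n: int = 5, vocab_size: int = 1024) -> list[int]:
    """Draw n pseudo-random token ids from a generator initialized with seed.

    Hypothetical stand-in for a sampler; the real node seeds its own backend.
    """
    rng = random.Random(seed)  # independent generator, fully determined by seed
    return [rng.randrange(vocab_size) for _ in range(n)]


first = sample_tokens(12345)
second = sample_tokens(12345)  # same seed, so the same sequence is drawn again
print(first == second)
```

The same idea applies to the node: with all other inputs unchanged, the seed is what selects one particular rendition of the script.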

save_audio_file

This boolean parameter determines whether the generated audio should be saved as a file. By default, it is set to True, meaning the audio will be saved automatically.

filename_prefix

This string parameter sets the prefix for the filename of the saved audio file. The default prefix is audio/dia, which helps in organizing and identifying the generated audio files.

speech

The speech parameter is a multiline string input where you can specify the text to be converted into speech. It includes a default script with multiple speakers and expressions, allowing for a demonstration of the node's capabilities. This parameter is essential for defining the content of the audio output.

cfg_scale

This float parameter sets the classifier-free guidance (CFG) scale, which controls how strongly generation is steered toward the input text. It ranges from 0.0 to 10.0, with a default value of 3.0. Higher values generally make the speech follow the script more closely, while lower values allow more variation.
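In speech and image generation models, cfg_scale usually denotes a classifier-free guidance scale. The sketch below shows one common formulation, which blends a text-conditioned prediction with an unconditioned one. It is an illustration only: the function name apply_cfg is hypothetical, and the exact formula used by this node is not stated here and may differ.

```python
def apply_cfg(cond_logits: list[float], uncond_logits: list[float],
              cfg_scale: float) -> list[float]:
    """Blend conditional and unconditional logits (one common CFG formulation).

    guided = uncond + cfg_scale * (cond - uncond)
    A scale of 1.0 returns the conditional logits; larger values push further
    in the direction that the text conditioning favors.
    """
    return [u + cfg_scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


cond = [2.0, 0.5, -1.0]    # scores predicted with the script as input
uncond = [1.0, 0.5, 0.0]   # scores predicted without it
guided = apply_cfg(cond, uncond, 3.0)
print(guided)
```

With the default scale of 3.0, differences between the two predictions are amplified threefold, which is why higher values tend to follow the script more strictly.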

temperature

The temperature parameter is a float that affects the randomness of the speech generation process. It has a range from 0.0 to 10.0, with a default value of 1.3. Higher values result in more diverse outputs, while lower values produce more deterministic results.
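How temperature changes randomness can be sketched with a plain softmax. This is a generic illustration, not the node's code: the function name is hypothetical, and rejecting a temperature of 0 is a choice made for this sketch (implementations often treat 0 as greedy selection instead).

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing them by temperature."""
    if temperature <= 0:
        # Assumption for this sketch: non-positive temperatures are rejected.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)  # low temperature: peaked
flat = softmax_with_temperature(logits, 2.0)   # high temperature: flatter
print(sharp, flat)
```

At low temperature most of the probability mass sits on the top candidate, so sampling is nearly deterministic; at high temperature the less likely candidates are drawn more often.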

top_p

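This float parameter, ranging from 0.0 to 10.0 with a default of 0.95, sets the threshold for nucleus (top-p) sampling during speech generation. At each step, only the most probable tokens whose cumulative probability reaches this threshold remain candidates, which balances diversity against coherence. Because cumulative probability cannot exceed 1.0, values at or above 1.0 typically leave the candidate set unfiltered.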
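Nucleus sampling can be sketched in a few lines. The function below is a generic, hypothetical illustration of the technique, not the node's implementation: it keeps the most probable candidates until their cumulative probability reaches top_p, then renormalizes the survivors.

```python
def top_p_filter(probs: list[float], top_p: float) -> dict[int, float]:
    """Return {index: renormalized probability} for the nucleus of probs."""
    # Visit candidates from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:  # nucleus is complete
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


probs = [0.5, 0.3, 0.15, 0.05]
nucleus = top_p_filter(probs, 0.9)
print(sorted(nucleus))
```

In this example the least likely candidate is dropped at a threshold of 0.9, while a threshold of 1.0 keeps all four.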

use_cfg_filter

A boolean parameter that, when set to True (default), applies top-k filtering to the guided (CFG) predictions during speech generation, potentially improving the quality of the output. The number of candidates kept is set by cfg_filter_top_k.

use_torch_compile

This boolean parameter, defaulting to False, indicates whether to optimize the model with torch.compile. Enabling it may speed up generation, but the first run takes longer because the model must be compiled.

cfg_filter_top_k

An integer parameter that sets k for the top-k filtering applied when use_cfg_filter is enabled, with a default value of 35 and a range from 0 to 100. Only the k highest-scoring candidate tokens are kept at each generation step.
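Top-k filtering in general can be sketched as follows. The function name is hypothetical, and treating k = 0 as "no filtering" is an assumption made for this sketch; how the node handles that value is not stated here.

```python
def top_k_filter(logits: list[float], k: int) -> list[float]:
    """Keep the k largest logits and mask the rest with negative infinity."""
    if k <= 0 or k >= len(logits):
        # Assumption: a non-positive or oversized k disables the filter.
        return list(logits)
    threshold = sorted(logits, reverse=True)[k - 1]  # k-th largest value
    # Values tied with the threshold are all kept, so ties may exceed k.
    return [x if x >= threshold else float("-inf") for x in logits]


filtered = top_k_filter([1.0, 3.0, 2.0, 0.5], 2)
print(filtered)
```

Masked entries receive zero probability after a softmax, so only the surviving candidates can be sampled.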

input_audio

This optional parameter allows you to provide an audio file as input, which can be used as a reference or prompt for the speech generation process.

input_audio_transcript

A multiline string parameter for inputting the transcript of the provided audio. It is optional and can be used to align the generated speech with the input audio.

available_tags

This parameter lists the available vocal expression tags, such as (laughs) or (sighs), that can be included in the speech text to enhance expressiveness. It is a multiline string with a default set of tags.
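A script can be checked for misspelled tags before generation. The sketch below is a hypothetical helper, not part of the node: it lists only the two tags named above, so the real default set in available_tags is larger, and it assumes tags are lowercase words in parentheses.

```python
import re

# Only the two tags named in this page; the node's default list is longer.
AVAILABLE_TAGS = {"(laughs)", "(sighs)"}


def find_unknown_tags(script: str, available: set[str] = AVAILABLE_TAGS) -> list[str]:
    """Return parenthesized lowercase tags in script that are not in available."""
    found = re.findall(r"\([a-z][a-z ]*\)", script)
    return sorted({tag for tag in found if tag not in available})


script = "Well, that went better than expected. (laughs) Or did it? (sigh)"
unknown = find_unknown_tags(script)
print(unknown)
```

Ordinary lowercase parenthetical remarks in the script would also be reported, so the result is a hint to review rather than a strict validation.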

Dia text to speech Output Parameters:

generated_audio

The primary output of the Dia text to speech node is the generated audio, an array representing the input script rendered as synthesized speech.

Dia text to speech Usage Tips:

  • Experiment with different seed values to explore variations in the generated speech and find the most suitable output for your project.
  • Utilize the available_tags to add expressive elements to your speech, making it more engaging and natural.
  • Adjust the temperature and top_p parameters to balance between creativity and coherence in the speech output, depending on the desired level of randomness.
  • Consider enabling use_torch_compile for potentially faster performance, especially when generating longer audio sequences.

Dia text to speech Common Errors and Solutions:

Model file not found

  • Explanation: This error occurs when the specified model_path does not point to a valid file.
  • Solution: Ensure that the model_path is correct and that the model file exists at the specified location.
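A path check of this kind can be done before running the workflow. The helper below is a hypothetical sketch: the name resolve_model_path and the base_dir argument are illustrative, and it assumes a relative model_path is resolved against the ComfyUI root folder.

```python
from pathlib import Path


def resolve_model_path(model_path: str, base_dir: str = ".") -> Path:
    """Return the full path to the model file, or raise if it does not exist."""
    path = Path(model_path)
    if not path.is_absolute():
        # Assumption: relative paths are resolved against the ComfyUI root.
        path = Path(base_dir) / path
    if not path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")
    return path


try:
    print(resolve_model_path("models/Dia/dia-v0_1.pth"))
except FileNotFoundError as err:
    print(err)  # shows the path that was actually checked
```

Printing the resolved path makes it easier to spot a wrong working directory or a misspelled file name.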

Invalid seed value

  • Explanation: The seed value provided is outside the acceptable range.
  • Solution: Check that the seed value is within the range of 0 to MAX_SEED and adjust it accordingly.

Audio file save failure

  • Explanation: The node fails to save the generated audio file.
  • Solution: Verify that the directory specified in filename_prefix exists and has write permissions.
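Because the default filename_prefix of audio/dia contains a folder component, the target folder has to exist before a file can be written. The sketch below shows one way to guarantee that. The function name, the counter-based file name, and the .wav extension are all assumptions for illustration and may not match how the node names its files.

```python
from pathlib import Path


def prepare_output_path(output_dir: str, filename_prefix: str,
                        counter: int = 1, ext: str = ".wav") -> Path:
    """Build an output file path and create its parent folder if needed."""
    # Hypothetical naming scheme: <prefix>_<5-digit counter><ext>
    target = Path(output_dir) / f"{filename_prefix}_{counter:05d}{ext}"
    target.parent.mkdir(parents=True, exist_ok=True)  # creates output/audio
    return target


target = prepare_output_path("output", "audio/dia")
print(target)
```

If mkdir itself raises a PermissionError, the problem is with write permissions on the output folder rather than with the prefix.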

cfg_scale out of range

  • Explanation: The cfg_scale parameter is set outside its valid range.
  • Solution: Adjust cfg_scale to a value between 0.0 and 10.0.

Torch compile error

  • Explanation: An error occurs when use_torch_compile is enabled.
  • Solution: Ensure that your environment supports torch.compile (available in PyTorch 2.0 and later), and disable use_torch_compile if issues persist.
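Whether the installed PyTorch exposes torch.compile can be checked with a short script. This is a minimal sketch with hypothetical helper names. It only confirms that the function exists, and compilation can still fail at run time for other reasons, such as an unsupported platform or a missing compiler backend.

```python
import importlib.util


def torch_compile_available() -> bool:
    """Return True if PyTorch is installed and provides torch.compile."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed in this environment
    import torch
    return hasattr(torch, "compile")  # present in PyTorch 2.0 and later


def choose_use_torch_compile(requested: bool) -> bool:
    """Fall back to False when compilation was requested but is unavailable."""
    return bool(requested) and torch_compile_available()


print(choose_use_torch_compile(True))
```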

Copyright 2025 RunComfy. All Rights Reserved.
