ComfyUI Extension: ComfyUI Custom Dia

Repo Name: comfyUI-customDia
Author: nobrainX2 (Account age: 2326 days)
Nodes: 2
Latest Updated: 2025-05-29
GitHub Stars: 0.01K

How to Install ComfyUI Custom Dia

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI Custom Dia in the search bar and install the matching result.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

ComfyUI Custom Dia Description

ComfyUI Custom Dia integrates the Dia TTS model into ComfyUI, adding text-to-speech generation to the node graph. The extension builds on the Dia model released by Nari Labs.

comfyUI-customDia Introduction

comfyUI-customDia is an extension that integrates the Dia Text-to-Speech (TTS) model, created by Nari Labs, into the ComfyUI environment. It uses Dia to generate realistic dialogue from text. Whether you are an AI artist adding voice to your creations or simply exploring TTS technology, the extension lets you create dialogues with multiple speakers, incorporate nonverbal cues, and clone voices, all within the ComfyUI framework.

How comfyUI-customDia Works

At its core, comfyUI-customDia uses the Dia model to transform written text into spoken dialogue. The extension functions as an output node within ComfyUI, meaning it can operate independently or as part of a larger workflow. You can input text with speaker tags like [S1] and [S2] to designate different speakers, and the model will generate corresponding audio. Additionally, you can include nonverbal tags such as (laughs) or (sighs) to enrich the audio with realistic expressions. The extension also supports voice cloning by allowing you to input an audio sample and its transcript, enabling the model to mimic the voice in the sample.
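As a rough illustration of the tagged input described above, the sketch below assembles a two-speaker script with nonverbal cues. The helper name build_script and the (speaker, line) tuple format are hypothetical conveniences for this example, not part of the extension; only the [S1] and [S2] tags and the (laughs) and (sighs) cues come from the description above.

```python
import re

# Only the two speaker tags named in the text are accepted in this sketch.
SPEAKER_TAG = re.compile(r"^\[S[12]\]$")


def build_script(turns):
    """Join (speaker, line) pairs into one tagged string.

    `turns` is a list of tuples such as ("S1", "Hello there.").
    The helper and its input format are hypothetical, for illustration only.
    """
    parts = []
    for speaker, line in turns:
        tag = f"[{speaker}]"
        if not SPEAKER_TAG.match(tag):
            raise ValueError(f"unsupported speaker tag: {tag}")
        text = " ".join(line.split())  # collapse stray whitespace and newlines
        if not text:
            raise ValueError(f"empty line for {tag}")
        parts.append(f"{tag} {text}")
    return " ".join(parts)


script = build_script([
    ("S1", "Did you try the new node?"),
    ("S2", "I did. (laughs) It took one restart."),
    ("S1", "(sighs) It always does."),
])
print(script)
```

The resulting string can be pasted into the node's text field. Keeping the tag check strict makes typos such as [S3] or [s1] fail early instead of producing unexpected audio.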

comfyUI-customDia Features

  • Multi-Channel Audio Support: The extension has been modified to handle multi-channel audio inputs, allowing for stereo files or tensors directly from ComfyUI nodes.
  • Speech Prompting: Define dialogues using text fields with speaker tags and nonverbal cues to create dynamic and expressive audio outputs.
  • Voice Cloning: Input an audio sample and its transcript to clone the voice, providing a personalized touch to your projects.
  • Audio Retime Node: An additional node is available to adjust the timing of the output audio, ensuring it fits seamlessly into your workflow.
  • Pitch Preservation: With the optional librosa package, you can maintain the pitch of the original audio, enhancing the naturalness of the cloned voice.
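To make the retiming and pitch-preservation features concrete, here is a small worked calculation. It is a sketch under stated assumptions: the source does not describe how the Audio Retime node is implemented, and the function names below are hypothetical. It shows the rate factor needed to fit a clip into a target duration, and how far the pitch would drift if the clip were simply resampled at that rate without correction.

```python
import math


def stretch_rate(current_seconds, target_seconds):
    """Playback-rate factor that turns `current_seconds` into `target_seconds`.

    A rate above 1.0 shortens the clip; a rate below 1.0 lengthens it.
    """
    if current_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")
    return current_seconds / target_seconds


def pitch_shift_semitones(rate):
    """Pitch change caused by plain resampling at `rate`, with no correction."""
    return 12.0 * math.log2(rate)


# Example: squeeze a 12-second line into a 10-second slot.
rate = stretch_rate(current_seconds=12.0, target_seconds=10.0)
print(f"rate: {rate:.2f}")
print(f"uncorrected pitch shift: {pitch_shift_semitones(rate):+.2f} semitones")

# With librosa installed, a pitch-preserving stretch of a float array `y`
# could use the same factor: librosa.effects.time_stretch(y, rate=rate)
```

A pitch-preserving time stretch avoids that drift, which is presumably why the extension lists librosa as an optional dependency.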

comfyUI-customDia Models

The extension uses the Dia model, a 1.6-billion-parameter TTS model designed for generating realistic dialogue. The model supports English and can produce a wide range of vocal expressions. By conditioning the output on an audio sample, you can steer the emotion and tone of the generated speech.

Troubleshooting comfyUI-customDia

Here are some common issues you might encounter while using comfyUI-customDia and how to resolve them:

  • Dependency Issues: Ensure that the descript-audio-codec and soundfile Python packages are installed. If the installation of descript-audio-codec downgrades protobuf to version 3.19.6, causing other nodes to crash, upgrade protobuf by running pip install protobuf --upgrade in the ComfyUI terminal.
  • Audio Quality: If the audio quality is not as expected, check that the input text follows the recommended guidelines, such as using [S1] and [S2] tags correctly and keeping the input text length moderate.
  • Voice Cloning: For best results with voice cloning, ensure the audio sample is between 5 and 10 seconds long and that the transcript is accurate and formatted correctly.
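The dependency check described above can be scripted. The following is a minimal sketch, assuming it runs with the same Python interpreter that ComfyUI uses; the function name installed_versions is hypothetical. It reports which of the packages named above are installed and flags the protobuf release mentioned in the dependency note.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the troubleshooting notes above.
PACKAGES = ["descript-audio-codec", "soundfile", "protobuf"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(PACKAGES)
for name, ver in report.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")

# 3.19.6 is the downgraded release named in the note above.
if report["protobuf"] == "3.19.6":
    print("protobuf was downgraded; consider: pip install protobuf --upgrade")
```

Running it before and after installing the extension shows whether the installation changed the protobuf version.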

Learn More about comfyUI-customDia

To further explore the capabilities of comfyUI-customDia, consider visiting the following resources:

  • Dia TTS Model on GitHub: Learn more about the underlying model and its features.
  • Hugging Face Model Page: Access pretrained model checkpoints and additional documentation.
  • Community Support on Discord: Join the community to ask questions, share experiences, and get support from other users and developers.

By leveraging these resources, you can enhance your understanding of comfyUI-customDia and unlock its full potential for your AI art projects.
