ComfyUI Node: 🎙️ ChatterBox Voice Capture (diogod)

Class Name: ChatterBoxVoiceCaptureDiogod
Category: ChatterBox Voice
Author: diodiogod (Account age: 768 days)
Extension: ComfyUI_ChatterBox_SRT_Voice
Last Updated: 2026-03-21
GitHub Stars: 0.08K

How to Install ComfyUI_ChatterBox_SRT_Voice

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Click the Custom Nodes Manager button.
  3. Enter ComfyUI_ChatterBox_SRT_Voice in the search bar and install the extension from the results.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

🎙️ ChatterBox Voice Capture (diogod) Description

Facilitates voice capture and processing in ComfyUI, enabling seamless recording and conversion.

🎙️ ChatterBox Voice Capture (diogod):

ChatterBoxVoiceCaptureDiogod captures and processes voice data within ComfyUI. Its primary purpose is voice recording and conversion, so that recorded voice can be brought into a workflow alongside features such as voice conversion and speaker diarization. The node is configured through its inputs in the ComfyUI interface, so recording and annotation can be set up without handling audio processing details directly.

🎙️ ChatterBox Voice Capture (diogod) Input Parameters:

tag_audio_events

This parameter annotates non-speech sounds, such as laughter or music, in the transcript. It is a Boolean input (True or False) with a default value of False. When enabled, the transcript includes context about non-verbal audio events in addition to the spoken words.

diarize

The diarize parameter annotates which speaker is talking at each point in the recording. It is a Boolean input with a default value of False. When enabled, the transcript separates the different speakers, which keeps dialogues, interviews, and other multi-voice recordings readable.

diarization_threshold

This parameter controls the sensitivity of speaker separation. It is a Float input with a default value of 0.22 and a range of 0.1 to 0.4. Lower values make the node more sensitive to changes in speaker, which can help when speakers alternate frequently; higher values make it less sensitive. Tune it when the default splits a single speaker into several or merges distinct speakers into one.
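The three inputs above can be summarized in the form ComfyUI custom nodes use to declare inputs, an INPUT_TYPES classmethod. The following is a minimal sketch, not the extension's actual source: the class name VoiceCaptureInputsSketch, the helper defaults, the step value of 0.01, and the choice to place all three inputs under "required" are assumptions made for illustration. Only the parameter names, types, defaults, and range come from this page.

```python
class VoiceCaptureInputsSketch:
    """Hypothetical class; it only mirrors the inputs documented above."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # Annotate laughter, music, and other non-speech sounds.
                "tag_audio_events": ("BOOLEAN", {"default": False}),
                # Annotate which speaker is talking.
                "diarize": ("BOOLEAN", {"default": False}),
                # Speaker-separation sensitivity; step is an assumption.
                "diarization_threshold": (
                    "FLOAT",
                    {"default": 0.22, "min": 0.1, "max": 0.4, "step": 0.01},
                ),
            }
        }


def defaults(node_cls):
    """Collect the default value of every declared input."""
    spec = node_cls.INPUT_TYPES()["required"]
    return {name: opts["default"] for name, (_type, opts) in spec.items()}


print(defaults(VoiceCaptureInputsSketch))
```

Declaring min and max on a FLOAT input lets the ComfyUI interface constrain the widget to the documented range.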

🎙️ ChatterBox Voice Capture (diogod) Output Parameters:

transcript

The transcript output is a text representation of the recorded audio. It includes annotations for audio events when tag_audio_events is enabled and speaker labels when diarize is enabled. Use it wherever a workflow needs the spoken content as text for further processing or analysis.

audio_segments

This output consists of segmented audio data, which can be used for detailed analysis or further processing. Each segment corresponds to a portion of the audio that has been identified as distinct, either by speaker or by audio event, depending on the input parameters set.
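To make the relationship between the two inputs and the transcript concrete, here is a hedged sketch of how annotated segments could be rendered as text. The Segment class, its fields, the speaker labels, and the parenthesized event format are all hypothetical; this page does not specify the node's real transcript layout or the structure of audio_segments.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical segment record, for illustration only."""
    text: str
    speaker: Optional[str] = None  # filled in only when speakers are known
    is_event: bool = False         # True for non-speech sounds (e.g. laughter)


def render_transcript(segments: List[Segment], diarize: bool,
                      tag_audio_events: bool) -> str:
    """Render segments as lines, honoring the two Boolean switches."""
    lines = []
    for seg in segments:
        if seg.is_event:
            # Audio events appear only when tagging is enabled.
            if tag_audio_events:
                lines.append(f"({seg.text})")
            continue
        # Speaker labels appear only when diarization is enabled.
        prefix = f"{seg.speaker}: " if diarize and seg.speaker else ""
        lines.append(prefix + seg.text)
    return "\n".join(lines)


example = [
    Segment("Hello there.", speaker="speaker_0"),
    Segment("laughter", is_event=True),
    Segment("Hi!", speaker="speaker_1"),
]
print(render_transcript(example, diarize=True, tag_audio_events=True))
```

With both switches set to False, the same segments render as plain spoken text with no labels or event tags.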

🎙️ ChatterBox Voice Capture (diogod) Usage Tips:

  • For recordings with multiple speakers, enable the diarize parameter. If speakers change frequently, lower the diarization_threshold (toward 0.1) to make speaker separation more sensitive.
  • Enable the tag_audio_events parameter when non-verbal sounds such as laughter or music matter to the transcript, so that they are annotated rather than omitted.

🎙️ ChatterBox Voice Capture (diogod) Common Errors and Solutions:

"Audio input not detected"

  • Explanation: This error occurs when the node fails to detect any audio input, possibly due to incorrect microphone settings or permissions.
  • Solution: Ensure that your microphone is properly connected and configured in your system settings. Check that the application has the necessary permissions to access the microphone.

"Diarization threshold out of range"

  • Explanation: This error indicates that the diarization_threshold value is set outside the acceptable range of 0.1 to 0.4.
  • Solution: Adjust the diarization_threshold parameter to a value within the specified range to ensure proper speaker separation functionality.
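When the threshold is set programmatically (for example, from a script that builds workflow inputs), checking the value before it reaches the node avoids this error. The sketch below assumes nothing about the node's internals: the function names are hypothetical, and the error text is reused from the message above purely for illustration. Only the 0.1 to 0.4 range comes from this page.

```python
THRESHOLD_MIN = 0.1  # documented lower bound
THRESHOLD_MAX = 0.4  # documented upper bound


def check_diarization_threshold(value: float) -> float:
    """Return value unchanged, or raise if it is outside the range."""
    if not THRESHOLD_MIN <= value <= THRESHOLD_MAX:
        raise ValueError(
            f"Diarization threshold out of range: {value} "
            f"(expected {THRESHOLD_MIN} to {THRESHOLD_MAX})"
        )
    return value


def clamp_diarization_threshold(value: float) -> float:
    """Pull an out-of-range value back to the nearest valid bound."""
    return min(max(value, THRESHOLD_MIN), THRESHOLD_MAX)


print(check_diarization_threshold(0.22))
print(clamp_diarization_threshold(0.9))
```

Raising is the safer choice when a bad value signals a bug upstream; clamping suits cases where the value comes from free-form user input.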

Copyright 2025 RunComfy. All Rights Reserved.
