ComfyUI Node: FL VoxCPM Transcribe

Class Name: FL_VoxCPM_Transcribe
Category: FL/VoxCPM
Author: filliptm (Account age: 2446 days)
Extension: ComfyUI-FL-VoxCPM
Last Updated: 2026-05-21
GitHub Stars: 0.03K

How to Install ComfyUI-FL-VoxCPM

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-FL-VoxCPM in the search bar and install the extension from the results.
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

FL VoxCPM Transcribe Description

Convert spoken audio to text using a Whisper model, with support for multiple languages.

FL VoxCPM Transcribe:

FL_VoxCPM_Transcribe converts spoken audio into text using Whisper, a speech recognition model. It accepts audio from a ComfyUI workflow, supports multiple languages, and by default selects the processing device automatically (CUDA GPU, MPS, or CPU). Because it runs inside ComfyUI, the resulting text can be passed directly to other nodes that need a written version of the audio.

FL VoxCPM Transcribe Input Parameters:

audio

The audio parameter is the input audio to transcribe, typically supplied as a waveform tensor. It has no minimum or maximum value, but the quality and clarity of the recording significantly affect transcription accuracy. Audio that is not sampled at 16000Hz is resampled before transcription.
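As a rough illustration, and assuming the node receives ComfyUI's usual AUDIO dictionary (a waveform tensor shaped [batch, channels, samples] plus an integer sample_rate), the sketch below summarizes such an input. The helper name describe_audio is hypothetical, and a small stand-in object replaces a real tensor so that the example runs without PyTorch.

```python
from types import SimpleNamespace

# Rate the node resamples to, according to its "Resampling ..." log message.
WHISPER_SAMPLE_RATE = 16000


def describe_audio(audio):
    """Summarize a ComfyUI-style AUDIO dict (assumed layout, see text above)."""
    batch, channels, samples = audio["waveform"].shape
    sample_rate = audio["sample_rate"]
    return {
        "batch": batch,
        "channels": channels,
        "duration_s": samples / sample_rate,
        "needs_resampling": sample_rate != WHISPER_SAMPLE_RATE,
    }


# Stand-in for a torch.Tensor: only the .shape attribute is read here.
fake_waveform = SimpleNamespace(shape=(1, 2, 88200))
info = describe_audio({"waveform": fake_waveform, "sample_rate": 44100})
print(info)
```

A check like this can help confirm that an upstream node is producing the expected shape and sample rate before the audio reaches the transcription node.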

model

The model parameter selects the Whisper model used for transcription. Options include "openai/whisper-large-v3-turbo" and "openai/whisper-tiny". Larger models are generally more accurate but slower, and they need more memory and compute. There is no default value, so choose a model that fits your accuracy needs and available hardware.
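For orientation, the sketch below shows one way a Whisper checkpoint like those named above can be run with the Hugging Face transformers pipeline API. It is not taken from the node's source. The function name transcribe_samples is hypothetical, and the example assumes that transformers, PyTorch, and NumPy are installed and that the audio is a mono, one-dimensional float array.

```python
# Model identifiers quoted in the parameter description above.
EXAMPLE_MODELS = ("openai/whisper-tiny", "openai/whisper-large-v3-turbo")


def transcribe_samples(samples, model_id, sample_rate=16000):
    """Transcribe a 1-D float NumPy array with a Whisper checkpoint.

    Hypothetical helper: the node's real loading code may differ.
    """
    # Imported lazily so that defining this function does not require
    # transformers to be installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # The pipeline accepts raw audio as a dict with "raw" and "sampling_rate".
    result = asr({"raw": samples, "sampling_rate": sample_rate})
    return result["text"]
```

The first call for a given model downloads its weights, so trying "openai/whisper-tiny" first is a quick way to confirm the setup works before switching to a larger model.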

language

The language parameter specifies the language of the audio. When set to "auto", the node attempts to detect the language automatically. Specifying the language can improve transcription accuracy, especially for non-English audio. When not using "auto", set it to a valid language code.
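One way to picture the "auto" behavior, as a sketch rather than the node's actual code: when the setting is "auto", no language hint is passed and detection is left to the model, otherwise the code is forwarded. The function name language_kwargs and the returned dictionary layout are assumptions chosen for illustration.

```python
def language_kwargs(language):
    """Build generation keyword arguments from the node's language setting."""
    code = language.strip().lower()
    if code == "auto":
        # No hint: leave language detection to the model.
        return {}
    # Explicit hint, for example "en", "fr", or "ja".
    return {"language": code}


print(language_kwargs("auto"))
print(language_kwargs("FR"))
```

With the transformers speech recognition pipeline, a dictionary like this could be supplied through its generate_kwargs argument. Whether the node does so internally is not confirmed here.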

device

The device parameter determines the hardware on which transcription runs. The default, "auto", lets the node choose the best available device: a CUDA GPU, MPS, or the CPU. Set it explicitly only when you need to force a particular device.
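The "auto" choice can be approximated with the following sketch, which prefers CUDA, then MPS, then the CPU. The function name resolve_device is hypothetical and the preference order is an assumption based on the description above, so the node's own logic may differ.

```python
def resolve_device(requested="auto"):
    """Return "cuda", "mps", or "cpu" for a requested device setting."""
    if requested != "auto":
        return requested  # honor an explicit choice as given
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch, the CPU is the only safe answer
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(resolve_device())
print(resolve_device("cpu"))
```

Running this in the Python environment that ComfyUI uses shows which device "auto" would most likely resolve to on that machine.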

FL VoxCPM Transcribe Output Parameters:

transcription

The transcription output is the text produced from the input audio, representing its spoken content in written form. It is returned as a string with special tokens removed, ready for text analysis or further processing.

FL VoxCPM Transcribe Usage Tips:

  • Ensure your audio input is clear and free from excessive background noise to improve transcription accuracy.
  • Choose the appropriate Whisper model based on your resource availability and accuracy requirements; larger models offer better accuracy but require more computational power.
  • Specify the language of the audio if known, as this can enhance the transcription quality, especially for non-English content.
  • Allow the node to automatically select the processing device unless you have specific hardware preferences or constraints.

FL VoxCPM Transcribe Common Errors and Solutions:

"transformers library required for transcription"

  • Explanation: This error occurs when the transformers library, which the node needs in order to run Whisper, is not installed.
  • Solution: Install the library in the Python environment that ComfyUI uses with the command pip install transformers, then restart ComfyUI.
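To confirm whether the library is visible to a given interpreter before restarting ComfyUI, a small check like the one below can help. It uses only the Python standard library; the helper name has_module is chosen for illustration.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(name) is not None


if has_module("transformers"):
    print("transformers is available")
else:
    print("transformers is missing; install it with: pip install transformers")
```

Run it with the same Python interpreter that launches ComfyUI, since a system-wide installation is not visible from a separate virtual environment or embedded Python.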

"Resampling from <sr>Hz to 16000Hz"

  • Explanation: This is an informational message, not an error. The input audio's sample rate differs from the required 16000Hz, so the audio is being resampled automatically.
  • Solution: No action is required. To skip the resampling step and save some processing time, supply audio that is already at 16000Hz.
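To make the resampling step concrete, here is a deliberately simple sketch that converts a mono signal to 16000Hz by linear interpolation. It is an illustration only: it applies no anti-aliasing filter, the function name resample_linear is hypothetical, and production code would normally rely on a dedicated audio library instead.

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a mono list of floats by linear interpolation."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


ramp = [float(i) for i in range(8)]   # 8 samples, treated as 32000Hz audio
halved = resample_linear(ramp, 32000)
print(len(ramp), "->", len(halved))
```

Halving the rate halves the number of samples, which is why audio that already matches the target rate passes through unchanged.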

"Loading Whisper model: <model> on <device>"

  • Explanation: This is an informational message, not an error. It appears while the specified Whisper model is being loaded onto the selected device.
  • Solution: No action is required. If loading takes too long, use a smaller model or make sure the device has sufficient memory and compute available.

"Using cached Whisper model"

  • Explanation: This informational message indicates that a previously loaded model is being reused from the cache instead of being loaded again, which speeds up processing.
  • Solution: No action is needed; this is a performance optimization.

Copyright 2025 RunComfy. All Rights Reserved.
