RunComfy

ComfyUI Node: FL VoxCPM Transcribe

Class Name

FL_VoxCPM_Transcribe

Category
FL/VoxCPM

Author
filliptm (Account age: 2446 days)

Extension
ComfyUI-FL-VoxCPM

Latest Updated
2026-05-21

Github Stars
0.03K

Table of Contents

Description
FL_VoxCPM_Transcribe:
FL_VoxCPM_Transcribe Input Parameters:
FL_VoxCPM_Transcribe Output Parameters:
FL_VoxCPM_Transcribe Usage Tips:
FL_VoxCPM_Transcribe Common Errors and Solutions:
Related Nodes

How to Install ComfyUI-FL-VoxCPM

Install this extension through the ComfyUI Manager by searching for ComfyUI-FL-VoxCPM:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-FL-VoxCPM in the search bar and install the extension from the results.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

FL VoxCPM Transcribe Description

Convert spoken audio to text with a Whisper speech recognition model, with support for multiple languages.

FL VoxCPM Transcribe:

FL_VoxCPM_Transcribe converts spoken audio into text using Whisper, a speech recognition model. It is aimed at AI artists and developers who need transcripts of audio content inside a ComfyUI workflow. The node accepts audio from other ComfyUI nodes, supports transcription in multiple languages, and can detect the language automatically. When the device is left on "auto", it picks an available processing device (CUDA GPU, MPS, or CPU) for you.

FL VoxCPM Transcribe Input Parameters:

audio

The audio parameter is the input audio that you want to transcribe. It must be in a format the node can process, typically a waveform tensor. The parameter has no minimum or maximum value, but the quality and clarity of the recording strongly affect transcription accuracy.
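To make the "waveform tensor" wording concrete, here is a minimal sketch of inspecting an audio input. It assumes the common ComfyUI convention of a dictionary holding a `waveform` indexed as [batch][channel][sample] and an integer `sample_rate`; plain nested lists stand in for the tensor so the example runs without PyTorch. The helper name `summarize_audio` is hypothetical and is not part of the node.

```python
from statistics import fmean


def summarize_audio(audio: dict) -> dict:
    """Return basic facts about the first clip in an AUDIO-style dict.

    Assumption: audio["waveform"] is indexed as [batch][channel][sample]
    and audio["sample_rate"] is a positive integer.
    """
    waveform = audio["waveform"]
    sample_rate = audio["sample_rate"]
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")

    first_clip = waveform[0]          # first item of the batch
    num_samples = len(first_clip[0])  # samples in the first channel

    # Average the channels sample by sample to get a mono signal.
    mono = [fmean(frame) for frame in zip(*first_clip)]

    return {
        "channels": len(first_clip),
        "duration_s": num_samples / sample_rate,
        "mono": mono,
    }


# A tiny stereo clip: 4 samples per channel at a made-up rate of 4 Hz.
example = {
    "waveform": [[[0.0, 0.5, 1.0, 0.5], [0.0, -0.5, 1.0, 0.5]]],
    "sample_rate": 4,
}
info = summarize_audio(example)
print(info["channels"], info["duration_s"], info["mono"])
```

Checking the channel count and duration before transcription is a cheap way to catch empty or mis-shaped inputs early.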

model

The model parameter specifies which Whisper model to use for transcription. Options include checkpoints such as "openai/whisper-large-v3-turbo" and "openai/whisper-tiny". Larger models are generally more accurate but need more memory and compute, while smaller models load and run faster. There is no default value, so you must pick a model that fits your accuracy needs and available hardware.

language

The language parameter specifies the language spoken in the audio. If it is set to "auto", the node attempts to detect the language automatically. Setting the language explicitly can improve accuracy, especially for non-English audio. When you do not use auto-detection, the value must be a valid language code.
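One way to picture the "auto" behavior is a small helper that turns the setting into optional keyword arguments for a decoding call. This is a sketch only: the helper name `language_kwargs` and the `language` keyword are assumptions for illustration, not the node's confirmed internals.

```python
def language_kwargs(language: str) -> dict:
    """Map a language setting to keyword arguments for a decoding call.

    Assumption: an empty dict means "let the model detect the language",
    and the downstream call accepts a `language` keyword.
    """
    cleaned = language.strip().lower()
    if cleaned in ("", "auto"):
        return {}  # no hint, so detection is left to the model
    return {"language": cleaned}


print(language_kwargs("auto"))
print(language_kwargs(" FR "))
```

Passing no hint at all, rather than a placeholder value, keeps auto-detection and explicit selection on clearly separate code paths.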

device

The device parameter determines the hardware on which transcription runs. The default is "auto", which lets the node choose the best available device: a CUDA GPU, MPS, or the CPU. Override it only when you need to force a specific device.
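The "auto" setting can be sketched as a simple priority check. The order used here (CUDA, then MPS, then CPU) follows the order in which the text lists the devices and is otherwise an assumption; `pick_device` is a hypothetical helper, and the availability flags are passed in as plain booleans so the example runs without PyTorch.

```python
def pick_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Resolve a device setting to a concrete device name.

    Assumption: "auto" prefers CUDA, then MPS, then CPU. Any other value
    is treated as an explicit user choice and returned unchanged.
    """
    if requested != "auto":
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


print(pick_device("auto", cuda_available=False, mps_available=True))
```

With PyTorch installed, the two flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.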

FL VoxCPM Transcribe Output Parameters:

transcription

The transcription output is the text produced from the input audio and is the node's primary output. It is returned as a string with special tokens removed, so it can be passed directly to nodes that analyze or further process text.
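To illustrate what "special tokens removed" means, the sketch below strips Whisper-style control tokens, which are written as `<|...|>`, from a decoded string. This is a text-level illustration under that assumption; the node itself may rely on the tokenizer's own decoding (for example the `skip_special_tokens=True` option in the transformers library) rather than a regular expression.

```python
import re

# Whisper-style control tokens look like <|en|> or <|endoftext|>.
SPECIAL_TOKEN = re.compile(r"<\|[^|]*\|>")


def strip_special_tokens(text: str) -> str:
    """Remove <|...|> tokens and collapse the leftover whitespace."""
    without_tokens = SPECIAL_TOKEN.sub("", text)
    return " ".join(without_tokens.split())


raw = "<|startoftranscript|><|en|><|transcribe|><|notimestamps|> Hello there.<|endoftext|>"
clean = strip_special_tokens(raw)
print(clean)
```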

FL VoxCPM Transcribe Usage Tips:

Ensure your audio input is clear and free from excessive background noise to improve transcription accuracy.
Choose the appropriate Whisper model based on your resource availability and accuracy requirements; larger models offer better accuracy but require more computational power.
Specify the language of the audio if known, as this can enhance the transcription quality, especially for non-English content.
Allow the node to automatically select the processing device unless you have specific hardware preferences or constraints.

FL VoxCPM Transcribe Common Errors and Solutions:

"transformers library required for transcription"

Explanation: This error occurs when the transformers library, which the node needs in order to run Whisper, is not installed.
Solution: Install the library with the command pip install transformers, using the same Python environment that runs ComfyUI, then restart ComfyUI.
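Before launching a workflow, you can check whether the library is importable from the current Python environment. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper name, and it only handles top-level package names.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["transformers"])
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```

Run the check with the same interpreter that starts ComfyUI, since a package installed into a different environment will not be visible to the node.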

"Resampling from `<sr>`Hz to 16000Hz"

Explanation: This is an informational message rather than an error. It indicates that the input sample rate differs from the 16000Hz the node requires, so the audio is being resampled.
Solution: No action is required. To skip the resampling step and save processing time, supply audio that is already at 16000Hz.
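For intuition about what resampling to 16000Hz involves, here is a deliberately naive sketch using linear interpolation on a list of samples. It is not the node's method: it applies no anti-aliasing filter, and real pipelines usually rely on a dedicated routine such as `torchaudio.functional.resample`. The function name `resample_linear` is hypothetical.

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a mono signal by linear interpolation (no anti-aliasing)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)  # nothing to do

    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(out_len):
        pos = min(i * step, last)    # fractional position in the source
        left = int(pos)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


halved = resample_linear([0.0, 1.0, 2.0, 3.0], src_rate=32000)
print(halved)
```

Halving the rate keeps every second sample here, which shows why supplying 16000Hz audio up front avoids this extra pass.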

"Loading Whisper model: `<model>` on `<device>`"

Explanation: This is a status message, not an error. It appears while the specified Whisper model is being loaded onto the selected device.
Solution: If loading takes too long, switch to a smaller model or confirm that the device has sufficient resources.

"Using cached Whisper model"

Explanation: This status message indicates that a previously loaded model is being reused from the cache, which avoids loading it again and speeds up processing.
Solution: No action is needed; this is a performance optimization.
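The two status messages about loading and caching fit a simple load-once pattern, sketched below. Keying the cache by model name and device is an assumption, as are the names `get_model` and `fake_loader`; the loader here is a stand-in so the example runs without downloading anything.

```python
_MODEL_CACHE = {}


def get_model(model_name, device, loader):
    """Return a cached model, loading it only on the first request.

    Assumption: a model is identified by its (model_name, device) pair.
    """
    key = (model_name, device)
    if key in _MODEL_CACHE:
        print("Using cached Whisper model")
    else:
        print(f"Loading Whisper model: {model_name} on {device}")
        _MODEL_CACHE[key] = loader(model_name, device)
    return _MODEL_CACHE[key]


load_calls = []


def fake_loader(name, device):
    """Stand-in for a real, expensive model load."""
    load_calls.append((name, device))
    return object()


first = get_model("openai/whisper-tiny", "cpu", fake_loader)
second = get_model("openai/whisper-tiny", "cpu", fake_loader)
```

The second call returns the same object without invoking the loader, which is why repeated runs with the same settings start faster.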

FL VoxCPM Transcribe Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI-FL-VoxCPM