ComfyUI Node: OmniVoice Whisper Loader

Class Name

OmniVoiceWhisperLoader

Category
OmniVoice
Author
Saganaki22 (account age: 1788 days)
Extension
ComfyUI-OmniVoice-TTS
Last Updated
2026-04-07
GitHub Stars
0.19K

How to Install ComfyUI-OmniVoice-TTS

Install this extension via the ComfyUI Manager by searching for ComfyUI-OmniVoice-TTS:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-OmniVoice-TTS in the search bar.
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated list of nodes.


OmniVoice Whisper Loader Description

Facilitates loading and managing Whisper ASR models for efficient transcription in OmniVoice.

OmniVoice Whisper Loader:

The OmniVoiceWhisperLoader node loads and manages Whisper Automatic Speech Recognition (ASR) models within the OmniVoice framework. It streamlines integrating Whisper models into transcription workflows: it uses locally stored models when they are available and downloads them automatically when they are not, so models are ready for use without unnecessary re-downloads. The node supports HuggingFace-compatible models, which can be placed in a designated local directory for easy access. This reduces setup time and makes it a practical component for AI artists and developers who work with audio and need reliable, fast transcription.

OmniVoice Whisper Loader Input Parameters:

model

The model parameter specifies the name of the Whisper ASR model to be loaded. This can be a model name from the HuggingFace repository or the name of a local folder where the model is stored. The choice of model affects transcription accuracy and performance, as different models vary in capability and resource requirements. There is no fixed list of allowed values, but the name must correspond to a valid Whisper model available either locally or online.

device

The device parameter determines the hardware on which the Whisper model will be loaded and executed. It can be set to "auto", which allows the system to automatically select the most appropriate device, or it can be explicitly set to specific devices such as "cpu" or "cuda" for GPU acceleration. The choice of device affects the speed and efficiency of the transcription process, with GPUs generally offering faster performance. The default value is "auto".

dtype

The dtype parameter specifies the data type used for model computations, which can influence the precision and performance of the ASR pipeline. It can be set to "auto" to let the system decide, or to specific data types like "float32" or "float16". The choice of data type can impact the model's memory usage and speed, with lower precision types typically offering faster processing at the cost of some accuracy. The default value is "auto".
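To illustrate how the "auto" settings above might resolve, here is a hypothetical sketch (not the node's actual source): loaders of this kind typically prefer CUDA with half precision when a GPU is available and fall back to CPU with float32 otherwise.

```python
def resolve_device(device: str, cuda_available: bool) -> str:
    # "auto" is assumed to prefer the GPU when one is available.
    if device == "auto":
        return "cuda" if cuda_available else "cpu"
    return device

def resolve_dtype(dtype: str, device: str) -> str:
    # "auto" is assumed to pick float16 on GPU (speed) and
    # float32 on CPU (compatibility).
    if dtype == "auto":
        return "float16" if device == "cuda" else "float32"
    return dtype

print(resolve_device("auto", cuda_available=False))  # cpu
print(resolve_dtype("auto", "cpu"))                  # float32
```

Explicit values pass through unchanged, so setting device to "cpu" on a GPU machine still forces CPU execution.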

OmniVoice Whisper Loader Output Parameters:

pipeline

The pipeline output is a dictionary containing the loaded Whisper ASR pipeline. This pipeline is essential for performing automatic speech recognition tasks, as it encapsulates the model and its configuration, ready for transcription operations. The pipeline's availability ensures that audio data can be processed efficiently, providing users with transcriptions without the need for additional setup.
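A downstream node might consume this dictionary along these lines (a hypothetical sketch: the `transcribe` helper is an assumption, and the call shape follows the HuggingFace ASR pipeline convention of returning a dict with a "text" key):

```python
def transcribe(loader_output: dict, audio_path: str) -> str:
    """Run the loaded ASR pipeline on an audio file.

    loader_output is assumed to be the dictionary produced by the
    OmniVoice Whisper Loader, holding the pipeline under "pipeline".
    """
    asr = loader_output["pipeline"]
    result = asr(audio_path)  # HuggingFace ASR pipelines return {"text": ...}
    return result["text"]
```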

model_name

The model_name output provides the name of the Whisper model that has been loaded. This information is useful for tracking which model is being used for transcription, especially when multiple models are available or when comparing performance across different models.

device

The device output indicates the hardware device on which the Whisper model is running. This information is crucial for understanding the execution context and for troubleshooting performance-related issues, as it confirms whether the model is utilizing CPU or GPU resources.

dtype

The dtype output specifies the data type used for the model's computations. This detail is important for users who need to ensure that the model is operating with the desired precision and performance characteristics, particularly in environments where resource constraints are a consideration.

OmniVoice Whisper Loader Usage Tips:

  • Ensure that your desired Whisper model is either available locally in the specified directory or accessible online for automatic download to avoid interruptions in your workflow.
  • For optimal performance, especially with large audio files, consider using a GPU by setting the device parameter to "cuda" if available.
  • Experiment with different dtype settings to balance between performance and precision, particularly if you are working with resource-constrained environments.

OmniVoice Whisper Loader Common Errors and Solutions:

Whisper model not found: <model_path>

  • Explanation: This error occurs when the specified Whisper model cannot be located in the expected directory.
  • Solution: Verify that the model is correctly placed in the ComfyUI/models/audio_encoders/ directory and that the model name is correctly specified.
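A quick way to verify the first condition is to check whether the model folder exists before running the workflow. This is an illustrative sketch: the directory is taken from the solution above, and the helper name is hypothetical.

```python
from pathlib import Path

def find_local_whisper(model_name: str,
                       models_root: str = "ComfyUI/models/audio_encoders"):
    """Return the model folder path if it exists locally, else None."""
    candidate = Path(models_root) / model_name
    return candidate if candidate.is_dir() else None
```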

Failed to load local Whisper: <error_message>

  • Explanation: This error indicates that there was an issue loading a locally stored Whisper model, possibly due to file corruption or incompatible model files.
  • Solution: Check the integrity of the model files and ensure they are compatible with the Whisper ASR framework. Re-download the model if necessary.

No reference transcript — Whisper will auto-transcribe (will download if not cached)

  • Explanation: This informational message indicates that no reference transcript was supplied, so Whisper will transcribe the audio itself, downloading the model first if it is not already cached.
  • Solution: Allow the download to complete, or place the desired model in the local directory beforehand to avoid repeated downloads.

OmniVoice Whisper Loader Related Nodes

Go back to the extension to check out more related nodes.
ComfyUI-OmniVoice-TTS