ComfyUI Node: FLOAT Audio Feature Extract (VA)

Class Name

FloatAudioPreprocessAndFeatureExtract

Category
FLOAT/Very Advanced
Author
set-soft (Account age: 3450 days)
Extension
ComfyUI-FLOAT_Optimized
Last Updated
2026-03-20
GitHub Stars
0.03K

How to Install ComfyUI-FLOAT_Optimized

Install this extension via the ComfyUI Manager by searching for ComfyUI-FLOAT_Optimized
  • 1. Click the Manager button in the main menu
  • 2. Select Custom Nodes Manager button
  • 3. Enter ComfyUI-FLOAT_Optimized in the search bar
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


FLOAT Audio Feature Extract (VA) Description

Preprocesses audio and extracts features, synchronized to the video frame rate, using a Wav2Vec model.

FLOAT Audio Feature Extract (VA):

The FloatAudioPreprocessAndFeatureExtract node processes audio data and extracts features that can be used alongside video data. It is particularly useful for audio-visual projects because it keeps audio features synchronized with video frames. The node takes a batch of pre-validated audio clips, ensures they are mono and at the correct sample rate, and runs them through a Wav2Vec model to extract features. These features are then interpolated to match the target frames per second (FPS) of the video, which is essential for applications where audio and video must be aligned, such as lip-syncing or audio-driven animation. By automating feature extraction and synchronization, the node lets artists focus on the creative aspects of a project rather than the technical details.

FLOAT Audio Feature Extract (VA) Input Parameters:

audio

The audio parameter is a dictionary that contains the waveform and sample rate of the audio data to be processed. It is crucial for ensuring that the audio is in the correct format before feature extraction. The waveform should be a PyTorch tensor, and the sample rate should match the expected sample rate for the Wav2Vec model. This parameter ensures that the audio data is correctly formatted and ready for processing, which is essential for accurate feature extraction.
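As an illustration, a minimal AUDIO dictionary of the expected shape might look like the sketch below. The key names follow the standard ComfyUI AUDIO convention; the 16 kHz rate is an assumption based on the sample rate most Wav2Vec models are trained on.

```python
import torch

# Sketch of the AUDIO dictionary this node expects (standard ComfyUI convention).
# The 16 kHz sample rate is an assumption: most Wav2Vec models use 16 kHz audio.
sample_rate = 16000
waveform = torch.randn(1, 1, sample_rate * 2)  # (batch, channels, samples): 2 s of mono audio

audio = {"waveform": waveform, "sample_rate": sample_rate}
```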

wav2vec_pipe

The wav2vec_pipe parameter is a tuple that includes the Wav2Vec model and its associated feature extractor. This parameter is essential for the feature extraction process, as it provides the necessary tools to analyze the audio data and extract meaningful features. The Wav2Vec model is a powerful tool for audio analysis, and this parameter ensures that it is correctly configured and ready to use.

target_fps

The target_fps parameter specifies the frames per second of the target video. This parameter is crucial for synchronizing the extracted audio features with the video frames. By interpolating the audio features to match the target FPS, this parameter ensures that the audio and video are perfectly aligned, which is essential for applications like lip-syncing or audio-driven animation.
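The interpolation step can be sketched as follows. This is a generic illustration using torch.nn.functional.interpolate, not the node's exact implementation; the feature dimension of 768 and the ~50 Hz Wav2Vec feature rate are assumptions typical of Wav2Vec2 base models.

```python
import torch
import torch.nn.functional as F

def resample_features(features: torch.Tensor, target_num_frames: int) -> torch.Tensor:
    """Linearly interpolate (batch, time, dim) features to target_num_frames steps."""
    # F.interpolate expects (batch, channels, length), so move the time axis last
    resampled = F.interpolate(
        features.transpose(1, 2),        # (batch, dim, time)
        size=target_num_frames,
        mode="linear",
        align_corners=True,
    )
    return resampled.transpose(1, 2)     # back to (batch, target_num_frames, dim)

feats = torch.randn(1, 99, 768)                 # ~2 s of Wav2Vec features at ~50 Hz
video_feats = resample_features(feats, 50)      # 2 s of video at target_fps = 25
```

With a 25 FPS target, two seconds of audio features are mapped onto 50 feature vectors, one per video frame.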

only_last_features

The only_last_features parameter is a boolean that determines whether only the last set of features should be extracted from the audio data. This parameter can be useful for applications where only the most recent audio features are needed, such as in real-time audio processing. By setting this parameter to True, you can reduce the computational load and focus on the most relevant features.

FLOAT Audio Feature Extract (VA) Output Parameters:

wav2vec_features_gpu

The wav2vec_features_gpu output parameter is a tensor containing the extracted audio features. These features are processed on the GPU for efficiency and are crucial for synchronizing audio with video. The features can be used in various applications, such as audio-driven animation or lip-syncing, where precise audio-visual alignment is required.

audio_num_frames

The audio_num_frames output parameter indicates the number of frames in the audio data after processing. This parameter is essential for understanding how the audio data has been synchronized with the video frames. It provides a clear indication of the alignment between audio and video, which is crucial for applications that require precise synchronization.
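Assuming the frame count is derived from the audio duration and the target FPS (a reasonable but unverified assumption about the node's internals), it could be computed along these lines:

```python
# Hypothetical frame-count computation: duration in seconds times target FPS.
sample_rate = 16000
num_samples = 48000            # 3 s of audio
target_fps = 25.0

duration_s = num_samples / sample_rate
audio_num_frames = round(duration_s * target_fps)  # 3 s at 25 fps -> 75 frames
```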

preprocessed_audio_batched_cpu

The preprocessed_audio_batched_cpu output parameter is a tensor containing the preprocessed audio data. This data is processed on the CPU and is ready for further analysis or processing. It provides a baseline for understanding how the audio data has been transformed during preprocessing, which can be useful for debugging or further analysis.

wav2vec_pipe

The wav2vec_pipe output parameter is the same as the input parameter, providing the Wav2Vec model and feature extractor used in the process. This output ensures that the same tools are available for further processing or analysis, maintaining consistency throughout the workflow.

audio

The audio output parameter is the same as the input parameter, providing the original audio data for reference. This output ensures that the original data is preserved and available for further analysis or comparison.

target_fps

The target_fps output parameter is the same as the input parameter, providing the target frames per second for the video. This output ensures that the synchronization settings are preserved and available for further processing or analysis.

FLOAT Audio Feature Extract (VA) Usage Tips:

  • Ensure that your audio data is in mono format and has the correct sample rate before processing to avoid errors and ensure accurate feature extraction.
  • Use the only_last_features parameter to reduce computational load if you only need the most recent audio features for real-time applications.
  • Verify that the target_fps matches the frame rate of your video to ensure proper synchronization between audio and video.
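The first tip, mono audio at the expected sample rate, can be partially enforced up front. A minimal sketch follows; the 16 kHz target is an assumption, and for actual sample-rate conversion a proper resampler (e.g. torchaudio) should be used rather than this check:

```python
import torch

EXPECTED_SR = 16000  # assumed Wav2Vec sample rate

def to_mono(waveform: torch.Tensor) -> torch.Tensor:
    """Average a (batch, channels, samples) waveform down to a single channel."""
    if waveform.shape[1] > 1:
        waveform = waveform.mean(dim=1, keepdim=True)
    return waveform

stereo = torch.randn(2, 2, EXPECTED_SR)  # batch of 2 stereo clips, 1 s each
mono = to_mono(stereo)                   # (2, 1, 16000)
```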

FLOAT Audio Feature Extract (VA) Common Errors and Solutions:

Input 'processed_audio_features' must be a torch.Tensor

  • Explanation: This error occurs when the input audio features are not provided as a PyTorch tensor.
  • Solution: Ensure that the audio features are converted to a PyTorch tensor before inputting them into the node.

Input 'audio' must be a ComfyUI AUDIO dictionary

  • Explanation: This error indicates that the audio input is not in the expected dictionary format.
  • Solution: Format your audio input as a dictionary containing the waveform and sample rate, ensuring that the waveform is a PyTorch tensor.

audio['waveform'] must be 2D or 3D

  • Explanation: This error occurs when the waveform data is not in the correct dimensional format.
  • Solution: Ensure that the waveform data is either 2D or 3D. If it is 2D, it should be unsqueezed to 3D for batch processing.
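The 2D case can be fixed by adding a batch axis before handing the waveform to the node:

```python
import torch

wave_2d = torch.randn(1, 16000)  # (channels, samples): 2D, would be rejected as a batch
# Add a batch axis so the waveform becomes (batch, channels, samples)
wave_3d = wave_2d.unsqueeze(0) if wave_2d.dim() == 2 else wave_2d
```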

FLOAT Audio Feature Extract (VA) Related Nodes

Go back to the extension to check out more related nodes.
ComfyUI-FLOAT_Optimized
Copyright 2025 RunComfy. All Rights Reserved.