ComfyUI Node: FLOAT Audio Feature Extract (VA)

Class Name

FloatAudioPreprocessAndFeatureExtract

Category
FLOAT/Very Advanced
Author
set-soft (Account age: 3450 days)
Extension
ComfyUI-FLOAT_Optimized
Last Updated
2026-03-20
GitHub Stars
0.03K

How to Install ComfyUI-FLOAT_Optimized

Install this extension via the ComfyUI Manager by searching for ComfyUI-FLOAT_Optimized
  • 1. Click the Manager button in the main menu
  • 2. Select Custom Nodes Manager button
  • 3. Enter ComfyUI-FLOAT_Optimized in the search bar
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


FLOAT Audio Feature Extract (VA) Description

Preprocesses audio and extracts features, synchronized to the video frame rate, using a Wav2Vec model.

FLOAT Audio Feature Extract (VA):

The FloatAudioPreprocessAndFeatureExtract node processes audio data and extracts features that can be used alongside video data. It is particularly useful for audio-visual projects because it keeps audio features synchronized with video frames. The node takes a batch of pre-validated audio clips, ensures they are mono and at the correct sample rate, and runs them through a Wav2Vec model to extract features. These features are then interpolated to match the target frames per second (FPS) of the video, which is essential for applications where audio and video must be aligned, such as lip-syncing or audio-driven animation. By automating feature extraction and synchronization, the node lets artists focus on the creative aspects of a project rather than the technical details.

FLOAT Audio Feature Extract (VA) Input Parameters:

audio

The audio parameter is a dictionary that contains the waveform and sample rate of the audio data to be processed. It is crucial for ensuring that the audio is in the correct format before feature extraction. The waveform should be a PyTorch tensor, and the sample rate should match the expected sample rate for the Wav2Vec model. This parameter ensures that the audio data is correctly formatted and ready for processing, which is essential for accurate feature extraction.
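As an illustration, a minimal AUDIO dictionary of the expected shape might look like the sketch below. The key names follow the standard ComfyUI AUDIO convention; the 16 kHz rate is an assumption based on the sample rate most Wav2Vec models are trained on.

```python
import torch

# Sketch of the AUDIO dictionary this node expects (standard ComfyUI convention).
# The 16 kHz sample rate is an assumption: most Wav2Vec models use 16 kHz audio.
sample_rate = 16000
waveform = torch.randn(1, 1, sample_rate * 2)  # (batch, channels, samples): 2 s of mono audio

audio = {"waveform": waveform, "sample_rate": sample_rate}
```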

wav2vec_pipe

The wav2vec_pipe parameter is a tuple that includes the Wav2Vec model and its associated feature extractor. This parameter is essential for the feature extraction process, as it provides the necessary tools to analyze the audio data and extract meaningful features. The Wav2Vec model is a powerful tool for audio analysis, and this parameter ensures that it is correctly configured and ready to use.

target_fps

The target_fps parameter specifies the frames per second of the target video. This parameter is crucial for synchronizing the extracted audio features with the video frames. By interpolating the audio features to match the target FPS, this parameter ensures that the audio and video are perfectly aligned, which is essential for applications like lip-syncing or audio-driven animation.
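The interpolation step can be sketched as follows. This is a generic illustration using torch.nn.functional.interpolate, not the node's exact implementation; the feature dimension of 768 and the ~50 Hz Wav2Vec feature rate are assumptions typical of Wav2Vec2 base models.

```python
import torch
import torch.nn.functional as F

def resample_features(features: torch.Tensor, target_num_frames: int) -> torch.Tensor:
    """Linearly interpolate (batch, time, dim) features to target_num_frames steps."""
    # F.interpolate expects (batch, channels, length), so move the time axis last
    resampled = F.interpolate(
        features.transpose(1, 2),        # (batch, dim, time)
        size=target_num_frames,
        mode="linear",
        align_corners=True,
    )
    return resampled.transpose(1, 2)     # back to (batch, target_num_frames, dim)

feats = torch.randn(1, 99, 768)                 # ~2 s of Wav2Vec features at ~50 Hz
video_feats = resample_features(feats, 50)      # 2 s of video at target_fps = 25
```

With a 25 FPS target, two seconds of audio features are mapped onto 50 feature vectors, one per video frame.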

only_last_features

The only_last_features parameter is a boolean that determines whether only the last set of features should be extracted from the audio data. This parameter can be useful for applications where only the most recent audio features are needed, such as in real-time audio processing. By setting this parameter to True, you can reduce the computational load and focus on the most relevant features.

FLOAT Audio Feature Extract (VA) Output Parameters:

wav2vec_features_gpu

The wav2vec_features_gpu output parameter is a tensor containing the extracted audio features. These features are processed on the GPU for efficiency and are crucial for synchronizing audio with video. The features can be used in various applications, such as audio-driven animation or lip-syncing, where precise audio-visual alignment is required.

audio_num_frames

The audio_num_frames output parameter indicates the number of frames in the audio data after processing. This parameter is essential for understanding how the audio data has been synchronized with the video frames. It provides a clear indication of the alignment between audio and video, which is crucial for applications that require precise synchronization.
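Assuming the frame count is derived from the audio duration and the target FPS (a reasonable but unverified assumption about the node's internals), it could be computed along these lines:

```python
# Hypothetical frame-count computation: duration in seconds times target FPS.
sample_rate = 16000
num_samples = 48000            # 3 s of audio
target_fps = 25.0

duration_s = num_samples / sample_rate
audio_num_frames = round(duration_s * target_fps)  # 3 s at 25 fps -> 75 frames
```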

preprocessed_audio_batched_cpu

The preprocessed_audio_batched_cpu output parameter is a tensor containing the preprocessed audio data. This data is processed on the CPU and is ready for further analysis or processing. It provides a baseline for understanding how the audio data has been transformed during preprocessing, which can be useful for debugging or further analysis.

wav2vec_pipe

The wav2vec_pipe output parameter is the same as the input parameter, providing the Wav2Vec model and feature extractor used in the process. This output ensures that the same tools are available for further processing or analysis, maintaining consistency throughout the workflow.

audio

The audio output parameter is the same as the input parameter, providing the original audio data for reference. This output ensures that the original data is preserved and available for further analysis or comparison.

target_fps

The target_fps output parameter is the same as the input parameter, providing the target frames per second for the video. This output ensures that the synchronization settings are preserved and available for further processing or analysis.

FLOAT Audio Feature Extract (VA) Usage Tips:

  • Ensure that your audio data is in mono format and has the correct sample rate before processing to avoid errors and ensure accurate feature extraction.
  • Use the only_last_features parameter to reduce computational load if you only need the most recent audio features for real-time applications.
  • Verify that the target_fps matches the frame rate of your video to ensure proper synchronization between audio and video.
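The first tip, mono audio at the expected sample rate, can be partially enforced up front. A minimal sketch follows; the 16 kHz target is an assumption, and for actual sample-rate conversion a proper resampler (e.g. torchaudio) should be used rather than this check:

```python
import torch

EXPECTED_SR = 16000  # assumed Wav2Vec sample rate

def to_mono(waveform: torch.Tensor) -> torch.Tensor:
    """Average a (batch, channels, samples) waveform down to a single channel."""
    if waveform.shape[1] > 1:
        waveform = waveform.mean(dim=1, keepdim=True)
    return waveform

stereo = torch.randn(2, 2, EXPECTED_SR)  # batch of 2 stereo clips, 1 s each
mono = to_mono(stereo)                   # (2, 1, 16000)
```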

FLOAT Audio Feature Extract (VA) Common Errors and Solutions:

Input 'processed_audio_features' must be a torch.Tensor

  • Explanation: This error occurs when the input audio features are not provided as a PyTorch tensor.
  • Solution: Ensure that the audio features are converted to a PyTorch tensor before inputting them into the node.

Input 'audio' must be a ComfyUI AUDIO dictionary

  • Explanation: This error indicates that the audio input is not in the expected dictionary format.
  • Solution: Format your audio input as a dictionary containing the waveform and sample rate, ensuring that the waveform is a PyTorch tensor.

audio['waveform'] must be 2D or 3D

  • Explanation: This error occurs when the waveform data is not in the correct dimensional format.
  • Solution: Ensure that the waveform data is either 2D or 3D. If it is 2D, it should be unsqueezed to 3D for batch processing.
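The 2D case can be fixed by adding a batch axis before handing the waveform to the node:

```python
import torch

wave_2d = torch.randn(1, 16000)  # (channels, samples): 2D, would be rejected as a batch
# Add a batch axis so the waveform becomes (batch, channels, samples)
wave_3d = wave_2d.unsqueeze(0) if wave_2d.dim() == 2 else wave_2d
```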

FLOAT Audio Feature Extract (VA) Related Nodes

Go back to the extension to check out more related nodes.
ComfyUI-FLOAT_Optimized
Copyright 2025 RunComfy. All Rights Reserved.