ComfyUI Node: AudioSR

Class Name: AudioSR
Category: audio
Author: Saganaki22 (Account age: 0 days)
Extension: ComfyUI-AudioSR
Last Updated: 2026-03-21
GitHub Stars: 0.07K

How to Install ComfyUI-AudioSR

Install this extension via the ComfyUI Manager by searching for ComfyUI-AudioSR:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-AudioSR in the search bar.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

AudioSR Description

Upscales audio to 48kHz using the AudioSR model, improving clarity and detail for high-quality use.

AudioSR:

The AudioSR node upscales audio to a 48kHz sampling rate using the Versatile Audio Super Resolution (AudioSR) latent diffusion model, improving clarity and detail. Processing combines diffusion-based reconstruction and denoising with chunking and stereo handling: audio longer than 10.24 seconds is split into manageable chunks, and stereo channels are processed separately, so every segment of the input is enhanced. The approach is computationally intensive, but it yields noticeably better audio quality, which makes the node useful for AI artists and audio professionals.

AudioSR Input Parameters:

audio

The audio parameter is the primary input for the AudioSR node. It accepts either a dictionary with the keys waveform and sample_rate or a tuple containing the waveform and the sample rate. The waveform should be a numpy array or a torch tensor, and the sample rate should be an integer. Inputs are typically below 48kHz; if the sample rate differs from 48kHz, the node resamples the audio to 48kHz for compatibility with the AudioSR model.
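The two accepted input shapes can be illustrated with a small helper. This is a minimal sketch, not the node's own code: the names unpack_audio, needs_resampling, and TARGET_SAMPLE_RATE are hypothetical, and the waveform type check (numpy array or torch tensor) is left out to keep the example dependency-free.

```python
TARGET_SAMPLE_RATE = 48_000  # output rate stated in the node description


def unpack_audio(audio):
    """Return (waveform, sample_rate) from a dict or a 2-tuple (hypothetical helper)."""
    if isinstance(audio, str):
        # A filename is not audio data; the file must be loaded first.
        raise TypeError("audio is a filename string; load the file first")
    if isinstance(audio, dict):
        try:
            waveform, sample_rate = audio["waveform"], audio["sample_rate"]
        except KeyError as exc:
            raise KeyError(f"audio dict is missing key {exc}") from exc
    elif isinstance(audio, (tuple, list)) and len(audio) == 2:
        waveform, sample_rate = audio
    else:
        raise TypeError("expected a dict or a (waveform, sample_rate) tuple")
    return waveform, int(sample_rate)


def needs_resampling(sample_rate):
    """True when the input rate differs from the 48kHz target."""
    return sample_rate != TARGET_SAMPLE_RATE
```

For example, unpack_audio({"waveform": samples, "sample_rate": 22050}) returns the samples together with 22050, and needs_resampling(22050) is true.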

seed

The seed parameter is used to initialize the random number generator, ensuring reproducibility of the audio processing results. If set to 0, a random seed is generated, which can lead to different outputs on each run. This parameter is important for users who wish to achieve consistent results across multiple runs of the node. The seed value should be an integer, with a typical range from 0 to 2^32 - 1.
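The seed behavior described above (0 means "pick one at random") can be sketched as follows. The function name resolve_seed and the constant MAX_SEED are hypothetical; the node's actual implementation may differ.

```python
import random

MAX_SEED = 2**32 - 1  # upper end of the typical range given above


def resolve_seed(seed):
    """Map the user-facing seed to the seed actually used (hypothetical helper)."""
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be in [0, {MAX_SEED}]")
    if seed == 0:
        # 0 requests a fresh random seed, so repeated runs can differ.
        return random.randint(1, MAX_SEED)
    # Any nonzero seed is used as-is, which makes runs repeatable.
    return seed
```

Logging the resolved value is a simple way to reproduce a run that started from seed 0.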

guidance_scale

The guidance_scale parameter influences the strength of the guidance applied during the diffusion process. It controls how closely the output audio adheres to the model's learned patterns versus the input audio characteristics. A higher guidance scale can lead to more pronounced enhancements but may also introduce artifacts if set too high. This parameter is a float, with typical values ranging from 0.0 to 10.0, depending on the desired level of enhancement.
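The page does not say how guidance is applied internally. Assuming the model uses classifier-free guidance, as many latent diffusion samplers do, the scale would combine an unconditional and a conditional noise prediction as shown in this sketch. The function name apply_guidance and the plain-list representation are illustrative only.

```python
def apply_guidance(uncond, cond, guidance_scale):
    """Classifier-free guidance on two noise predictions (assumed technique).

    uncond and cond are equal-length sequences of floats standing in for
    the model's unconditional and conditional predictions.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    # Move from the unconditional prediction toward (and past) the
    # conditional one; larger scales push further in that direction.
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]
```

Under this assumption a scale of 1.0 returns the conditional prediction unchanged, while larger values extrapolate beyond it, which is consistent with the note that very high values may introduce artifacts.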

ddim_steps

The ddim_steps parameter specifies the number of diffusion steps used in the denoising and reconstruction process. More steps generally lead to higher quality results but increase processing time. This parameter is an integer, with a default value of 50, and can be adjusted based on the desired balance between quality and performance.
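To see why fewer steps run faster, it helps to look at how a DDIM sampler typically picks its steps: it visits an evenly spaced subset of the timesteps the model was trained on. The sketch below assumes 1000 training timesteps and uniform spacing; both are assumptions for illustration, and the function name ddim_timesteps is hypothetical.

```python
def ddim_timesteps(ddim_steps, train_steps=1000):
    """Evenly spaced subset of training timesteps (assumed schedule)."""
    if not 1 <= ddim_steps <= train_steps:
        raise ValueError("ddim_steps must be between 1 and train_steps")
    stride = train_steps // ddim_steps
    # Truncate so that exactly ddim_steps timesteps are returned even
    # when train_steps is not a multiple of ddim_steps.
    return list(range(0, train_steps, stride))[:ddim_steps]
```

With the default of 50 steps this visits every 20th timestep, so the model is evaluated 50 times per chunk instead of 1000.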

chunk_size

The chunk_size parameter determines the length of audio chunks in seconds when processing audio longer than 10.24 seconds. This parameter is crucial for managing memory and computational load, as it allows the node to process large audio files in smaller, more manageable segments. The chunk size should be a float, typically set to 15 seconds, but can be adjusted based on the available resources and desired processing speed.
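The way chunk_size and overlap interact can be sketched by computing chunk boundaries in seconds. This is an illustrative helper, not the node's code: the name plan_chunks and the default overlap of 2.0 seconds are assumptions, and each chunk is assumed to start chunk_size minus overlap seconds after the previous one.

```python
def plan_chunks(duration, chunk_size=15.0, overlap=2.0):
    """Return (start, end) times in seconds covering the whole clip."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # distance between consecutive chunk starts
    chunks = []
    start = 0.0
    while True:
        end = min(start + chunk_size, duration)
        chunks.append((start, end))
        if end >= duration:
            break
        start += step
    return chunks
```

For a 40-second clip with 15-second chunks and a 2-second overlap, this yields three chunks: 0 to 15, 13 to 28, and 26 to 40 seconds. Smaller chunks lower peak memory but increase the number of diffusion runs.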

overlap

The overlap parameter defines the amount of overlap between consecutive audio chunks, expressed in seconds. Overlapping helps to ensure smooth transitions between processed chunks, reducing potential artifacts at chunk boundaries. This parameter is a float, with typical values ranging from 0.0 to 5.0 seconds, depending on the desired level of overlap and the characteristics of the input audio.
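The page does not state how overlapping regions are merged. One common approach is a linear crossfade, sketched below on plain lists of samples; the function name crossfade and the choice of a linear ramp are assumptions for illustration.

```python
def crossfade(left, right, overlap_samples):
    """Join two chunks, linearly blending the shared overlap region."""
    if overlap_samples < 0 or overlap_samples > min(len(left), len(right)):
        raise ValueError("overlap_samples out of range")
    if overlap_samples == 0:
        return list(left) + list(right)
    head = list(left[:-overlap_samples])   # part of left outside the overlap
    tail = list(right[overlap_samples:])   # part of right outside the overlap
    blended = []
    for i in range(overlap_samples):
        # Weight of the right chunk ramps up across the overlap.
        w = (i + 1) / (overlap_samples + 1)
        a = left[len(left) - overlap_samples + i]
        b = right[i]
        blended.append((1 - w) * a + w * b)
    return head + blended + tail
```

At 48kHz, an overlap of 0.5 seconds corresponds to overlap_samples = 24000. A longer overlap gives a smoother transition at the cost of processing more audio twice.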

AudioSR Output Parameters:

processed_audio

The processed_audio parameter is the primary output of the AudioSR node, representing the upscaled audio waveform at a 48kHz sampling rate. This output is a numpy array or torch tensor, depending on the input format, and reflects the enhanced audio quality achieved through the node's processing steps. The processed audio is suitable for high-quality applications, offering improved clarity and detail compared to the original input.

spectrogram_comparison

The spectrogram_comparison parameter provides a visual comparison between the original and processed audio spectrograms. This output is useful for users who wish to analyze the differences in frequency content and detail before and after processing. The spectrogram comparison helps to illustrate the effectiveness of the AudioSR node in enhancing audio quality.

AudioSR Usage Tips:

  • Ensure your input audio is in a compatible format, either as a dictionary or tuple, to avoid errors during processing.
  • Adjust the guidance_scale and ddim_steps parameters to find the optimal balance between audio quality and processing time for your specific project.
  • Use the chunk_size and overlap parameters to manage memory usage and ensure smooth transitions between audio chunks, especially for longer audio files.

AudioSR Common Errors and Solutions:

Audio input is a filename string

  • Explanation: This error occurs when the input audio is provided as a filename string instead of actual audio data.
  • Solution: Ensure that the input to the AudioSR node is either a dictionary with waveform and sample_rate keys or a tuple containing the waveform and sample rate.

Audio waveform must be a torch.Tensor or numpy array

  • Explanation: This error indicates that the input audio waveform is not in the expected format.
  • Solution: Convert your audio waveform to a numpy array or torch tensor before passing it to the AudioSR node.

CUDA device not available

  • Explanation: This error occurs when the node attempts to use a CUDA device for processing, but none is available.
  • Solution: Ensure that your system has a compatible GPU with CUDA support, or modify the node to use CPU processing if necessary.
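A generic PyTorch pattern for falling back to the CPU is sketched below. How the node itself selects its device is not documented here, so pick_device is a hypothetical helper rather than part of the extension.

```python
def pick_device():
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

CPU processing avoids the error but is likely to be much slower, given how computationally intensive the diffusion process is.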

Copyright 2025 RunComfy. All Rights Reserved.