Evaluate audio quality using Log-Spectral Distance (LSD) and Scale-Invariant Signal-to-Distortion Ratio (SI-SDR).
The Metrics (LSD + SI-SDR) node evaluates audio quality by computing two metrics: Log-Spectral Distance (LSD) and Scale-Invariant Signal-to-Distortion Ratio (SI-SDR). Both are widely used to assess fidelity in audio processing and enhancement tasks. LSD measures the difference in spectral content between two audio signals, showing how closely the processed audio matches its reference in frequency content; lower values mean a closer match. SI-SDR measures how much distortion the processed signal contains relative to the reference, independent of overall gain; higher values mean less distortion. The node gives audio engineers and AI artists a quantitative measure of whether a processing step improved or degraded the audio.
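To make the "independent of its scale" property concrete, here is a minimal NumPy sketch of SI-SDR using the commonly cited definition: project the processed signal onto the reference, then compare the energy of that projection with the energy of the residual. The function name `si_sdr_db`, the mean removal, and the `eps` guard are assumptions for illustration; the node's actual implementation may differ.

```python
import numpy as np

def si_sdr_db(reference, estimate, eps=1e-8):
    """Scale-invariant SDR in dB for two equal-length mono signals (illustrative sketch)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Remove any DC offset so a constant shift does not bias the projection.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling of the reference that best explains the estimate.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference   # part of the estimate aligned with the reference
    residual = estimate - target  # everything else counts as distortion
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(residual, residual) + eps))

# Synthetic example: a 440 Hz tone plus noise, sampled at an assumed 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
ref = np.sin(2 * np.pi * 440.0 * t)
noise = rng.standard_normal(ref.shape)
noisy = ref + 0.1 * noise
score = si_sdr_db(ref, noisy)
score_rescaled = si_sdr_db(ref, 3.0 * noisy)  # same score up to rounding
```

Because the projection absorbs any gain change, multiplying the processed signal by a constant leaves the score essentially unchanged, while adding more noise lowers it.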
This parameter represents the reference audio signal against which the processed audio will be compared. It is crucial for establishing a baseline to evaluate the quality of the processed audio. The reference audio should be a high-quality version of the audio you are trying to enhance or process.
This parameter is the processed audio signal that you want to evaluate. It is compared against the reference audio to determine the effectiveness of the audio processing techniques applied. The goal is to have this audio closely match the reference audio in terms of quality and clarity.
The n_fft parameter sets the number of points in the Fast Fourier Transform (FFT) used to compute the spectrogram, which determines the frequency resolution of the analysis. The default is 2048, the minimum is 512, and the maximum is 8192, in steps of 128. A higher n_fft value gives finer frequency resolution but coarser time resolution per frame and a higher computational load.
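The range and step described above can be checked before running a workflow. The sketch below is a hypothetical helper, not part of the node: `validate_n_fft` and `is_power_of_two` are illustrative names, and raising `ValueError` is an assumed behavior rather than what the node itself does.

```python
def validate_n_fft(n_fft, minimum=512, maximum=8192, step=128):
    """Return n_fft unchanged if it is in range and on the step grid, else raise."""
    if not minimum <= n_fft <= maximum:
        raise ValueError(f"n_fft={n_fft} is outside [{minimum}, {maximum}]")
    if (n_fft - minimum) % step != 0:
        raise ValueError(f"n_fft={n_fft} is not on the step-{step} grid starting at {minimum}")
    return n_fft

def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; uses the single-set-bit trick."""
    return n > 0 and (n & (n - 1)) == 0

# Values on the allowed grid that are also powers of two.
power_of_two_choices = [n for n in range(512, 8192 + 1, 128) if is_power_of_two(n)]
```

Every power of two between 512 and 8192 falls on the step-128 grid, so restricting the choice to those values costs nothing in terms of allowed settings.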
The hop parameter specifies the number of samples between successive frames in the spectrogram calculation, which determines the time resolution of the analysis. The default is 512, the minimum is 64, and the maximum is 4096, in steps of 64. A smaller hop size gives finer time resolution but increases the number of frames to process.
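The trade-off between n_fft and hop is easier to see as arithmetic. The sketch below assumes a 48 kHz sample rate and a spectrogram computed without padding (so the frame count is `1 + (n_samples - n_fft) // hop`); neither assumption comes from the node itself, and `stft_resolution` is a hypothetical helper name.

```python
def stft_resolution(n_fft=2048, hop=512, sample_rate=48000, n_samples=48000):
    """Summarize the analysis grid implied by n_fft and hop (no-padding convention assumed)."""
    return {
        "freq_bins": n_fft // 2 + 1,                # one-sided spectrum of a real signal
        "bin_width_hz": sample_rate / n_fft,        # spacing between adjacent frequency bins
        "hop_ms": 1000.0 * hop / sample_rate,       # time step between successive frames
        "n_frames": 1 + (n_samples - n_fft) // hop if n_samples >= n_fft else 0,
    }

defaults = stft_resolution()           # node defaults on one second of 48 kHz audio
finer_time = stft_resolution(hop=256)  # halving hop roughly doubles the frame count
```

With the defaults and these assumptions, each bin spans about 23.4 Hz and frames are about 10.7 ms apart; halving hop to 256 doubles the number of frames to process from 90 to 180.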
This boolean parameter determines whether the Log-Spectral Distance (LSD) should be computed. It is set to True by default, indicating that LSD will be calculated to assess the spectral similarity between the reference and processed audio.
This boolean parameter indicates whether the Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) should be computed. It is set to True by default, meaning that SI-SDR will be calculated to evaluate the distortion level in the processed audio relative to the reference.
The metrics output is a dictionary containing the values of the enabled metrics. When LSD computation is enabled, it includes lsd_mean_db, the average Log-Spectral Distance in decibels, and lsd_p95_db, the 95th percentile of the LSD values, which indicates near-worst-case spectral deviation. When SI-SDR computation is enabled, it also includes si_sdr_db, the Scale-Invariant Signal-to-Distortion Ratio in decibels; higher values indicate less distortion in the processed audio.
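As an illustration of how the two LSD entries could be produced, the sketch below computes a per-frame LSD from log power spectrograms and then summarizes it with a mean and a 95th percentile. The Hann window, the no-padding framing, the per-frame root-mean-square definition, and the helper names `stft_power` and `lsd_metrics` are all assumptions; only the dictionary keys come from the description above.

```python
import numpy as np

def stft_power(x, n_fft=2048, hop=512):
    """Power spectrogram of shape (frames, n_fft // 2 + 1); Hann window, no padding."""
    x = np.asarray(x, dtype=np.float64)
    if len(x) < n_fft:
        raise ValueError("signal is shorter than n_fft")
    n_frames = 1 + (len(x) - n_fft) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(n_fft)
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def lsd_metrics(reference, processed, n_fft=2048, hop=512, eps=1e-10):
    """Mean and 95th-percentile log-spectral distance in dB (illustrative sketch)."""
    n = min(len(reference), len(processed))  # compare only the overlapping part
    log_ref = 10.0 * np.log10(stft_power(reference[:n], n_fft, hop) + eps)
    log_est = 10.0 * np.log10(stft_power(processed[:n], n_fft, hop) + eps)
    # Root-mean-square difference across frequency, one value per frame.
    per_frame = np.sqrt(np.mean((log_ref - log_est) ** 2, axis=1))
    return {
        "lsd_mean_db": float(per_frame.mean()),
        "lsd_p95_db": float(np.percentile(per_frame, 95)),
    }

# Synthetic example at an assumed 16 kHz sample rate.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
ref = np.sin(2 * np.pi * 440.0 * t)
identical = lsd_metrics(ref, ref)
degraded = lsd_metrics(ref, ref + 0.1 * rng.standard_normal(ref.shape))
```

Identical inputs give a distance of zero for both entries, and any spectral change pushes both values above zero. Reporting the 95th percentile alongside the mean highlights the frames where the processed audio deviates most.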
Adjust the n_fft and hop parameters based on the specific audio characteristics and the computational resources available, to balance frequency and time resolution.
An error occurs when the n_fft parameter is set outside the allowed range.
Set the n_fft value within the specified range of 512 to 8192, ensuring it is a power of two for optimal FFT performance.