Time-align a processed audio signal to a reference signal, using cross-correlation to estimate the delay between them.
The Audio Align (XCorr) node time-aligns two audio signals. This matters whenever processed audio must be compared sample-for-sample with a reference, as in audio restoration, enhancement, or analysis. The node estimates the time delay between the reference and the processed signal with cross-correlation, specifically Generalized Cross-Correlation with Phase Transform (GCC-PHAT), and then shifts the processed audio by that delay so that it lines up with the reference.
Alignment is a prerequisite for downstream comparisons such as gain matching and null testing, which only give meaningful results when both inputs are synchronized. Because the node can also estimate and apply fractional (sub-sample) delays, the remaining misalignment can be smaller than one sample period, which makes the node useful for audio engineers and AI artists working with large or complex audio datasets.
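A minimal NumPy sketch of GCC-PHAT delay estimation follows. It is not the node's source code: the function name `gcc_phat`, the sign convention (a positive lag means the processed signal arrives later than the reference), and the `max_shift` argument expressed in samples are assumptions made for illustration. The key step is dividing the cross-power spectrum by its magnitude (the phase transform), which whitens both signals so that the inverse FFT yields a sharp peak at the relative delay.

```python
import numpy as np


def gcc_phat(ref, proc, max_shift=None, eps=1e-12):
    """Estimate the integer lag of `proc` relative to `ref` with GCC-PHAT.

    Hypothetical helper, not the node's API. Returns (lag, cc):
    a positive lag means `proc` arrives later than `ref`, and
    `cc` holds the correlation for lags -max_shift..+max_shift.
    """
    ref = np.asarray(ref, dtype=float)
    proc = np.asarray(proc, dtype=float)
    n = len(ref) + len(proc)               # zero-pad so the circular correlation does not wrap
    R = np.fft.rfft(ref, n=n)
    P = np.fft.rfft(proc, n=n)
    cross = P * np.conj(R)                 # cross-power spectrum
    cross /= np.abs(cross) + eps           # PHAT weighting: unit magnitude, phase only
    cc = np.fft.irfft(cross, n=n)          # lag k is stored at index k mod n
    if max_shift is None:
        max_shift = n // 2
    max_shift = min(int(max_shift), n // 2)
    # Reorder so that index i corresponds to lag i - max_shift.
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    # abs() so that a polarity-inverted copy still produces a detectable peak
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag, cc
```

To mirror a millisecond limit such as the 200 ms default described below, one would pass `max_shift=round(200 * sample_rate / 1000)`. Zero-padding both FFTs to the combined length keeps negative and positive lags separate at the two ends of the correlation.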
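The reference audio signal to which the processed audio is aligned. It is the timing benchmark for synchronization. Its sample rate is needed to convert delays between samples and milliseconds, and content with clear transients or broad frequency coverage generally produces a sharper correlation peak and therefore a more reliable delay estimate.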
The processed audio signal to be aligned. The node shifts this signal in time so that it lines up with the reference audio.
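The maximum time shift, in milliseconds, that the node will search for when aligning the signals. Delays larger than this bound are not considered, which prevents a spurious correlation peak far from the true offset from being selected; conversely, if the true offset exceeds this bound it cannot be found, so set it above the largest offset you expect. The default value is 200 ms.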
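A millisecond limit becomes a lag window in samples once the sample rate is known. The sketch below shows that arithmetic; the helper names are hypothetical, and rounding to the nearest sample is an assumption (an implementation could equally floor or ceil).

```python
def max_shift_samples(max_shift_ms: float, sample_rate: int) -> int:
    """Convert a maximum shift in milliseconds to a whole number of samples.

    Hypothetical helper; rounding to the nearest sample is an assumption.
    """
    if max_shift_ms < 0:
        raise ValueError("max_shift_ms must be non-negative")
    return int(round(max_shift_ms * sample_rate / 1000.0))


def candidate_lags(max_shift_ms: float, sample_rate: int) -> range:
    """All integer lags (in samples) inside the +/- max_shift_ms search window."""
    m = max_shift_samples(max_shift_ms, sample_rate)
    return range(-m, m + 1)


print(max_shift_samples(200, 48000))   # 9600
print(max_shift_samples(200, 44100))   # 8820
```

At 48 kHz the 200 ms default therefore allows lags from -9600 to +9600 samples, or 19,201 candidate integer lags.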
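The alignment method. The default, "gcc-phat", normalizes the cross-power spectrum of the two signals to unit magnitude before computing the correlation, so that only phase information contributes. This produces a sharp correlation peak and makes the delay estimate robust to differences in level and spectral coloring between the two signals, such as those introduced by EQ or other processing.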
This boolean parameter determines whether fractional (sub-sample) delays are estimated and applied. When enabled, the delay is not restricted to a whole number of samples, which allows finer alignment than integer-sample shifting. The default value is True.
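The page does not say how the node estimates the sub-sample part of the delay. One common approach, shown here only as an assumption, is parabolic interpolation: fit a parabola through the correlation peak and its two neighbours and take the vertex. The helper name `parabolic_peak` is hypothetical; subtracting the index of lag zero from its result gives a fractional lag.

```python
def parabolic_peak(cc, i):
    """Refine an integer peak index i of the sequence cc to a fractional index.

    Fits a parabola through cc[i-1], cc[i], cc[i+1] and returns the
    position of its vertex. Falls back to i at the edges or on a flat peak.
    """
    if i <= 0 or i >= len(cc) - 1:
        return float(i)
    y0, y1, y2 = cc[i - 1], cc[i], cc[i + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0:
        return float(i)
    return i + 0.5 * (y0 - y2) / denom


# Samples of y = -(x - 2.3)**2 at x = 0..4; the true peak is at x = 2.3.
samples = [-(x - 2.3) ** 2 for x in range(5)]
peak = max(range(len(samples)), key=samples.__getitem__)
print(parabolic_peak(samples, peak))   # approximately 2.3
```

The fit is exact for a true parabola; on a real correlation peak it is an approximation whose accuracy depends on the peak's shape.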
This parameter sets the length, in taps, of the Finite Impulse Response (FIR) filter used to apply fractional delays. A longer filter approximates an ideal fractional delay more accurately, particularly toward high frequencies, at the cost of more computation. The default value is 64.
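A common way to build such a filter is a windowed sinc: sample the ideal band-limited delay `sinc(n - D)` over `num_taps` points and taper it with a window. The sketch below uses a Hamming window and compensates the filter's integer bulk latency; the function name, the window choice, and the latency handling are assumptions, not the node's documented implementation.

```python
import numpy as np


def apply_fractional_delay(x, delay, num_taps=64):
    """Delay x by `delay` samples (may be fractional) with a windowed-sinc FIR.

    Hypothetical helper. The output has the same length as x; the
    filter's bulk latency of num_taps // 2 samples is removed so only
    `delay` remains. Intended for small delays such as a sub-sample remainder.
    """
    x = np.asarray(x, dtype=float)
    bulk = num_taps // 2                                    # integer latency of the centred sinc
    n = np.arange(num_taps)
    h = np.sinc(n - bulk - delay) * np.hamming(num_taps)    # windowed ideal delay
    h /= h.sum()                                            # unity gain at DC
    y = np.convolve(x, h)                                   # length len(x) + num_taps - 1
    return y[bulk:bulk + len(x)]                            # compensate the bulk latency
```

With an integer `delay` the filter collapses to a single tap, so the result is a plain shift; with a fractional `delay` the sinc interpolates between samples, and more taps reduce the truncation error.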
This output is the processed audio signal after alignment, shifted in time so that it is synchronized with the reference audio.
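Producing the aligned output from an integer delay amounts to advancing or retarding the processed signal and fitting it to a target length. The sketch assumes that a positive lag means the processed audio arrives late, that gaps are filled with silence, and that the result is trimmed or padded to the reference length; the function name `shift_to_reference` is hypothetical and the node may handle edges differently.

```python
import numpy as np


def shift_to_reference(proc, lag, out_len):
    """Undo a delay of `lag` samples in `proc`, then pad/trim to out_len.

    Hypothetical helper; positive lag = proc arrives later than the reference.
    """
    proc = np.asarray(proc, dtype=float)
    if lag >= 0:
        shifted = proc[lag:]                                # drop the leading `lag` samples
    else:
        shifted = np.concatenate((np.zeros(-lag), proc))   # insert silence at the start
    out = np.zeros(out_len)                                # silence where no samples remain
    m = min(out_len, len(shifted))
    out[:m] = shifted[:m]
    return out
```

Shifting by slicing and zero-padding, rather than `np.roll`, avoids wrapping the end of the signal around to its start.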
This output is the estimated delay between the reference and processed audio signals, in samples: the amount by which the processed audio was shifted to achieve alignment. When fractional delays are enabled, this value may be non-integer.
This output gives the same delay in milliseconds (the delay in samples divided by the sample rate, multiplied by 1000), which is easier to interpret than a sample count.
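The relation between the two delay outputs is plain unit conversion. The helper names below are hypothetical, and the sketch assumes the sample rate used is that of the audio being aligned.

```python
def samples_to_ms(delay_samples: float, sample_rate: float) -> float:
    """Convert a delay in samples to milliseconds."""
    return 1000.0 * delay_samples / sample_rate


def ms_to_samples(delay_ms: float, sample_rate: float) -> float:
    """Convert a delay in milliseconds to (possibly fractional) samples."""
    return delay_ms * sample_rate / 1000.0


print(samples_to_ms(480, 48000))    # 10.0
print(samples_to_ms(-12.5, 44100))  # about -0.283
```

For example, a 480-sample delay at 48 kHz corresponds to 10 ms, while the same 480 samples at 44.1 kHz would be about 10.9 ms, which is why the millisecond value is the easier one to compare across sample rates.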