Woosh Sampler:
WooshSample is a sophisticated node designed to generate audio content by leveraging text-to-audio (T2A) or video-to-audio (V2A) synthesis, automatically detecting the appropriate mode based on the input. This node is part of the Woosh framework, which integrates advanced audio processing techniques to produce high-quality soundscapes. The primary goal of WooshSample is to facilitate the creation of audio that aligns with textual or video prompts, making it an invaluable tool for AI artists looking to enhance their multimedia projects with custom audio. By utilizing a consolidated text encoding and a unified sampling approach, WooshSample ensures that the generated audio is coherent and contextually relevant, whether it's derived from text descriptions or synchronized with video frames.
Woosh Sampler Input Parameters:
seed
The seed parameter is an integer that seeds the noise generation process and therefore controls the reproducibility of audio outputs. Setting a specific nonzero seed generates the same audio output on every run. The default value is 0, which means a random seed is used each time, leading to different outputs on each run. The minimum value is 0, and the maximum is 0xFFFFFFFF. Fix the seed when you want to compare the effect of other settings on the same noise, and change it when you want a different variation of the audio.
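The seed behaviour described above can be sketched in plain Python. This is an illustration only: `resolve_seed` and `noise` are hypothetical helpers, and Python's `random` module stands in for whatever noise generator the node actually uses.

```python
import random

MAX_SEED = 0xFFFFFFFF  # upper bound stated for the seed parameter


def resolve_seed(seed: int) -> int:
    """Return the seed to use; 0 is treated as 'pick a random seed'."""
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be in [0, {MAX_SEED}]")
    if seed == 0:
        # Assumption: a random replacement is drawn from the valid nonzero range.
        return random.randint(1, MAX_SEED)
    return seed


def noise(seed: int, n: int) -> list[float]:
    """Stand-in for the sampler's initial noise: n Gaussian values from a seeded RNG."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# The same nonzero seed yields identical noise, hence identical output.
same = noise(resolve_seed(42), 4) == noise(resolve_seed(42), 4)
```

With a fixed seed the two noise draws are identical, while a seed of 0 resolves to a different value on most runs.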
latent_frames
The latent_frames parameter is an integer that controls the duration of the generated audio. It specifies the number of frames in the latent space, with 100 frames approximately equating to 1 second of audio at a 48kHz sample rate. For text-to-audio (T2A) synthesis, a value of 501 frames results in about 5 seconds of audio, while 1001 frames yield around 10 seconds. For video-to-audio (V2A) synthesis, 801 frames correspond to approximately 8 seconds. The default value is 501, with a minimum of 1 and a maximum of 2000. Adjusting this parameter allows you to tailor the audio length to fit your project's needs.
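The frame counts quoted above (501, 801, 1001) suggest roughly 100 latent frames per second plus one extra frame. A minimal sketch of that conversion, assuming the `100 * seconds + 1` relationship holds across the whole range; the helper names are illustrative and not part of Woosh:

```python
FRAMES_PER_SECOND = 100          # approximate rate stated in the description
MIN_FRAMES, MAX_FRAMES = 1, 2000  # documented limits of latent_frames


def seconds_to_latent_frames(seconds: float) -> int:
    """Convert a target duration to a latent_frames value, clamped to the valid range."""
    frames = round(seconds * FRAMES_PER_SECOND) + 1
    return max(MIN_FRAMES, min(MAX_FRAMES, frames))


def latent_frames_to_seconds(frames: int) -> float:
    """Approximate audio duration in seconds for a given latent_frames value."""
    return (frames - 1) / FRAMES_PER_SECOND
```

Under this assumption the documented maximum of 2000 frames corresponds to just under 20 seconds of audio, and longer requests are clamped to that limit.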
subprocess
The subprocess parameter is a boolean that determines whether the inference process should run in an isolated subprocess. This option is beneficial if the generated sound does not match the prompt, as some ComfyUI environments may alter the global PyTorch state, affecting Woosh's output. Running in a subprocess ensures the correctness of the output, albeit at the cost of slower performance due to a model reload time of approximately 15 seconds. The default value is True, indicating that subprocess execution is enabled by default.
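To illustrate why a subprocess isolates the run, the sketch below starts a fresh Python interpreter and shows that state modified in the parent process does not reach the child. It is a generic demonstration using the standard library, not Woosh's actual implementation; `run_isolated` is a hypothetical helper, and the `json` module stands in for global PyTorch state.

```python
import json
import subprocess
import sys


def run_isolated(code: str) -> str:
    """Run a Python snippet in a fresh interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Simulate the host environment altering global state in the parent process.
json.tampered = True

# The child interpreter imports its own clean copy of the module.
child_sees_tampering = run_isolated(
    'import json; print(hasattr(json, "tampered"))'
)
```

The cost of this isolation is that the child starts from scratch each time, which is consistent with the model reload delay mentioned above.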
force_offload
The force_offload parameter is a boolean that dictates whether the model should be removed from GPU and CPU RAM after sampling. Enabling this option forces the model to reload from disk on the next run, which can be useful for managing memory resources in environments with limited hardware capabilities. The default value is False, meaning the model remains in memory for faster subsequent executions unless explicitly offloaded.
Woosh Sampler Output Parameters:
video_frames
The video_frames output consists of the frames extracted from the input video when using the video-to-audio (V2A) mode. These frames are crucial for synchronizing the generated audio with the visual content, ensuring that the audio complements the video seamlessly. This output is particularly important for projects that require precise audio-visual alignment, such as multimedia presentations or video art installations.
audio
The audio output is the generated sound that results from the text-to-audio (T2A) or video-to-audio (V2A) synthesis process. This audio is crafted to match the input text or video context, providing a rich auditory experience that enhances the overall impact of your project. The audio is normalized to ensure consistent volume levels, making it ready for immediate use in various applications.
Woosh Sampler Usage Tips:
- To achieve consistent audio outputs, set a specific seed value. This allows you to reproduce the same results across different sessions.
- Adjust the latent_frames parameter to control the duration of the audio. For longer audio, increase the number of frames, keeping in mind the approximate conversion of 100 frames per second.
- If you encounter discrepancies in the generated audio, enable the subprocess option to ensure the output matches the prompt accurately.
- Use the force_offload option to manage memory usage effectively, especially in environments with limited GPU or CPU resources.
Woosh Sampler Common Errors and Solutions:
"CUDA out of memory"
- Explanation: This error occurs when the GPU does not have enough memory to accommodate the model and the data being processed.
- Solution: Try reducing the latent_frames parameter to decrease the memory load, or enable the force_offload option to free up memory after each run.
"Inconsistent audio output"
- Explanation: The generated audio does not match the expected output due to modifications in the global PyTorch state by the ComfyUI environment.
- Solution: Enable the subprocess option to run the inference in an isolated environment, ensuring the correctness of the output.
"Model reload time is too long"
- Explanation: Running the inference in a subprocess can lead to longer model reload times, affecting performance.
- Solution: If performance is a priority and the audio output is consistent, consider disabling the subprocess option to speed up the process.
