Hunyuan-Foley BlockSwap Settings:
The HunyuanBlockSwap node reduces the VRAM (video RAM) footprint of the Hunyuan-Foley transformer by offloading some of its blocks from the GPU to the CPU, a technique known as block swapping. The model has 57 transformer blocks (19 triple-stream and 38 single-stream). Keeping only part of them resident on the GPU lets you run the model, or several models at once, without exceeding your GPU's memory limit. The trade-off is that offloaded blocks have to be transferred back to the GPU when they are needed, so generation typically gets slower as more blocks are swapped. The node is most useful when GPU memory is the bottleneck for audio tasks within the HunyuanFoley framework.
Hunyuan-Foley BlockSwap Settings Input Parameters:
blocks_to_swap
This parameter sets how many transformer blocks are offloaded from the GPU to the CPU. The model has 57 blocks in total, so valid values run from 0 (nothing offloaded) to 57 (every block offloaded). The default is 30, which offloads roughly half of the model. Higher values move more blocks to the CPU and reduce GPU memory usage; lower values keep more blocks on the GPU.
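The effect of blocks_to_swap on block placement can be sketched in a few lines of plain Python. This is only an illustration: the function name plan_block_placement is hypothetical, and the choice to offload the last blocks in the stack is an assumption, since the node's documentation does not say which blocks are picked.

```python
TRIPLE_STREAM_BLOCKS = 19
SINGLE_STREAM_BLOCKS = 38
TOTAL_BLOCKS = TRIPLE_STREAM_BLOCKS + SINGLE_STREAM_BLOCKS  # 57


def plan_block_placement(blocks_to_swap: int) -> dict:
    """Split block indices into GPU-resident and CPU-offloaded groups.

    Assumption (not confirmed by the node docs): the last
    `blocks_to_swap` blocks in the stack are the ones offloaded.
    """
    if not 0 <= blocks_to_swap <= TOTAL_BLOCKS:
        raise ValueError(
            f"blocks_to_swap must be in 0..{TOTAL_BLOCKS}, got {blocks_to_swap}"
        )
    first_offloaded = TOTAL_BLOCKS - blocks_to_swap
    return {
        "gpu": list(range(first_offloaded)),
        "cpu": list(range(first_offloaded, TOTAL_BLOCKS)),
    }


plan = plan_block_placement(30)
print(len(plan["gpu"]), "blocks stay on the GPU;", len(plan["cpu"]), "are offloaded")
# prints: 27 blocks stay on the GPU; 30 are offloaded
```

With the default of 30, 27 of the 57 blocks remain on the GPU, which is why the default lands near the middle of the memory range.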
use_non_blocking
This boolean parameter determines whether non-blocking memory transfer is used during the offloading process. When set to True, it can potentially speed up the memory transfer process by allowing operations to proceed without waiting for the transfer to complete, but it may also reserve more RAM. The default value is False, which ensures that memory transfers are blocking and may be more stable in terms of memory usage.
prefetch_blocks
This parameter defines the number of blocks to prefetch to the GPU ahead of time, with a range from 0 to 10 and a default value of 1. Prefetching blocks can help hide data transfer latency by preparing blocks in advance, thus improving the efficiency of the block swapping process. Adjusting this parameter can optimize performance based on the specific requirements of your task and the available hardware resources.
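To make the idea of prefetching concrete, the following sketch prints which offloaded blocks would start loading while a given block runs. The function prefetch_schedule is hypothetical and the real node may order its transfers differently. It only shows the lookahead window and how that window shrinks near the end of the block list.

```python
def prefetch_schedule(offloaded, prefetch_blocks):
    """For each offloaded block about to run, list the blocks to start loading.

    `offloaded` is the ordered list of CPU-resident block indices.
    This is a simulation only; no tensors are moved.
    """
    schedule = []
    for pos, current in enumerate(offloaded):
        # Slicing stops at the end of the list, so the lookahead window
        # shrinks near the last block instead of indexing past it.
        window = offloaded[pos + 1 : pos + 1 + prefetch_blocks]
        schedule.append((current, window))
    return schedule


for current, ahead in prefetch_schedule([27, 28, 29, 30], prefetch_blocks=2):
    print(f"running block {current}, prefetching {ahead}")
```

A larger prefetch_blocks value gives transfers more time to finish before a block is needed, at the cost of holding more blocks on the GPU at once.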
block_swap_debug
This boolean parameter enables debug logging for block swapping performance when set to True. The default value is False. Enabling this option can be useful for diagnosing performance issues or understanding the behavior of the block swapping process, as it provides detailed logs of the operations being performed.
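As a rough picture of what per-block debug logging can look like, here is a small timing helper built on Python's standard logging module. The helper name timed_swap, the logger name, and the message format are all assumptions made for illustration; the node's actual log output is not documented here and will differ.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("block_swap_sketch")  # hypothetical logger name


@contextmanager
def timed_swap(block_index, direction, enabled):
    """Log how long a (simulated) block transfer takes when debugging is on."""
    start = time.perf_counter()
    try:
        yield
    finally:
        if enabled:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            logger.debug("block %d %s took %.2f ms", block_index, direction, elapsed_ms)


logging.basicConfig(level=logging.DEBUG)
with timed_swap(27, "cpu->gpu", enabled=True):
    time.sleep(0.01)  # stand-in for the actual transfer
```

Passing enabled=False skips the log call entirely, which mirrors leaving block_swap_debug at its default of False.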
Hunyuan-Foley BlockSwap Settings Output Parameters:
block_swap_args
This output parameter is a dictionary containing the arguments used for block swapping. It encapsulates all the input parameters and their values, providing a comprehensive overview of the configuration used for the block swapping process. This output is essential for understanding the settings applied during execution and can be used for debugging or further analysis.
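A dictionary of this kind might look like the sketch below. The key names are assumed to mirror the input parameter names, and make_block_swap_args is a hypothetical helper, not the node's code. The defaults and ranges are the ones listed in the input parameters above.

```python
def make_block_swap_args(blocks_to_swap=30, use_non_blocking=False,
                         prefetch_blocks=1, block_swap_debug=False):
    """Bundle the four settings into one dict after checking their ranges.

    Key names are an assumption: they simply mirror the input names.
    """
    if not 0 <= blocks_to_swap <= 57:
        raise ValueError("blocks_to_swap must be between 0 and 57")
    if not 0 <= prefetch_blocks <= 10:
        raise ValueError("prefetch_blocks must be between 0 and 10")
    return {
        "blocks_to_swap": blocks_to_swap,
        "use_non_blocking": bool(use_non_blocking),
        "prefetch_blocks": prefetch_blocks,
        "block_swap_debug": bool(block_swap_debug),
    }


args = make_block_swap_args(blocks_to_swap=40, prefetch_blocks=2)
print(args)
```

Checking the ranges at construction time means an out-of-range value is reported before any model is loaded.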
Hunyuan-Foley BlockSwap Settings Usage Tips:
- To optimize VRAM usage, start with the default blocks_to_swap value of 30 and adjust based on your GPU's memory capacity and the complexity of your task.
- Use the use_non_blocking option if you have sufficient RAM and want to potentially speed up the memory transfer process, but be cautious of increased RAM usage.
- Experiment with the prefetch_blocks parameter to find the optimal number of blocks to prefetch for your specific hardware setup, as this can significantly impact performance.
- Enable block_swap_debug if you encounter performance issues or need to understand the block swapping process in detail, as it provides valuable insights through debug logs.
Hunyuan-Foley BlockSwap Settings Common Errors and Solutions:
"BlockSwap enabled but blocks_to_swap is 0. Moving all blocks to GPU."
- Explanation: This message indicates that block swapping is enabled, but the blocks_to_swap parameter is set to 0, resulting in all blocks being moved to the GPU.
- Solution: Increase the blocks_to_swap value to offload some blocks to the CPU, thereby reducing GPU memory usage.
"Insufficient RAM for non-blocking transfer."
- Explanation: This error occurs when there is not enough RAM available to perform non-blocking memory transfers.
- Solution: Disable the use_non_blocking option or free up RAM by closing other applications or processes that are consuming memory.
"Prefetch index out of range."
- Explanation: This error happens when the prefetch_blocks parameter is set too high, causing the prefetch index to exceed the available number of blocks.
- Solution: Reduce the prefetch_blocks value to ensure it stays within the valid range of available blocks.
