TurboQuant Info:
TurboQuantInfo reports the compression statistics achieved by TurboQuant's TQ3 KV Cache compression. It is intended for models that have already been patched with TurboQuantPatch: connect it after the TurboQuantPatch node to view the cumulative compression ratio and byte savings accumulated while the model runs. The summary helps you judge the benefit of TurboQuant's compression techniques, 3-bit Lloyd-Max quantization and Fast Walsh-Hadamard Transform decorrelation, which aim to reduce VRAM usage while maintaining model performance.
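To make the decorrelation step more concrete, here is a minimal sketch of a Fast Walsh-Hadamard Transform in plain Python. It is a textbook version of the transform, not TurboQuant's own code; the function name fwht and the lack of normalization are assumptions made for illustration, and how TurboQuant applies the transform to the KV cache is not described here.

```python
def fwht(values):
    """Unnormalized Fast Walsh-Hadamard Transform of a power-of-two-length list.

    Illustrative helper only; not part of TurboQuant's API.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Combine pairs that are h apart with a sum/difference butterfly.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


# A single large value is spread evenly across all outputs.
print(fwht([8, 0, 0, 0]))  # [8, 8, 8, 8]
```

Spreading outliers across a block in this way tends to give values that a low-bit quantizer can represent with less error, which is the usual motivation for pairing such a transform with 3-bit quantization.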
TurboQuant Info Input Parameters:
model
The model parameter is a required input for the TurboQuantInfo node. It is the model that has been patched with TurboQuantPatch and whose compression statistics you want to observe; the node uses it to access that model's data and calculate the compression metrics. The parameter has no minimum, maximum, or default value. It only needs to be a model that has gone through the TurboQuantPatch process.
TurboQuant Info Output Parameters:
stats
The stats output parameter provides a detailed string containing the observed compression statistics for the patched model. This includes information such as the number of stores, original and compressed sizes in megabytes, the compression ratio, and the total byte savings. If the TurboQuant system is not yet active, it will provide an estimated compression ratio and details about the expected encoding and block size. This output is essential for understanding the effectiveness of the TurboQuant compression and for making informed decisions about model optimization.
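The relationship between the reported sizes, the compression ratio, and the byte savings can be sketched as follows. This is a minimal illustration, not the node's actual code: the function name summarize_compression, the dictionary keys, and the use of 1024 * 1024 bytes per megabyte are all assumptions.

```python
def summarize_compression(original_bytes, compressed_bytes):
    """Derive ratio and savings from two byte counts (hypothetical helper)."""
    if original_bytes < 0 or compressed_bytes <= 0:
        raise ValueError("byte counts must be positive")
    mb = 1024 * 1024  # assumed megabyte size; the node may use a different unit
    return {
        "original_mb": original_bytes / mb,
        "compressed_mb": compressed_bytes / mb,
        "ratio": original_bytes / compressed_bytes,
        "saved_bytes": original_bytes - compressed_bytes,
    }


summary = summarize_compression(512 * 1024 * 1024, 128 * 1024 * 1024)
print(summary["ratio"])        # 4.0
print(summary["saved_bytes"])  # 402653184
```

A ratio of 4.0 in this example means the compressed cache occupies a quarter of the original size, so three quarters of the original bytes are saved.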
TurboQuant Info Usage Tips:
- Connect the TurboQuantInfo node immediately after the TurboQuantPatch node to ensure accurate and up-to-date compression statistics are displayed.
- Regularly check the stats output to monitor the efficiency of the compression and make adjustments to your model or workflow as needed to optimize performance.
TurboQuant Info Common Errors and Solutions:
"Status: Not yet active (no inference run)"
- Explanation: This message indicates that the TurboQuant system has not yet been activated because no inference run has been performed on the model.
- Solution: Ensure that the model has been properly patched with TurboQuantPatch and that an inference run has been executed to activate the compression system and generate statistics.
"Expected compression: ~4.5x"
- Explanation: This is not an error message but an estimated compression ratio provided when the TurboQuant system is not active.
- Solution: To obtain actual compression statistics, perform an inference run on the patched model to activate the TurboQuant system and update the stats with real data.
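An estimate near 4.5x is plausible for 3-bit codes once per-block metadata is counted. The sketch below shows the arithmetic under loudly stated assumptions: 16-bit source values, a block size of 32, and one 16-bit scale stored per block. These numbers are illustrative only and are not taken from TurboQuant's documentation; the function name expected_ratio is likewise hypothetical.

```python
def expected_ratio(source_bits=16, code_bits=3, block_size=32, scale_bits=16):
    """Estimate the compression ratio for block-wise quantization.

    All defaults are assumptions chosen for illustration.
    """
    original = block_size * source_bits                 # bits before compression
    compressed = block_size * code_bits + scale_bits    # codes plus per-block scale
    return original / compressed


print(round(expected_ratio(), 2))              # 4.57
print(round(expected_ratio(scale_bits=0), 2))  # 5.33, the limit with no metadata
```

Without any metadata the ratio would be 16 / 3, about 5.33x, so per-block overhead is what pulls an estimate of this kind down toward 4.5x. The actual ratio reported after an inference run may differ.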
