
ComfyUI Node: Enable INT8 on MODEL

Class Name: INT8ModelAdapter
Category: loaders
Author: SparknightLLC (Account age: 683 days)
Extension: ComfyUI-INT8-Toolkit
Last Updated: 2026-06-23
GitHub Stars: 0.03K

How to Install ComfyUI-INT8-Toolkit

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-INT8-Toolkit in the search bar and install it.
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated node list.

Enable INT8 on MODEL Description

A specialized node that applies INT8 quantization to diffusion models to reduce computational load and memory usage.

Enable INT8 on MODEL:

The INT8ModelAdapter node, part of the ComfyUI-INT8-Toolkit, enables INT8 quantization for AI models, particularly diffusion models. Quantization converts model weights from a higher precision (such as FP32) to INT8, which reduces memory usage and computational load and can speed up inference. This makes the node useful in environments with limited compute or memory.

The adapter converts only the layers that are suitable for quantization and leaves the rest in their original precision, with the aim of preserving the model's accuracy and functionality. It also manages runtime settings and caching so that quantized outputs can be reused efficiently.
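To make the idea of converting weights to INT8 concrete, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is an illustration of the general technique only; the scheme the toolkit actually uses is not described on this page, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]


weights = [0.50, -1.27, 0.03, 0.90]
q, scale = quantize_int8(weights)      # q == [50, -127, 3, 90]
restored = dequantize_int8(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored as one 8-bit integer plus a shared scale, which is where the memory saving comes from. The rounding error per weight is bounded by half the scale.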

Enable INT8 on MODEL Input Parameters:

model

The model parameter is the model to which INT8 quantization is applied. Its architecture determines which layers are quantized. The model must have a diffusion model structure for the adapter to work; otherwise the node returns it unchanged.

enable_int8

The enable_int8 parameter is a boolean flag that determines whether INT8 quantization should be applied to the model. Setting this to True activates the quantization process, while False leaves the model unchanged. This parameter is essential for toggling the quantization feature on or off.

model_type

The model_type parameter specifies the type of model being used, which helps the adapter determine the appropriate quantization strategy and exclusions. It can be set to predefined types or auto for automatic detection. This parameter influences how the model's layers are selected for quantization.

outlier_method

The outlier_method parameter defines the strategy for handling outliers during the quantization process. Outliers can affect the accuracy of quantized models, so this parameter helps in choosing a method to mitigate their impact, ensuring the model remains robust post-quantization.
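The sketch below illustrates why outliers matter for quantization in general. It compares a scale derived from the absolute maximum with a scale that ignores the largest value and clamps it instead. The function names and the clipping strategy are assumptions for illustration; the methods that outlier_method actually offers are not listed on this page.

```python
def quantize(values, scale):
    """Round to the nearest integer step and clamp to the INT8 range."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def absmax_scale(values):
    """Scale chosen so the largest magnitude maps to 127."""
    return max(abs(v) for v in values) / 127.0


def clipped_scale(values, drop_largest=1):
    """Scale that ignores the `drop_largest` biggest magnitudes (hypothetical strategy)."""
    ranked = sorted(abs(v) for v in values)
    return ranked[-(drop_largest + 1)] / 127.0


def worst_error(values, scale):
    """Largest round-trip error over the given values."""
    q = quantize(values, scale)
    return max(abs(v - qi * scale) for v, qi in zip(values, q))


typical = [0.5, -0.4, 0.3, -0.2, 0.1]
with_outlier = typical + [20.0]

coarse = absmax_scale(with_outlier)   # stretched by the single outlier
fine = clipped_scale(with_outlier)    # fitted to the typical values
error_coarse = worst_error(typical, coarse)
error_fine = worst_error(typical, fine)
```

With the absolute-maximum scale, one large value stretches the range and the typical values lose resolution. Clipping keeps the typical values precise, at the cost of saturating the outlier at 127.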

small_batch_fallback

The small_batch_fallback parameter is a boolean that determines whether to use a fallback mechanism for small batch sizes during quantization. This is important for maintaining performance and accuracy when processing smaller batches, which can be challenging for quantized models.

runtime_backend

The runtime_backend parameter specifies the backend to be used for executing the quantized model. Different backends may offer varying levels of performance and compatibility, so this parameter allows you to choose the most suitable one for your environment.

prepack_int8_weights

The prepack_int8_weights parameter is a boolean that indicates whether to prepack the INT8 weights for faster execution. Prepacking can improve runtime efficiency by optimizing how weights are stored and accessed during inference.

bake_loaded_loras

The bake_loaded_loras parameter is a boolean that determines whether to bake loaded LoRA (Low-Rank Adaptation) patches into the model before quantization. This can be important for ensuring that any modifications made by LoRA are preserved in the quantized model.
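As a rough illustration of what baking a LoRA means, the sketch below merges a low-rank update into a weight matrix using the standard LoRA formula W' = W + strength * (up @ down). The helper names are hypothetical, and how the toolkit merges patches internally is not documented here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def bake_lora(weight, lora_down, lora_up, strength=1.0):
    """Return weight + strength * (lora_up @ lora_down)."""
    delta = matmul(lora_up, lora_down)
    return [
        [w + strength * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


weight = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 base weight
lora_up = [[1.0], [2.0]]            # 2x1 factor (rank 1)
lora_down = [[0.5, -0.5]]           # 1x2 factor (rank 1)

baked = bake_lora(weight, lora_down, lora_up)
# baked == [[1.5, -0.5], [1.0, 0.0]]
```

If the merged matrix is what gets quantized, the quantization scale is computed from weights that already include the LoRA change, so the adaptation is carried into the INT8 model.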

log_progress

The log_progress parameter is a boolean that controls whether progress and diagnostic information should be logged during the quantization process. Enabling this can be helpful for debugging and understanding the quantization steps and outcomes.

use_triton

The use_triton parameter is an optional boolean that specifies whether to use the Triton backend for executing the quantized model. Triton can offer performance benefits, and this parameter allows you to leverage those advantages if available.
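For readers who drive ComfyUI through its API-format workflow JSON, the inputs above could be laid out as in the following sketch. The input names and the "auto" value come from this page. The node id "1", the boolean choices, and the "PLACEHOLDER" strings are assumptions; replace the placeholders with values offered by the node's dropdowns, which this page does not list.

```python
# Inputs documented on this page; use_triton is the only one described as optional.
DOCUMENTED_INPUTS = [
    "model", "enable_int8", "model_type", "outlier_method",
    "small_batch_fallback", "runtime_backend", "prepack_int8_weights",
    "bake_loaded_loras", "log_progress",
]


def missing_inputs(node):
    """List documented inputs that the node entry does not set."""
    return [name for name in DOCUMENTED_INPUTS if name not in node["inputs"]]


int8_node = {
    "class_type": "INT8ModelAdapter",
    "inputs": {
        "model": ["1", 0],                # link: output 0 of a hypothetical loader node "1"
        "enable_int8": True,
        "model_type": "auto",
        "outlier_method": "PLACEHOLDER",  # valid choices are not listed on this page
        "small_batch_fallback": True,     # illustrative choice, not a recommendation
        "runtime_backend": "PLACEHOLDER", # valid choices are not listed on this page
        "prepack_int8_weights": True,     # illustrative choice
        "bake_loaded_loras": True,        # illustrative choice
        "log_progress": True,
    },
}

unset = missing_inputs(int8_node)  # empty when every documented input is present
```

The small check is only a convenience for catching a misspelled or forgotten input name before submitting a workflow.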

Enable INT8 on MODEL Output Parameters:

model_patcher

The model_patcher output parameter represents the modified version of the input model after the INT8 quantization process has been applied. This output is crucial as it provides the quantized model ready for deployment, offering improved performance and reduced resource usage while maintaining the original model's functionality.

Enable INT8 on MODEL Usage Tips:

  • Make sure the input is a diffusion model before applying the INT8ModelAdapter; otherwise the node returns the model unchanged.
  • Use the log_progress parameter to monitor the quantization process and gain insights into how your model is being optimized.
  • Experiment with different runtime_backend options to find the most efficient execution environment for your quantized model.

Enable INT8 on MODEL Common Errors and Solutions:

INT8 Model Adapter: model has no diffusion_model; returning unchanged model.

  • Explanation: The input model does not contain a diffusion model, which the INT8ModelAdapter requires. The node skips quantization and returns the model unchanged.
  • Solution: Connect a MODEL input that contains a diffusion model, and check that the upstream loader produces the model type you expect.
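The check behind this message can be pictured with the sketch below. It assumes the usual ComfyUI layout, where a MODEL object wraps an inner model that exposes a diffusion_model attribute. The helper name is hypothetical and the node's actual check is not shown on this page.

```python
from types import SimpleNamespace


def has_diffusion_model(model_patcher):
    """Return True if the wrapped model exposes a non-empty diffusion_model."""
    inner = getattr(model_patcher, "model", None)
    return getattr(inner, "diffusion_model", None) is not None


# Stand-ins for real ComfyUI objects, used only to exercise the check.
diffusion_like = SimpleNamespace(model=SimpleNamespace(diffusion_model=object()))
other_model = SimpleNamespace(model=SimpleNamespace())

ok = has_diffusion_model(diffusion_like)   # True
skipped = has_diffusion_model(other_model) # False
```

Running a check like this on the object feeding the node can confirm whether the message comes from the wrong kind of model being connected.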

INT8 Model Adapter: auto model_type could not identify this model; using conservative union exclusions.

  • Explanation: The adapter was unable to automatically determine the model type, leading to the use of a conservative approach for exclusions.
  • Solution: Manually specify the model_type parameter to ensure the adapter applies the most suitable quantization strategy for your model.

INT8 Model Adapter: This MODEL output is not INT8-converted; later dtype/runtime errors are likely outside the INT8 forward path.

  • Explanation: The model returned by the node was not converted to INT8. According to the message, any dtype or runtime errors that appear later most likely originate outside the INT8 forward path.
  • Solution: If you expected quantization, check that enable_int8 is set to True and review settings such as bake_loaded_loras. Otherwise, look for the cause in the other nodes of the workflow.

RunComfy
Copyright 2025 RunComfy. All Rights Reserved.
