
ComfyUI Node: Load Diffusion Model INT8 (W8A8)

Class Name: OTUNetLoaderW8A8
Category: loaders
Author: SparknightLLC (Account age: 683 days)
Extension: ComfyUI-INT8-Toolkit
Last Updated: 2026-06-23
GitHub Stars: 0.03K

How to Install ComfyUI-INT8-Toolkit

Install this extension via the ComfyUI Manager by searching for ComfyUI-INT8-Toolkit:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-INT8-Toolkit in the search bar.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.


Load Diffusion Model INT8 (W8A8) Description

Facilitates loading optimized INT8 diffusion models for faster inference with reduced resource consumption.

Load Diffusion Model INT8 (W8A8):

The OTUNetLoaderW8A8 node loads diffusion models for INT8 tensor-wise quantization in the W8A8 scheme, where both weights and activations are held in 8-bit precision. It is part of the INT8 Toolkit, which aims to reduce a model's computational cost and memory footprint without significantly compromising accuracy. The node uses fast torch._int_mm inference and accepts models that are either pre-quantized or quantized on the fly at load time. Because it handles INT8 weights natively, it can deliver faster inference and lower resource consumption, which is most useful in environments with limited computational resources.
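To make the W8A8 idea concrete, here is a minimal pure-Python sketch of tensor-wise quantization followed by an integer matrix multiply. It is an illustration under stated assumptions, not the toolkit's code: the function names are hypothetical, symmetric absmax scaling is assumed, and plain lists stand in for the tensors that torch._int_mm would operate on.

```python
def quantize_tensorwise(matrix):
    """Quantize a 2-D list of floats to the int8 range with ONE scale per tensor."""
    absmax = max(abs(v) for row in matrix for v in row)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in matrix]
    return q, scale


def int8_matmul(a_q, b_q):
    """Integer matrix multiply (real kernels accumulate in int32)."""
    cols = list(zip(*b_q))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a_q]


def w8a8_linear(activations, weights):
    """Approximate activations @ weights with int8 operands and a float rescale."""
    a_q, a_scale = quantize_tensorwise(activations)  # A8: 8-bit activations
    w_q, w_scale = quantize_tensorwise(weights)      # W8: 8-bit weights
    acc = int8_matmul(a_q, w_q)
    # One multiply by both scales maps the integer result back to float.
    return [[v * a_scale * w_scale for v in row] for row in acc]


x = [[0.5, -1.0], [2.0, 0.25]]
w = [[1.0, 0.0], [0.5, -2.0]]
y = w8a8_linear(x, w)  # close to the float result of x @ w
```

The result differs slightly from the full-precision product because each tensor is rounded onto 255 integer levels; that rounding error is the accuracy cost traded for cheaper integer arithmetic.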

Load Diffusion Model INT8 (W8A8) Input Parameters:

unet_name

The unet_name parameter selects the diffusion model file to load. Its value must match the filename of a model stored in the designated diffusion models folder. As a filename it has no numeric minimum or maximum, but it must correspond to a valid model in your system's directory.
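A quick way to check a name before loading is to list the files in the models folder and compare exactly. The sketch below is standalone and does not use ComfyUI's own path helpers; the extension set and function names are assumptions for illustration.

```python
from pathlib import Path

# Assumed set of model file extensions; adjust to your installation.
MODEL_EXTENSIONS = {".safetensors", ".sft", ".ckpt", ".pt", ".bin"}


def list_model_files(models_dir):
    """Return model filenames relative to models_dir, including subfolders."""
    root = Path(models_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in MODEL_EXTENSIONS
    )


def resolve_unet_name(models_dir, unet_name):
    """Return the full path for unet_name, or raise if there is no exact match."""
    available = list_model_files(models_dir)
    if unet_name not in available:
        raise FileNotFoundError(
            f"{unet_name!r} not found. Available models: {available}"
        )
    return Path(models_dir) / unet_name
```

The comparison is case-sensitive and includes any subfolder prefix, so a name such as sub/model.safetensors must be given in full.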

weight_dtype

The weight_dtype parameter sets the data type used for the model's weights. The options are default, fp8_e4m3fn, fp8_e4m3fn_fast, and fp8_e5m2, each offering a different level of precision and optimization. The fp8_e4m3fn_fast option enables additional optimizations for faster processing. Lower-precision types generally compute faster at the potential cost of reduced accuracy.
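The four option strings can be thought of as a lookup into loader settings. The sketch below assumes this node interprets them the way ComfyUI's stock diffusion model loader does, which this page does not confirm; dtype names are kept as plain strings so the example runs without PyTorch, and the function name is hypothetical.

```python
# Assumed mapping, modeled on ComfyUI's stock loader; this node may differ.
WEIGHT_DTYPE_OPTIONS = {
    "default": {},
    "fp8_e4m3fn": {"dtype": "float8_e4m3fn"},
    "fp8_e4m3fn_fast": {"dtype": "float8_e4m3fn", "fp8_optimizations": True},
    "fp8_e5m2": {"dtype": "float8_e5m2"},
}


def model_options_for(weight_dtype):
    """Translate a weight_dtype option string into a dict of loader options."""
    try:
        return dict(WEIGHT_DTYPE_OPTIONS[weight_dtype])
    except KeyError:
        valid = ", ".join(WEIGHT_DTYPE_OPTIONS)
        raise ValueError(
            f"Unsupported weight_dtype {weight_dtype!r}; choose one of: {valid}"
        ) from None
```

Under this reading, fp8_e4m3fn_fast selects the same storage type as fp8_e4m3fn and only adds an optimization flag.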

model_type

The model_type parameter identifies the type of model being loaded, which can affect the exclusion presets used during quantization. This tailors the quantization process to the model's characteristics for better performance and compatibility.
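An exclusion preset can be pictured as a list of layer-name patterns that are left unquantized for a given model type. The preset names, patterns, and layer names below are entirely hypothetical and are not taken from the toolkit; the sketch only shows the selection logic.

```python
# Hypothetical presets: model type -> substrings of layer names to skip.
EXCLUSION_PRESETS = {
    "example_model_a": ("norm", "embed"),
    "example_model_b": ("norm", "final_layer"),
}


def split_layers(layer_names, model_type):
    """Split layers into (to_quantize, kept_in_original_precision)."""
    patterns = EXCLUSION_PRESETS.get(model_type, ())
    to_quantize, kept = [], []
    for name in layer_names:
        if any(pattern in name for pattern in patterns):
            kept.append(name)
        else:
            to_quantize.append(name)
    return to_quantize, kept
```

With an unknown model type the preset is empty, so every layer is a quantization candidate in this sketch.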

on_the_fly_quantization

The on_the_fly_quantization parameter is a boolean that controls whether quantization is applied dynamically as the model is loaded. Enable it for models that have not been pre-quantized, so their weights are converted at load time.

outlier_method

The outlier_method parameter specifies the technique used to handle outliers during quantization. This can affect the stability and accuracy of the quantized model, as different methods may be more effective depending on the model's characteristics.
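To see why outlier handling matters, compare two common ways of choosing the quantization scale: the absolute maximum and a clipped percentile. These two methods and their names are assumptions chosen for illustration; this page does not list the node's actual outlier_method options.

```python
def absmax_scale(values):
    """Scale so the largest magnitude maps to 127 (outliers set the range)."""
    peak = max(abs(v) for v in values)
    return peak / 127.0 if peak > 0 else 1.0


def percentile_scale(values, percentile=99.0):
    """Scale from a high percentile of magnitudes; larger values get clipped."""
    mags = sorted(abs(v) for v in values)
    index = min(len(mags) - 1, int(len(mags) * percentile / 100.0))
    peak = mags[index]
    return peak / 127.0 if peak > 0 else 1.0


def quantize(values, scale):
    """Round to the nearest level and clamp to the symmetric int8 range."""
    return [max(-127, min(127, round(v / scale))) for v in values]


# 100 small weights plus a single large outlier.
weights = [0.01 * i for i in range(-50, 50)] + [8.0]
coarse = quantize(weights, absmax_scale(weights))    # outlier kept exactly
fine = quantize(weights, percentile_scale(weights))  # outlier clipped to 127
```

With the absmax scale the single outlier stretches the range, so the small weights share only a handful of integer levels. The percentile scale spends the full range on the bulk of the weights and saturates the outlier instead.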

small_batch_fallback

The small_batch_fallback parameter is a boolean that enables a fallback mechanism for small batch sizes during inference. This can help maintain performance and accuracy when processing small batches, which might otherwise lead to suboptimal results.
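This page does not say what the fallback actually does. One plausible reading is that inputs with too few rows skip the INT8 kernel and use a higher-precision path instead. The sketch below shows only that dispatch decision; the threshold value and function name are hypothetical.

```python
MIN_INT8_ROWS = 16  # hypothetical threshold, not taken from the toolkit


def choose_matmul_path(batch_rows, small_batch_fallback=True, min_rows=MIN_INT8_ROWS):
    """Pick the compute path for a matrix multiply with batch_rows input rows."""
    if small_batch_fallback and batch_rows <= min_rows:
        return "float"  # assumed fallback: skip the INT8 kernel
    return "int8"
```

For example, choose_matmul_path(4) selects the float path, while choose_matmul_path(4, small_batch_fallback=False) keeps the INT8 path regardless of size.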

runtime_backend

The runtime_backend parameter defines the backend used for INT8 operations. This can influence the efficiency and compatibility of the model's execution, as different backends may offer varying levels of support and optimization for INT8 computations.

prepack_int8_weights

The prepack_int8_weights parameter is a boolean that determines whether INT8 weights should be pre-packed for faster access during inference. Enabling this option can reduce the overhead associated with loading and processing weights, leading to improved performance.
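Pre-packing generally means doing a weight layout conversion once at load time rather than on every call. The sketch below illustrates that idea with a simple transpose; the class, its counter, and the choice of transpose as the "packing" step are assumptions, not the toolkit's implementation.

```python
class Int8Linear:
    """Toy integer linear layer that can pack its weights ahead of time."""

    def __init__(self, weight_q, prepack=False):
        self.weight_q = weight_q  # rows = input features, columns = outputs
        self.pack_calls = 0       # counts how often the layout work is done
        self._packed = self._pack() if prepack else None

    def _pack(self):
        """Convert to column-major lists so each output reads contiguous data."""
        self.pack_calls += 1
        return [list(col) for col in zip(*self.weight_q)]

    def forward(self, x_q):
        """Multiply integer inputs by the weights, packing lazily if needed."""
        cols = self._packed if self._packed is not None else self._pack()
        return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x_q]
```

Both settings produce identical outputs. The difference is that the pre-packed layer pays the layout cost once, while the other pays it on every forward call.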

Load Diffusion Model INT8 (W8A8) Output Parameters:

MODEL

The MODEL output is the loaded diffusion model, optimized for INT8 tensor-wise quantization. It is ready for efficient inference, trading reduced precision for faster processing times and lower resource usage, and can be used directly in your AI applications.
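When a workflow is exported in ComfyUI's API format, each node becomes an entry with a class_type and an inputs dict, and a downstream node refers to an output as [node id, output index]. The sketch below builds such an entry for this node. The filename is hypothetical, and the values for model_type, outlier_method, and runtime_backend are placeholders because this page does not list their valid options.

```python
import json

# Placeholder, not a real option: pick actual values from the node's dropdowns.
UNSET = "<pick a value from the node's dropdown>"


def loader_node(unet_name):
    """Build an API-format entry for the INT8 loader node."""
    return {
        "class_type": "OTUNetLoaderW8A8",
        "inputs": {
            "unet_name": unet_name,
            "weight_dtype": "default",
            "model_type": UNSET,
            "on_the_fly_quantization": True,
            "outlier_method": UNSET,
            "small_batch_fallback": True,
            "runtime_backend": UNSET,
            "prepack_int8_weights": True,
        },
    }


workflow = {"1": loader_node("my_model.safetensors")}  # hypothetical filename
model_link = ["1", 0]  # a downstream node's model input would reference this
print(json.dumps(workflow, indent=2))
```

The boolean values shown are illustrative choices, not recommended defaults.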

Load Diffusion Model INT8 (W8A8) Usage Tips:

  • Ensure that the unet_name matches exactly with the model file name in your directory to avoid loading errors.
  • Experiment with different weight_dtype options to find the best balance between performance and accuracy for your specific use case.
  • Enable on_the_fly_quantization if your model has not been pre-quantized to dynamically optimize its weights during loading.
  • Consider using prepack_int8_weights for models that require frequent loading and unloading to reduce processing overhead.

Load Diffusion Model INT8 (W8A8) Common Errors and Solutions:

Model file not found

  • Explanation: The specified unet_name does not match any model file in the designated directory.
  • Solution: Verify that the unet_name is correct and corresponds to an existing model file in your system's diffusion models folder.

Unsupported weight dtype

  • Explanation: The chosen weight_dtype is not supported by the current configuration or backend.
  • Solution: Check the available options for weight_dtype and select a supported type. Ensure that your system's backend is compatible with the chosen data type.

Quantization failure

  • Explanation: The model could not be quantized on-the-fly due to incompatible settings or model characteristics.
  • Solution: Review the model_type and on_the_fly_quantization settings to ensure they are appropriate for your model. Adjust the outlier_method or small_batch_fallback parameters if necessary.

RunComfy
Copyright 2025 RunComfy. All Rights Reserved.