RunComfy

ComfyUI Node: Load Diffusion Model INT8 (W8A8)

Class Name: OTUNetLoaderW8A8
Category: loaders
Author: SparknightLLC (Account age: 683 days)
Extension: ComfyUI-INT8-Toolkit
Latest Updated: 2026-06-23
Github Stars: 0.03K

Table of Contents

Description
OTUNetLoaderW8A8:
OTUNetLoaderW8A8 Input Parameters:
OTUNetLoaderW8A8 Output Parameters:
OTUNetLoaderW8A8 Usage Tips:
OTUNetLoaderW8A8 Common Errors and Solutions:
Related Nodes

How to Install ComfyUI-INT8-Toolkit

Install this extension via the ComfyUI Manager by searching for ComfyUI-INT8-Toolkit:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-INT8-Toolkit in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Load Diffusion Model INT8 (W8A8) Description

Facilitates loading optimized INT8 diffusion models for faster inference with reduced resource consumption.

Load Diffusion Model INT8 (W8A8):

The OTUNetLoaderW8A8 node loads diffusion models for INT8 tensorwise quantization in W8A8 form, meaning that both weights and activations use 8-bit precision. It is part of the INT8 Toolkit, which aims to reduce the compute cost and memory footprint of AI models without significantly compromising accuracy. The node runs inference through fast torch._int_mm and accepts models that are either pre-quantized or quantized on the fly. Because it handles INT8 weights natively, it can shorten inference times and lower resource consumption, which makes it particularly useful on hardware with limited computational resources.
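To make "tensorwise W8A8" concrete, here is a minimal pure-Python sketch of the general idea: each tensor gets a single scale, both operands are rounded to 8-bit integers, the product is accumulated in integers, and the result is rescaled to floats. The function names are hypothetical and this is not the toolkit's implementation, which relies on torch._int_mm; it only illustrates the arithmetic under those assumptions.

```python
# Illustrative sketch of tensorwise W8A8 matmul (not the toolkit's code).
# All function names here are hypothetical.

def quantize_tensorwise(matrix):
    """Map a float matrix to int8-range values using one scale for the whole tensor."""
    absmax = max(abs(v) for row in matrix for v in row)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in matrix]
    return q, scale

def int_matmul(a, b):
    """Integer matrix product; Python ints stand in for an int32 accumulator."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def w8a8_matmul(activations, weights):
    """Quantize both operands, multiply in integers, then rescale to floats."""
    qa, scale_a = quantize_tensorwise(activations)
    qw, scale_w = quantize_tensorwise(weights)
    acc = int_matmul(qa, qw)
    return [[v * scale_a * scale_w for v in row] for row in acc]

activations = [[0.5, -1.0], [2.0, 0.25]]
weights = [[1.0, 0.0], [-0.5, 0.75]]

approx = w8a8_matmul(activations, weights)
exact = [[sum(a * w for a, w in zip(row, col)) for col in zip(*weights)]
         for row in activations]

# Largest absolute difference between the quantized and float results.
max_error = max(abs(p - q) for pr, qr in zip(approx, exact) for p, q in zip(pr, qr))
```

The quantized result stays close to the float result for this small example, and the only extra data carried alongside the integers is one scale per tensor.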

Load Diffusion Model INT8 (W8A8) Input Parameters:

unet_name

The unet_name parameter specifies the diffusion model to load and therefore determines which model file is read. It must match the filename of a model stored in the designated folder for diffusion models. There are no minimum or maximum values; the only constraint is that the name corresponds to a valid model file on your system.

weight_dtype

The weight_dtype parameter defines the data type for the model's weights. Options include default, fp8_e4m3fn, fp8_e4m3fn_fast, and fp8_e5m2, each offering different levels of precision and optimization. Choosing fp8_e4m3fn_fast enables additional optimizations for faster processing. The choice of data type can impact the model's performance and accuracy, with lower precision types generally offering faster computation at the potential cost of reduced accuracy.
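As background on the two fp8 families named above, the sketch below derives the largest finite value of each format from its bit layout. It assumes the standard definitions of these 8-bit float formats (4 exponent and 3 mantissa bits with bias 7 for e4m3fn, 5 exponent and 2 mantissa bits with bias 15 for e5m2); the helper name is hypothetical, and nothing here describes how the node itself treats each option.

```python
# Illustrative sketch: largest finite value of the two fp8 layouts.
# The helper name max_finite is hypothetical.

def max_finite(exp_bits, man_bits, bias, ieee_style_specials):
    top_field = (1 << exp_bits) - 1
    if ieee_style_specials:
        # The highest exponent field is reserved for inf/NaN (e5m2).
        exp_field = top_field - 1
        mantissa = (1 << man_bits) - 1
    else:
        # "fn" variant: only the all-ones bit pattern is NaN (e4m3fn),
        # so the top exponent is usable with every mantissa but the last.
        exp_field = top_field
        mantissa = (1 << man_bits) - 2
    return 2.0 ** (exp_field - bias) * (1 + mantissa / (1 << man_bits))

E4M3FN_MAX = max_finite(4, 3, bias=7, ieee_style_specials=False)
E5M2_MAX = max_finite(5, 2, bias=15, ieee_style_specials=True)

# Relative spacing between neighbouring values within one power of two.
E4M3FN_STEP = 2.0 ** -3
E5M2_STEP = 2.0 ** -2
```

In short, e4m3fn spends its bits on finer steps over a narrower range, while e5m2 covers a much wider range with coarser steps.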

model_type

The model_type parameter indicates the specific type of model being loaded, which can affect the exclusion presets used during quantization. This parameter helps tailor the quantization process to the characteristics of the model, ensuring optimal performance and compatibility.

on_the_fly_quantization

The on_the_fly_quantization parameter is a boolean that determines whether the model is quantized as it is loaded. Enable it for models that have not been pre-quantized, so that their weights are quantized dynamically during loading.

outlier_method

The outlier_method parameter specifies the technique used to handle outliers during quantization. This can affect the stability and accuracy of the quantized model, as different methods may be more effective depending on the model's characteristics.
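To illustrate why outlier handling matters for tensorwise quantization, the sketch below compares two generic strategies on a toy weight list: scaling by the absolute maximum, and clipping at a high percentile. Both strategies and all names are assumptions chosen for illustration; the node's actual outlier_method options are not documented here and may differ.

```python
# Illustrative sketch: how one outlier affects a single tensorwise scale.
# The strategies shown are generic examples, not the node's own options.

def quantize_dequantize(values, clip):
    """Round to the int8 range using scale = clip / 127, then map back to floats."""
    scale = clip / 127.0
    return [max(-127, min(127, round(v / scale))) * scale for v in values]

def mean_abs_error(values, clip):
    restored = quantize_dequantize(values, clip)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)

bulk = [0.01 * i for i in range(-50, 51)]   # typical small weights
outlier = [8.0]                              # one extreme value
weights = bulk + outlier

# Strategy 1: the scale covers the full range, including the outlier.
absmax_clip = max(abs(v) for v in weights)

# Strategy 2: the scale covers roughly the 99th percentile of magnitudes.
sorted_abs = sorted(abs(v) for v in weights)
percentile_clip = sorted_abs[int(0.99 * (len(sorted_abs) - 1))]

bulk_error_absmax = mean_abs_error(bulk, absmax_clip)
bulk_error_clipped = mean_abs_error(bulk, percentile_clip)
outlier_error_absmax = mean_abs_error(outlier, absmax_clip)
outlier_error_clipped = mean_abs_error(outlier, percentile_clip)
```

Clipping gives the typical weights much finer resolution but saturates the outlier, while the absolute-maximum scale preserves the outlier at the cost of coarser steps everywhere else. Which trade-off works better depends on the model.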

small_batch_fallback

The small_batch_fallback parameter is a boolean that enables a fallback mechanism for small batch sizes during inference. This can help maintain performance and accuracy when processing small batches, which might otherwise lead to suboptimal results.
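A fallback of this kind can be pictured as a simple dispatch on the number of rows in the activation matrix. The sketch below is hypothetical: the threshold value, the function name, and the "float" alternative path are assumptions made for illustration, not details taken from the toolkit.

```python
# Illustrative sketch of a small-batch dispatch rule.
# MIN_ROWS_FOR_INT8 is a hypothetical threshold chosen only for this example.
MIN_ROWS_FOR_INT8 = 16

def choose_matmul_path(num_rows, small_batch_fallback):
    """Return which matmul path to use for an activation matrix with num_rows rows."""
    if small_batch_fallback and num_rows <= MIN_ROWS_FOR_INT8:
        # Small inputs skip the INT8 kernel and use a regular float matmul instead.
        return "float"
    return "int8"
```

For example, `choose_matmul_path(4, True)` selects the float path, while `choose_matmul_path(4096, True)` keeps the INT8 path.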

runtime_backend

The runtime_backend parameter defines the backend used for INT8 operations. This can influence the efficiency and compatibility of the model's execution, as different backends may offer varying levels of support and optimization for INT8 computations.

prepack_int8_weights

The prepack_int8_weights parameter is a boolean that determines whether INT8 weights should be pre-packed for faster access during inference. Enabling this option can reduce the overhead associated with loading and processing weights, leading to improved performance.
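Taken together, the eight inputs above could be written as a single node entry in ComfyUI's API-format workflow JSON, sketched here as a Python dict. The model filename is hypothetical, and the values for model_type, outlier_method, and runtime_backend are left as placeholders because their valid options are not listed on this page; check the node's dropdowns for the real choices.

```python
import json

# Placeholder for inputs whose valid options are not documented on this page.
PLACEHOLDER = "<choose in node UI>"

loader_node = {
    "class_type": "OTUNetLoaderW8A8",
    "inputs": {
        "unet_name": "my_model_int8.safetensors",  # hypothetical filename
        "weight_dtype": "default",                 # one of the listed options
        "model_type": PLACEHOLDER,
        "on_the_fly_quantization": False,          # assumes a pre-quantized model
        "outlier_method": PLACEHOLDER,
        "small_batch_fallback": True,
        "runtime_backend": PLACEHOLDER,
        "prepack_int8_weights": True,
    },
}

# API-format workflows map node ids (strings) to node entries.
workflow = {"1": loader_node}
workflow_json = json.dumps(workflow, indent=2)
```

The boolean values shown are example settings, not recommended defaults.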

Load Diffusion Model INT8 (W8A8) Output Parameters:

MODEL

The MODEL output is the loaded diffusion model, prepared for INT8 tensorwise quantization. It is ready for efficient inference, using reduced precision to achieve faster processing and lower resource usage, and it can be used directly in the rest of your workflow.

Load Diffusion Model INT8 (W8A8) Usage Tips:

Ensure that the unet_name matches exactly with the model file name in your directory to avoid loading errors.
Experiment with different weight_dtype options to find the best balance between performance and accuracy for your specific use case.
Enable on_the_fly_quantization if your model has not been pre-quantized to dynamically optimize its weights during loading.
Consider using prepack_int8_weights for models that require frequent loading and unloading to reduce processing overhead.

Load Diffusion Model INT8 (W8A8) Common Errors and Solutions:

Model file not found

Explanation: The specified unet_name does not match any model file in the designated directory.
Solution: Verify that the unet_name is correct and corresponds to an existing model file in your system's diffusion models folder.
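One way to carry out this check outside ComfyUI is a small script that looks for the exact filename and, if it is missing, lists the model files that do exist. This is a minimal sketch: the helper name, the directory you pass in, and the set of file extensions are all assumptions, and ComfyUI's own lookup logic is not reproduced here.

```python
from pathlib import Path

# Assumed set of model file extensions, for illustration only.
MODEL_EXTENSIONS = {".safetensors", ".ckpt", ".pt"}

def find_model(models_dir, unet_name):
    """Return (path, []) if the file exists, else (None, names of available models)."""
    root = Path(models_dir)
    candidate = root / unet_name
    if candidate.is_file():
        return candidate, []
    if not root.is_dir():
        return None, []
    # List model files relative to the folder, including those in subfolders.
    available = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in MODEL_EXTENSIONS
    )
    return None, available
```

Comparing the requested name with the returned list makes typos and missing subfolder prefixes easy to spot.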

Unsupported weight dtype

Explanation: The chosen weight_dtype is not supported by the current configuration or backend.
Solution: Check the available options for weight_dtype and select a supported type. Ensure that your system's backend is compatible with the chosen data type.

Quantization failure

Explanation: The model could not be quantized on-the-fly due to incompatible settings or model characteristics.
Solution: Review the model_type and on_the_fly_quantization settings to ensure they are appropriate for your model. Adjust the outlier_method or small_batch_fallback parameters if necessary.

Load Diffusion Model INT8 (W8A8) Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI-INT8-Toolkit
