RunComfy


ComfyUI Node: INT8 Kernel Config

Class Name

INT8KernelConfigTuner

Category
loaders

Author
SparknightLLC (Account age: 683 days)

Extension
ComfyUI-INT8-Toolkit

Latest Updated
2026-06-23

Github Stars
0.03K


Table of Content

Description
INT8KernelConfigTuner:
INT8KernelConfigTuner Input Parameters:
INT8KernelConfigTuner Output Parameters:
INT8KernelConfigTuner Usage Tips:
INT8KernelConfigTuner Common Errors and Solutions:
Related Nodes

How to Install ComfyUI-INT8-Toolkit

Install this extension via the ComfyUI Manager by searching for ComfyUI-INT8-Toolkit:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-INT8-Toolkit in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


INT8 Kernel Config Description

A node that configures the Triton kernel settings used for INT8 matrix multiplication, either from manually set values or from a microbenchmark that selects the fastest candidate.

INT8 Kernel Config:

The INT8KernelConfigTuner is a node for tuning the performance of INT8 models by configuring Triton kernel settings. It exposes the kernel parameters for INT8 matrix multiplication, the operation that dominates model execution time, and lets you either set them manually or run a microbenchmark that picks the fastest candidate configuration. This is useful for AI artists who want faster inference from INT8 models without studying kernel optimization in depth: the node's main goal is to make kernel configuration accessible to users without a deep technical background.

INT8 Kernel Config Input Parameters:

model

The INT8 model whose Triton kernel settings are synchronized during sampling. Passing the model through this node ensures the kernel configuration is applied to the correct model.

run_microbench

This boolean parameter, with a default value of False, determines whether to benchmark candidate kernel settings and use the fastest result for the model. Running a microbenchmark can help identify the most efficient kernel configuration, optimizing model performance.

block_m

This integer parameter specifies the Triton BLOCK_M tile size for fixed INT8 matrix multiplication kernels. It ranges from 16 to 512, with a default value of 128. Adjusting this value can impact the performance of the kernel by changing the size of the matrix tiles processed in parallel.

block_n

This integer parameter defines the Triton BLOCK_N tile size. Like block_m, it ranges from 16 to 512, with a default value of 128. It affects how the matrix multiplication is partitioned along the N dimension, influencing execution speed and efficiency.

block_k

This parameter sets the Triton BLOCK_K reduction tile size for fixed INT8 matrix multiplication kernels. It ranges from 16 to 512, with a default value of 64. This value determines the size of the reduction tiles, impacting the kernel's computational efficiency.
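To make the three tile sizes concrete, the sketch below counts how many output tiles and K-loop steps a tiled matrix multiplication would need. It assumes the common Triton layout of one program per BLOCK_M x BLOCK_N output tile that loops over K in BLOCK_K steps; this page does not show the toolkit's actual kernel, and `tile_counts` is a hypothetical helper, not part of the extension.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def tile_counts(m, n, k, block_m=128, block_n=128, block_k=64):
    """Count tiles for an (M x K) @ (K x N) matmul.

    Assumption: one program per BLOCK_M x BLOCK_N output tile, each
    looping over K in BLOCK_K-sized steps. The real kernel may differ.
    """
    tiles_m = ceil_div(m, block_m)
    tiles_n = ceil_div(n, block_n)
    return {
        "tiles_m": tiles_m,
        "tiles_n": tiles_n,
        "programs": tiles_m * tiles_n,   # independent output tiles
        "k_steps": ceil_div(k, block_k),  # reduction loop iterations per tile
    }


# Default tile sizes with the default benchmark shape from this page
# (bench_m=2048, bench_n=4096, bench_k=2048).
print(tile_counts(2048, 4096, 2048))
```

Larger tiles mean fewer programs that each do more work; smaller tiles mean more programs and less work per program. Which balance is faster depends on the GPU and the matrix shape.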

group_size_m

This integer parameter specifies the Triton GROUP_SIZE_M launch grouping value, ranging from 1 to 64, with a default of 8. In Triton matrix multiplication kernels, this value typically controls how many rows of output tiles are grouped together when program instances are ordered, which affects cache reuse and performance.
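The effect of a launch grouping value is easier to see by listing which output tile each program instance handles. The sketch below reproduces, in plain Python, the grouped ordering used in Triton's matrix multiplication tutorial. Whether the toolkit's kernel uses exactly this mapping is an assumption, and `grouped_tile_order` is a hypothetical name.

```python
def grouped_tile_order(num_pid_m, num_pid_n, group_size_m=8):
    """Map each linear program id to a (pid_m, pid_n) output tile.

    Follows the grouped ordering from the Triton matmul tutorial
    (assumed, not confirmed, to match the toolkit's kernel).
    """
    order = []
    num_pid_in_group = group_size_m * num_pid_n
    for pid in range(num_pid_m * num_pid_n):
        group_id = pid // num_pid_in_group
        first_pid_m = group_id * group_size_m
        # The last group may contain fewer than group_size_m rows.
        rows_in_group = min(num_pid_m - first_pid_m, group_size_m)
        pid_m = first_pid_m + (pid % num_pid_in_group) % rows_in_group
        pid_n = (pid % num_pid_in_group) // rows_in_group
        order.append((pid_m, pid_n))
    return order


# A 4 x 3 grid of tiles, visited in groups of 2 tile rows.
for pid, tile in enumerate(grouped_tile_order(4, 3, group_size_m=2)):
    print(pid, tile)
```

With a group size of 1 the tiles are visited row by row. Larger groups make consecutive programs walk down a few rows before moving to the next column, so neighbouring programs tend to share input tiles.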

num_warps

This parameter defines the number of Triton warps per program, ranging from 1 to 16, with a default value of 4. Warps are groups of threads that execute instructions in lockstep, and adjusting this value can optimize resource utilization.

num_stages

This integer parameter sets the number of Triton pipeline stages, ranging from 1 to 8, with a default of 4. It determines the depth of the pipeline, influencing latency and throughput of the kernel execution.
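As a rough guide to how num_warps and num_stages interact with the tile sizes, the sketch below estimates the threads per program and the bytes of operand tiles kept in flight by the pipeline. This is a back-of-the-envelope estimate under stated assumptions (one byte per INT8 element, 32 threads per warp as on NVIDIA GPUs, one A tile and one B tile buffered per stage); the Triton compiler decides the real allocation, and `pipeline_footprint` is a hypothetical helper.

```python
def pipeline_footprint(block_m=128, block_n=128, block_k=64,
                       num_warps=4, num_stages=4,
                       bytes_per_element=1, threads_per_warp=32):
    """Estimate threads per program and staged operand bytes.

    Assumptions: INT8 operands (1 byte each), 32-thread warps (NVIDIA),
    and one A tile plus one B tile buffered per pipeline stage.
    """
    a_tile = block_m * block_k * bytes_per_element  # BLOCK_M x BLOCK_K slice of A
    b_tile = block_k * block_n * bytes_per_element  # BLOCK_K x BLOCK_N slice of B
    return {
        "threads_per_program": num_warps * threads_per_warp,
        "staged_bytes": num_stages * (a_tile + b_tile),
    }


# Defaults from this page.
print(pipeline_footprint())
```

If a combination of large tiles and many stages exceeds what the GPU can hold on chip, the kernel may fail to launch or run slower, which is one reason to lower num_stages when raising the block sizes.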

bench_m

This parameter specifies the M dimension used by the optional synthetic kernel microbenchmark, ranging from 64 to 16384, with a default of 2048. It defines the size of the matrix dimension for benchmarking purposes.

bench_k

This parameter sets the K dimension for the synthetic kernel microbenchmark. Like bench_m, it ranges from 64 to 16384, with a default of 2048. It is used to evaluate the kernel's performance under different matrix sizes.

bench_n

Similar to bench_m and bench_k, this parameter defines the N dimension for the microbenchmark, with a default value of 4096. It helps in assessing the kernel's efficiency across various matrix configurations.

bench_warmup

This integer parameter specifies the number of warmup iterations before timing each candidate kernel configuration, ranging from 1 to 20, with a default of 2. Warmup iterations help stabilize performance measurements.

bench_iterations

This parameter sets the number of timed iterations per candidate kernel configuration, ranging from 2 to 100, with a default of 6. It determines how many times each configuration is tested to ensure accurate benchmarking results.
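The warmup-then-time pattern described by bench_warmup and bench_iterations can be sketched as follows. This is a generic illustration, not the toolkit's benchmark code: `pick_fastest`, the candidate dictionaries, and the toy workload are all hypothetical. When timing real GPU kernels, the device would also need to be synchronized before reading the clock (for example with `torch.cuda.synchronize()`), which this CPU-only sketch omits.

```python
import time


def pick_fastest(candidates, run, warmup=2, iterations=6,
                 clock=time.perf_counter):
    """Return (best_candidate, mean_seconds) over the given candidates.

    `run(candidate)` executes one iteration of the workload.
    `clock` is injectable so the logic can be tested deterministically.
    """
    best, best_time = None, float("inf")
    for cand in candidates:
        for _ in range(warmup):
            run(cand)  # untimed: lets caches and lazy compilation settle
        start = clock()
        for _ in range(iterations):
            run(cand)
        mean = (clock() - start) / iterations
        if mean < best_time:
            best, best_time = cand, mean
    return best, best_time


# Toy workload standing in for a kernel launch (hypothetical).
def toy_run(cand):
    sum(range(cand["work"]))


candidates = [{"name": "small", "work": 10_000}, {"name": "large", "work": 200_000}]
best, seconds = pick_fastest(candidates, toy_run)
print(best["name"], f"{seconds:.6f}s")
```

More timed iterations reduce noise in the mean at the cost of a longer benchmark, which is the trade-off behind the bench_iterations range.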

bench_include_scalar

This boolean parameter, with a default value of False, indicates whether to include scalar-weight kernel candidates in the benchmark. It is typically left off for per-row INT8 models to focus on more relevant configurations.
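The defaults and ranges listed above can be collected into a single structure and checked before they are passed to the node. The sketch below is an illustration only: `KernelConfig` and `RANGES` are hypothetical names, not classes from the extension, and the range for bench_n is left unchecked because this page does not state it.

```python
from dataclasses import dataclass

# (min, max) as documented on this page. bench_n is omitted because
# its range is not stated explicitly.
RANGES = {
    "block_m": (16, 512), "block_n": (16, 512), "block_k": (16, 512),
    "group_size_m": (1, 64), "num_warps": (1, 16), "num_stages": (1, 8),
    "bench_m": (64, 16384), "bench_k": (64, 16384),
    "bench_warmup": (1, 20), "bench_iterations": (2, 100),
}


@dataclass
class KernelConfig:
    """Hypothetical container mirroring the node's inputs and defaults."""
    run_microbench: bool = False
    block_m: int = 128
    block_n: int = 128
    block_k: int = 64
    group_size_m: int = 8
    num_warps: int = 4
    num_stages: int = 4
    bench_m: int = 2048
    bench_k: int = 2048
    bench_n: int = 4096
    bench_warmup: int = 2
    bench_iterations: int = 6
    bench_include_scalar: bool = False

    def validate(self):
        """Return a list of human-readable range violations (empty if none)."""
        errors = []
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                errors.append(f"{name}={value} is outside [{low}, {high}]")
        return errors


print(KernelConfig().validate())            # defaults are in range
print(KernelConfig(block_m=8).validate())   # below the documented minimum
```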

INT8 Kernel Config Output Parameters:

MODEL

The INT8 model with the Triton kernel configuration applied. The model carries the selected or benchmarked kernel settings and is ready to connect to downstream nodes.

INT8 Kernel Config Usage Tips:

To achieve optimal performance, consider enabling run_microbench to automatically benchmark and select the best kernel configuration for your model.
Adjust the block_m, block_n, and block_k parameters based on the specific dimensions of your model's matrices to enhance execution efficiency.

INT8 Kernel Config Common Errors and Solutions:

INT8 Kernel Config: Triton kernel module unavailable

Explanation: This error occurs when the Triton kernel module is not available or cannot be imported.
Solution: Ensure that the Triton library is correctly installed and accessible in your environment.
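A quick way to confirm whether Triton can be found in the Python environment that runs ComfyUI is to look up the module without importing it. The sketch below uses only the standard library; `triton_available` is a hypothetical helper name, and the check only tells you that the package is locatable, not that it works with your GPU or driver.

```python
import importlib.util


def triton_available():
    """Return True if a top-level `triton` package can be located."""
    return importlib.util.find_spec("triton") is not None


if triton_available():
    print("triton found")
else:
    print("triton not found in this environment")
```

Run it with the same Python interpreter that launches ComfyUI, since a separate virtual environment may have a different set of installed packages.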

INT8 Kernel Config: microbench failed

Explanation: This error indicates that the microbenchmarking process encountered an issue and could not complete successfully.
Solution: Check the input parameters for the microbenchmark, such as bench_m, bench_k, and bench_n, to ensure they are within valid ranges and try running the benchmark again.

INT8 Kernel Config Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI-INT8-Toolkit

