ComfyUI Extension: ComfyUI ExLlamaV2 Nodes

Zuellni (Account age: 531 days)

How to Install ComfyUI ExLlamaV2 Nodes

Install this extension via the ComfyUI Manager by searching for ComfyUI ExLlamaV2 Nodes:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI ExLlamaV2 Nodes in the search bar.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

ComfyUI ExLlamaV2 Nodes Description

ComfyUI ExLlamaV2 Nodes is a local text generator for ComfyUI built on the ExLlamaV2 inference library. It requires manual package installation and provides efficient text generation within the ComfyUI framework.

ComfyUI ExLlamaV2 Nodes Introduction

ComfyUI-ExLlama-Nodes is an extension that integrates ComfyUI with ExLlamaV2, a local text generation library. It lets AI artists generate text locally on their own machines using ExLlamaV2's features. Whether you are creating stories, dialogues, or other text-based content, ComfyUI-ExLlama-Nodes produces text inside a ComfyUI workflow with minimal setup.

How ComfyUI ExLlamaV2 Nodes Works

At its core, ComfyUI-ExLlama-Nodes works by connecting ComfyUI with ExLlamaV2, enabling local text generation on modern consumer GPUs. ExLlamaV2 is an inference library that supports various models and quantization techniques, making it versatile and efficient. The extension provides nodes that load models, generate text based on prompts, and display the generated text within the ComfyUI interface.

Basic Principles

  1. Model Loading: The extension loads pre-trained language models from a specified directory. Models can be quantized (for example, 4-bit GPTQ) or unquantized.
  2. Text Generation: Using the loaded models, the extension generates text based on user-provided prompts. The generation process can be customized with various parameters to control the output.
  3. Display and Interaction: The generated text is displayed within the ComfyUI interface, allowing users to interact with and refine the output as needed.

ComfyUI ExLlamaV2 Nodes Features

Loader Node

The Loader node is responsible for loading models from the llm directory. It offers several customization options:

  • cache_bits: Sets the number of bits used for the key/value cache. Lower values reduce VRAM usage but may affect generation speed and quality.
  • fast_tensors: When enabled, this option reduces RAM usage and speeds up model loading.
  • flash_attention: Reduces VRAM usage by enabling FlashAttention, which is not supported on GPUs with compute capability below 8.0.
  • max_seq_len: Sets the maximum context length. Higher values increase VRAM usage. A value of 0 defaults to the model's configuration.
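
To get a rough sense of how cache_bits and max_seq_len trade off against VRAM, the standard key/value cache size estimate can be worked out by hand. The sketch below is not part of the extension: kv_cache_bytes is a hypothetical helper, the layer and head counts are the published Llama-3-8B configuration, and the estimate ignores the model weights and the small overhead of quantization scales.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, max_seq_len, cache_bits):
    """Approximate key/value cache size in bytes (illustrative helper only).

    Each token stores one key and one value vector per layer and per
    key/value head, each holding head_dim numbers of cache_bits bits.
    """
    values_per_token = 2 * num_layers * num_kv_heads * head_dim  # keys + values
    return values_per_token * max_seq_len * cache_bits // 8


# Assumed example model: Llama-3-8B (32 layers, 8 key/value heads, head size 128).
for bits in (16, 8, 4):
    size = kv_cache_bytes(32, 8, 128, 8192, bits)
    print(f"cache_bits={bits:>2}: {size / 2**20:.0f} MiB")
```

Under these assumptions an 8192-token context needs about 1 GiB of cache at 16 bits and about 256 MiB at 4 bits, which is why lowering cache_bits or max_seq_len is the first thing to try when VRAM is tight.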

Generator Node

The Generator node generates text based on a given prompt. Key parameters include:

  • unload: Unloads the model after each generation to reduce VRAM usage.
  • stop_conditions: A list of strings that, when encountered, stop the text generation. For example, ["\n"] stops generation on a newline.
  • max_tokens: Sets the maximum number of new tokens to generate. A value of 0 uses the available context.
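
The behaviour of stop_conditions and max_tokens can be illustrated with a small standalone sketch. This is not the extension's code: both function names are hypothetical, the real generator works on tokens rather than finished strings, and the reading of "0 uses the available context" as "context length minus prompt length" is an assumption.

```python
def resolve_max_tokens(max_tokens, max_seq_len, prompt_tokens):
    """Return how many new tokens may be generated.

    Assumption: max_tokens == 0 means "use whatever context is left
    after the prompt".
    """
    available = max(max_seq_len - prompt_tokens, 0)
    if max_tokens == 0:
        return available
    return min(max_tokens, available)


def apply_stop_conditions(text, stop_conditions):
    """Cut text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_conditions:
        if not stop:
            continue  # an empty string would match everywhere; skip it
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


print(resolve_max_tokens(0, 4096, 1000))
print(apply_stop_conditions("First line\nSecond line", ["\n"]))
```

With a 4096-token context and a 1000-token prompt, a max_tokens of 0 leaves room for 3096 new tokens, and a stop condition of ["\n"] keeps only the first line of the output.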

Previewer Node

The Previewer node displays the generated text within the ComfyUI interface, allowing users to review and interact with the output.

Replacer Node

The Replacer node replaces variable names in brackets (e.g., [a]) with their corresponding values, making it easier to manage dynamic content within the generated text.
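
The idea behind the Replacer node can be sketched in a few lines. This is a minimal illustration rather than the node's implementation: replace_variables is a hypothetical name, and leaving unknown variables untouched is an assumption about sensible behaviour.

```python
import re

# Matches a variable name wrapped in square brackets, such as [a] or [style].
VARIABLE_PATTERN = re.compile(r"\[(\w+)\]")


def replace_variables(text, values):
    """Replace each [name] in text with values[name], if it is defined."""

    def substitute(match):
        name = match.group(1)
        # Unknown names are kept as written (an assumption, see above).
        return str(values.get(name, match.group(0)))

    return VARIABLE_PATTERN.sub(substitute, text)


print(replace_variables("A [a] sitting on a [b].", {"a": "cat", "b": "sofa"}))
```

This prints "A cat sitting on a sofa.", which shows how one prompt template can be reused with different values.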

ComfyUI ExLlamaV2 Nodes Models

ComfyUI-ExLlama-Nodes supports EXL2, 4-bit GPTQ, and unquantized models. Here are some examples:

  • Llama-3-8B-Instruct: A 6-bit model suited to instruction-following text generation.
  • Llama2 70B: A large model that can run on a single 24 GB GPU with a 2048-token context, producing coherent and stable output.

To use a model, clone its repository or manually download the files and place them in the models/llm directory.
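
A quick way to confirm that models are in place is to list the folders under models/llm that look complete. The sketch below is a hedged example, not something the extension ships: find_model_dirs is a hypothetical helper, and the check (a config.json plus at least one .safetensors file per folder) is a heuristic assumption about a typical model download.

```python
from pathlib import Path


def find_model_dirs(llm_dir):
    """Return the names of subfolders that look like complete model folders.

    Heuristic (assumption): a usable folder contains config.json and at
    least one .safetensors weight file.
    """
    root = Path(llm_dir)
    if not root.is_dir():
        return []
    found = []
    for folder in sorted(root.iterdir()):
        if not folder.is_dir():
            continue
        has_config = (folder / "config.json").is_file()
        has_weights = any(folder.glob("*.safetensors"))
        if has_config and has_weights:
            found.append(folder.name)
    return found


if __name__ == "__main__":
    # Path is relative to the ComfyUI installation directory.
    print(find_model_dirs("models/llm"))
```

Run from the ComfyUI root, this prints the model folders that pass the check; an empty list suggests the files were placed in the wrong directory or the download is incomplete.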

What's New with ComfyUI ExLlamaV2 Nodes

Version 0.1.0+

  • Paged Attention Support: Integration with FlashAttention 2.5.7+ for improved performance.
  • Dynamic Generator: A new generator with dynamic batching, smart prompt caching, and K/V cache deduplication.

These updates enhance the efficiency and flexibility of text generation, making it easier for AI artists to produce high-quality content.

Troubleshooting ComfyUI ExLlamaV2 Nodes

Common Issues and Solutions

  1. Model Loading Errors:
     • Ensure that the model files are correctly placed in the models/llm directory.
     • Verify that the model format is supported (EXL2, 4-bit GPTQ, or unquantized).
  2. High VRAM Usage:
     • Lower the cache_bits value in the Loader node settings.
     • Enable flash_attention if your GPU supports it.
  3. Slow Text Generation:
     • Enable fast_tensors in the Loader node settings.
     • Reduce the max_seq_len value to decrease the context length.

Frequently Asked Questions

  • Can I use my own models? Yes, you can add your own models by placing them in the models/llm directory and updating the extra_model_paths.yaml file.

  • What GPUs are supported? The extension targets modern consumer GPUs. The flash_attention option additionally requires a GPU with compute capability 8.0 or higher.
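
Whether a given GPU meets the compute capability 8.0 threshold can be checked from Python. The sketch below assumes PyTorch is installed for the live query and falls back to doing nothing if it is not; supports_flash_attention is a hypothetical helper that only encodes the threshold stated above.

```python
def supports_flash_attention(capability):
    """Return True if a (major, minor) compute capability is at least 8.0."""
    return tuple(capability) >= (8, 0)


try:
    import torch  # optional: only needed to query the local GPU
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    # get_device_capability returns a (major, minor) tuple for the device.
    capability = torch.cuda.get_device_capability(0)
    print(capability, supports_flash_attention(capability))
```

For example, a GPU reporting (8, 6) passes the check while one reporting (7, 5) does not, so flash_attention should stay disabled on the latter.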

© Copyright 2024 RunComfy. All Rights Reserved.