
ComfyUI Node: Qwen 3.5 (GGUF)

Class Name

Qwen35GGUF

Category
Qwen3.5
Author
DanielBartolic (Account age: 2370 days)
Extension
ComfyUI-Qwen3.5
Last Updated
2026-03-13
GitHub Stars
0.03K

How to Install ComfyUI-Qwen3.5

Install this extension via the ComfyUI Manager by searching for ComfyUI-Qwen3.5
  1. Click the Manager button in the main menu
  2. Select the Custom Nodes Manager button
  3. Enter ComfyUI-Qwen3.5 in the search bar
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


Qwen 3.5 (GGUF) Description

Qwen35GGUF accelerates AI model inference by up to 9x using CUDA-optimized llama.cpp.

Qwen 3.5 (GGUF):

Qwen35GGUF is a specialized node for fast inference built on the llama.cpp framework, which significantly accelerates processing compared to traditional methods. It is particularly useful for AI artists and developers who need to run large models quickly, offering up to nine times faster inference than FP16 transformers on high-performance GPUs such as the RTX PRO 6000. The node supports a range of GGUF models available from Hugging Face, providing flexibility and scalability for various AI applications. By leveraging CUDA-optimized llama.cpp, Qwen35GGUF keeps computation efficient, making it a practical tool for adding speed and precision to AI-driven projects.
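To make the parameters below concrete, here is an illustrative sketch of how a node like this might assemble a llama.cpp CLI invocation from its inputs. The flag names follow llama.cpp's standard command-line options (`-m`, `-p`, `--top-p`, `--top-k`, `--repeat-penalty`, `-ngl`, `-c`); the actual command the node builds internally may differ, and the model path is a placeholder.

```python
def build_cli_command(model_path, prompt, top_p=0.8, top_k=20,
                      repeat_penalty=1.0, n_gpu_layers=99, ctx_size=8192):
    """Assemble an argument list for a llama.cpp CLI binary.

    Defaults mirror the node's documented parameter defaults. This is an
    illustrative sketch, not the node's actual implementation.
    """
    return [
        "llama-mtmd-cli",
        "-m", model_path,                      # path to the GGUF model file
        "-p", prompt,                          # the prompt text
        "--top-p", str(top_p),                 # nucleus sampling threshold
        "--top-k", str(top_k),                 # top-k sampling cutoff
        "--repeat-penalty", str(repeat_penalty),
        "-ngl", str(n_gpu_layers),             # layers offloaded to the GPU
        "-c", str(ctx_size),                   # context window in tokens
    ]

cmd = build_cli_command("qwen3.5.gguf", "Describe a sunset.")
```

The resulting list can be passed to `subprocess.run`, which avoids shell-quoting issues with prompts containing spaces or special characters.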

Qwen 3.5 (GGUF) Input Parameters:

top_p

The top_p parameter is a float that sets the nucleus sampling threshold, which determines the cumulative probability for token selection during inference. This parameter helps in controlling the randomness of the output, with a default value of 0.8, a minimum of 0.0, and a maximum of 1.0. Adjusting top_p can impact the diversity of the generated content, where lower values result in more deterministic outputs.
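A minimal sketch of nucleus (top-p) filtering in pure Python, to show why lower values make outputs more deterministic: only the smallest set of tokens whose probabilities sum to at least top_p survives, so a low threshold quickly narrows sampling to the dominant tokens.

```python
def top_p_filter(probs, top_p=0.8):
    """Return the indices of the smallest set of tokens whose cumulative
    probability reaches top_p; sampling then happens only among these."""
    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept

# With a peaked distribution, the default top_p=0.8 keeps only two tokens.
probs = [0.6, 0.25, 0.1, 0.05]
print(top_p_filter(probs, top_p=0.8))  # [0, 1] — they cover 0.85 >= 0.8
```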

top_k

The top_k parameter is an integer that specifies the number of highest probability tokens to consider during sampling. It influences the creativity and variability of the output, with a default value of 20, a minimum of 1, and a maximum of 100. A higher top_k allows for more diverse outputs by considering a larger set of potential tokens.
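The same idea for top-k, as a sketch: regardless of how probability mass is distributed, only the top_k highest-probability tokens remain candidates.

```python
def top_k_filter(probs, top_k=20):
    """Return the indices of the top_k highest-probability tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:top_k]

probs = [0.1, 0.5, 0.15, 0.25]
print(top_k_filter(probs, top_k=2))  # [1, 3]
```

In practice llama.cpp applies top-k and top-p together: top-k trims the candidate set first, then top-p trims it further by cumulative probability.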

repeat_penalty

The repeat_penalty parameter is a float that applies a penalty to repeated tokens, helping to reduce redundancy in the generated text. It has a default value of 1.0, with a minimum of 0.5 and a maximum of 2.0. Adjusting this parameter can enhance the quality of the output by discouraging repetitive sequences.
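A sketch of how llama.cpp-style repetition penalty works on logits, assuming the common scheme where positive logits of already-seen tokens are divided by the penalty and negative ones multiplied; a penalty of 1.0 (the default) leaves logits unchanged.

```python
def apply_repeat_penalty(logits, seen_tokens, penalty=1.0):
    """Penalize tokens that already appeared in the output:
    positive logits are divided by the penalty, negative logits
    multiplied, so seen tokens become less likely either way."""
    out = list(logits)
    for t in set(seen_tokens):
        if out[t] > 0:
            out[t] /= penalty
        else:
            out[t] *= penalty
    return out

logits = [2.0, -1.0, 0.5]
print(apply_repeat_penalty(logits, seen_tokens=[0, 1], penalty=2.0))
# [1.0, -2.0, 0.5] — tokens 0 and 1 are pushed down, token 2 untouched
```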

n_gpu_layers

The n_gpu_layers parameter is an integer that determines the number of layers offloaded to the GPU for processing. It has a default value of 99, with a range from -1 to 200. Setting this parameter to -1 or 99 offloads all layers to the GPU, optimizing performance by leveraging GPU acceleration.
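The -1/99 behavior can be sketched as a small resolver: -1 means "all layers", and any value at or above the model's actual depth also offloads everything. The helper name and the layer count are illustrative, not part of the node's API.

```python
def resolve_gpu_layers(n_gpu_layers, model_layers):
    """Map the node's n_gpu_layers setting to an actual layer count:
    -1 offloads every layer; otherwise the count is capped at the
    model's depth, so large values like 99 also mean 'all layers'
    for typical model sizes."""
    if n_gpu_layers < 0:
        return model_layers
    return min(n_gpu_layers, model_layers)
```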

ctx_size

The ctx_size parameter is an integer that defines the context window size in tokens, which affects the amount of text the model can consider at once. It has a default value of 8192, with a minimum of 1024 and a maximum of 131072. A larger context size allows the model to generate more coherent and contextually aware outputs.
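A quick sanity check worth doing when tuning ctx_size: the prompt plus the generation budget must fit inside the context window, or earlier text falls out of scope. The helper below is an illustrative sketch (token counts would come from the model's tokenizer).

```python
def fits_in_context(prompt_tokens, max_new_tokens, ctx_size=8192):
    """Return True if the prompt plus the planned number of new
    tokens fits within the context window."""
    return prompt_tokens + max_new_tokens <= ctx_size

print(fits_in_context(7000, 1000))  # True: 8000 <= 8192
```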

enable_thinking

The enable_thinking parameter is a boolean that, when enabled, outputs reasoning in the THINKING output. This feature can be useful for applications requiring transparency in decision-making processes, with a default setting of False.

Qwen 3.5 (GGUF) Output Parameters:

THINKING

The THINKING output provides reasoning or thought processes generated by the model when the enable_thinking parameter is activated. This output is valuable for understanding the model's decision-making and can be used to enhance interpretability in AI applications.
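Assuming the model emits its reasoning inside `<think>...</think>` tags, as Qwen-family thinking models typically do, splitting the raw output into THINKING and answer portions can be sketched like this (the function name is illustrative, not the node's internal API):

```python
import re

def split_thinking(raw_output):
    """Separate <think>...</think> reasoning from the final answer.
    Returns (thinking, answer); thinking is "" if no tags are present."""
    m = re.search(r"<think>(.*?)</think>", raw_output, flags=re.DOTALL)
    if not m:
        return "", raw_output.strip()
    thinking = m.group(1).strip()
    answer = raw_output[m.end():].strip()
    return thinking, answer

t, a = split_thinking("<think>check units first</think>The answer is 42.")
# t == "check units first", a == "The answer is 42."
```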

Qwen 3.5 (GGUF) Usage Tips:

  • To achieve faster inference, ensure that llama.cpp is built with CUDA and that the llama-mtmd-cli binary is accessible in your system's PATH or specified via the cli_path setting.
  • Experiment with top_p and top_k parameters to balance between creativity and coherence in your outputs, depending on the specific requirements of your project.
  • Utilize the repeat_penalty parameter to minimize repetitive text, which can improve the quality and readability of the generated content.

Qwen 3.5 (GGUF) Common Errors and Solutions:

Error: "llama-mtmd-cli not found"

  • Explanation: This error occurs when the llama-mtmd-cli binary is not found in the system's PATH or the specified cli_path.
  • Solution: Ensure that llama.cpp is correctly built with CUDA support and that the llama-mtmd-cli binary is either in your system's PATH or the cli_path is correctly set.
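A quick way to diagnose this error from Python is to resolve the binary the same way the node presumably does: prefer an explicit cli_path if set, otherwise search the system PATH with `shutil.which`. This is a diagnostic sketch, not the node's actual lookup code.

```python
import os
import shutil

def find_llama_cli(cli_path=None):
    """Locate the llama-mtmd-cli binary: prefer an explicit cli_path,
    otherwise search the system PATH. Returns None if not found."""
    if cli_path and os.path.isfile(cli_path):
        return cli_path
    return shutil.which("llama-mtmd-cli")

found = find_llama_cli()
if found is None:
    print("llama-mtmd-cli not found — check your PATH or cli_path setting")
```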

Error: "Model not found"

  • Explanation: This error indicates that the specified model is not available or incorrectly referenced.
  • Solution: Verify that the model name is correctly specified and that it is available in the Hugging Face repository. Ensure that the model path is correctly set in the configuration.

Qwen 3.5 (GGUF) Related Nodes

Go back to the extension to check out more related nodes.
ComfyUI-Qwen3.5
RunComfy
Copyright 2025 RunComfy. All Rights Reserved.
