
ComfyUI Node: Qwen VL Inference

Class Name

XIS_QwenVLInference

Category
XISER_Nodes/LLM
Author
grinlau18 (Account age: 944 days)
Extension
ComfyUI_XISER_Nodes
Last Updated
2026-03-20
GitHub Stars
0.03K

How to Install ComfyUI_XISER_Nodes

Install this extension via the ComfyUI Manager by searching for ComfyUI_XISER_Nodes:
  • 1. Click the Manager button in the main menu
  • 2. Select Custom Nodes Manager button
  • 3. Enter ComfyUI_XISER_Nodes in the search bar
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


Qwen VL Inference Description

Facilitates local inference with Qwen3-VL for image understanding and multimodal dialogue tasks.

Qwen VL Inference:

XIS_QwenVLInference is a powerful node designed to facilitate local inference using the Qwen3-VL vision-language model. This node is particularly beneficial for tasks involving image understanding and multimodal dialogue, such as image captioning, visual question answering, document understanding, and optical character recognition (OCR). It supports up to eight image inputs and offers comprehensive control over generation parameters, including temperature, top_p, and max_tokens, allowing for fine-tuning of the model's output. The node automatically scans model directories and supports the Qwen3-VL series models, making it easy to integrate into your workflow. Additionally, it features automatic GPU/CPU selection and precision control, with Flash Attention 2 acceleration to enhance performance. This node is ideal for AI artists looking to leverage advanced vision-language capabilities without requiring extensive technical knowledge.

Qwen VL Inference Input Parameters:

instruction

The instruction parameter is a string input that allows you to specify the task or question you want the model to address. For example, you might input "Describe this image" to generate a descriptive caption for an image. This parameter is crucial as it guides the model's inference process, directly impacting the output. The default value is "Describe this image," and it supports multiline input for more complex instructions.
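Vision-language models in the Qwen family typically receive the instruction as the text part of a chat-style message alongside the image entries. A minimal sketch of how such a payload might be assembled (the exact structure used inside XIS_QwenVLInference is an assumption; this mirrors the conventional multimodal message format):

```python
def build_messages(instruction, image_paths):
    """Assemble an illustrative chat-style payload: one entry per image,
    followed by the text instruction, under a single user role."""
    content = [{"type": "image", "image": path} for path in image_paths]
    content.append({"type": "text", "text": instruction})
    return [{"role": "user", "content": content}]

msgs = build_messages("Describe this image", ["photo_1.png", "photo_2.png"])
```

A multiline instruction simply becomes a longer `text` field; the node's support for up to eight images corresponds to up to eight image entries in `content`.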

device

The device parameter determines whether the model runs on a GPU or CPU. By default, it is set to "auto," allowing the node to automatically select the most suitable hardware based on availability and performance considerations. This parameter ensures optimal resource utilization and can significantly affect the speed and efficiency of the inference process.
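The "auto" behavior can be thought of as a simple availability check. A minimal sketch, written without a torch dependency (the node's actual selection logic is an assumption; in practice the availability flag would come from `torch.cuda.is_available()`):

```python
def pick_device(requested="auto", cuda_available=False):
    """Resolve the device parameter: honor an explicit choice,
    otherwise prefer the GPU when one is available."""
    if requested != "auto":
        return requested
    return "cuda" if cuda_available else "cpu"
```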

dtype

The dtype parameter specifies the data type used during inference, with the default set to "auto." This allows the node to automatically choose the appropriate precision level, balancing performance and accuracy. Adjusting this parameter can be useful for optimizing the model's performance on different hardware configurations.

flash_attention_2

The flash_attention_2 parameter is a boolean that enables or disables Flash Attention 2 acceleration. When set to True, it can enhance the model's performance by speeding up the attention mechanism, which is particularly beneficial for large-scale inference tasks.

trust_remote_code

The trust_remote_code parameter is a boolean that determines whether custom model code bundled with the checkpoint is allowed to execute during loading. By default, it is set to True, which is required by some Hugging Face model repositories that ship their own loading code; only enable it for model sources you trust.

temperature

The temperature parameter controls the randomness of the model's output. It accepts values between 0 and 2, with a default of 0.7. Lower values result in more deterministic outputs, while higher values increase variability, which can be useful for creative tasks requiring diverse outputs.
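Temperature works by scaling the model's logits before they are converted to probabilities. A small self-contained illustration of the standard formulation (this is the conventional definition, not the node's internal code):

```python
import math

def apply_temperature(logits, temperature):
    """Divide logits by the temperature, then softmax.
    temperature < 1 sharpens the distribution; > 1 flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
sharp = apply_temperature(logits, 0.5)  # more deterministic
flat = apply_temperature(logits, 2.0)   # more varied
```

With the lower temperature, the most likely token receives a larger share of the probability mass, which is why low values give more deterministic outputs.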

top_p

The top_p parameter, also known as nucleus sampling, restricts sampling to the smallest set of most probable tokens whose cumulative probability reaches the specified threshold. It ranges from 0 to 1, with a default of 0.8. This parameter helps generate coherent and contextually relevant outputs by discarding the unlikely tail of the distribution.
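The nucleus-sampling rule can be sketched in a few lines; this shows the standard filtering step, not the node's internal implementation:

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens (taken in descending probability
    order) whose cumulative probability reaches top_p; return their
    indices. Sampling then happens only among the kept tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

probs = [0.5, 0.25, 0.15, 0.10]
```

With `top_p=0.7`, the first two tokens (cumulative 0.75) are enough, so the long tail is excluded; a lower threshold keeps an even smaller nucleus.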

max_tokens

The max_tokens parameter sets the maximum number of tokens the model can generate in response to an input. It ranges from 16 to 16384, with a default of 1024. This parameter is crucial for controlling the length of the output, ensuring it is concise or detailed as required by the task.

top_k

The top_k parameter limits the model's output to the top k most probable tokens. It has a default value of 20, which helps in maintaining the quality of the generated text by focusing on the most likely options.
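Top-k filtering is the simpler counterpart to top_p: keep a fixed number of candidates regardless of their cumulative probability. A minimal illustration of the standard rule:

```python
def top_k_filter(probs, k):
    """Return the indices of the k most probable tokens;
    sampling is then restricted to these candidates."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]
```

Both filters are often applied together: top_k caps the candidate count, then top_p trims the cumulative tail within that set.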

repetition_penalty

The repetition_penalty parameter discourages the model from repeating the same phrases or words. It has a default value of 1.0, which applies no penalty; values above 1.0 increasingly suppress tokens that have already appeared, which is useful for generating more varied outputs.

presence_penalty

The presence_penalty parameter encourages the model to introduce new topics or concepts in its output. It has a default value of 1.5, which can be adjusted to control the novelty of the generated text, making it suitable for tasks requiring creative or exploratory outputs.
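The two penalties above act on the raw logits before sampling. A hedged sketch of the conventional formulations (the node's exact implementation is an assumption): a repetition penalty above 1.0 shrinks the logits of already-generated tokens, while a presence penalty subtracts a flat amount from each of them.

```python
def apply_penalties(logits, generated_ids,
                    repetition_penalty=1.0, presence_penalty=0.0):
    """Discourage previously generated tokens: repetition_penalty divides
    positive logits (and multiplies negative ones) for seen tokens, while
    presence_penalty subtracts a flat amount from every seen token."""
    out = list(logits)
    for t in set(generated_ids):
        if out[t] > 0:
            out[t] /= repetition_penalty
        else:
            out[t] *= repetition_penalty
        out[t] -= presence_penalty
    return out
```

Because the presence penalty is flat, it mainly encourages new tokens to appear at all, whereas the repetition penalty scales with how confident the model already is in a repeated token.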

seed

The seed parameter sets the random seed for the model's inference process, with a default value of 42. This ensures reproducibility of results, allowing you to generate consistent outputs across different runs with the same input parameters.
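The effect of a fixed seed is easy to demonstrate with any seeded random source (illustrative only; the node seeds the model's own sampler rather than Python's `random` module):

```python
import random

def sample_with_seed(probs, seed, n=5):
    """Draw n token indices from a categorical distribution using a
    seeded RNG: identical seed and inputs give identical draws."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=n)

run_a = sample_with_seed([0.5, 0.3, 0.2], seed=42)
run_b = sample_with_seed([0.5, 0.3, 0.2], seed=42)
```

Two runs with the same seed produce the same sequence, which is what makes outputs reproducible across reruns of the workflow.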

enable_cache

The enable_cache parameter is a boolean that enables or disables caching of intermediate results. When set to True, it can improve the efficiency of repeated inference tasks by reusing previously computed results, reducing computation time.
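Conceptually, such a cache maps inputs to previously computed results so repeated requests skip inference. A minimal sketch of the idea (the node's actual cache key and storage are assumptions; `fake_infer` is a stand-in for the real model call):

```python
def make_cached(infer):
    """Wrap an inference function with a result cache keyed on its
    inputs, mimicking what enable_cache=True might do (illustrative)."""
    cache = {}
    def wrapped(instruction, image_hash):
        key = (instruction, image_hash)
        if key not in cache:
            cache[key] = infer(instruction, image_hash)
        return cache[key]
    return wrapped

calls = []
def fake_infer(instruction, image_hash):
    calls.append(1)  # count how often real inference runs
    return f"result for {instruction}"

cached = make_cached(fake_infer)
first = cached("Describe this image", "abc123")
second = cached("Describe this image", "abc123")  # served from cache
```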

Qwen VL Inference Output Parameters:

text

The text output parameter provides the generated text response from the model based on the input instruction and image payloads. This output is crucial as it represents the model's interpretation and understanding of the input, offering insights or answers to the specified task. The text can vary in length and content depending on the input parameters and the complexity of the task.

Qwen VL Inference Usage Tips:

  • To optimize performance for image captioning tasks, consider adjusting the temperature and top_p parameters to balance creativity and coherence in the generated descriptions.
  • For tasks requiring detailed analysis, such as document understanding, increase the max_tokens parameter to allow for more comprehensive outputs.
  • Utilize the device parameter to ensure the model runs on the most suitable hardware, especially when handling large-scale inference tasks that can benefit from GPU acceleration.

Qwen VL Inference Common Errors and Solutions:

Qwen3-VL local inference failed: <error_message>

  • Explanation: This error indicates that the local inference process encountered an issue, possibly due to incorrect input parameters or hardware limitations.
  • Solution: Verify that all input parameters are correctly set and that your hardware meets the requirements for running the model. Consider adjusting the device parameter to ensure compatibility.

Failed to extract output text from decoded_results: <error_message>

  • Explanation: This error occurs when the model's output format is unexpected or incompatible with the expected structure.
  • Solution: Check the input parameters and ensure they align with the model's capabilities. If the issue persists, consider adjusting the max_tokens or top_p parameters to influence the output structure.

Qwen VL Inference Related Nodes

Go back to the extension to check out more related nodes.
ComfyUI_XISER_Nodes