Qwen VL Inference:
XIS_QwenVLInference is a powerful node designed to facilitate local inference using the Qwen3-VL vision-language model. This node is particularly beneficial for tasks involving image understanding and multimodal dialogue, such as image captioning, visual question answering, document understanding, and optical character recognition (OCR). It supports up to eight image inputs and offers comprehensive control over generation parameters, including temperature, top_p, and max_tokens, allowing for fine-tuning of the model's output. The node automatically scans model directories and supports the Qwen3-VL series models, making it easy to integrate into your workflow. Additionally, it features automatic GPU/CPU selection and precision control, with Flash Attention 2 acceleration to enhance performance. This node is ideal for AI artists looking to leverage advanced vision-language capabilities without requiring extensive technical knowledge.
Qwen VL Inference Input Parameters:
instruction
The instruction parameter is a string input that allows you to specify the task or question you want the model to address. For example, you might input "Describe this image" to generate a descriptive caption for an image. This parameter is crucial as it guides the model's inference process, directly impacting the output. The default value is "Describe this image," and it supports multiline input for more complex instructions.
device
The device parameter determines whether the model runs on a GPU or CPU. By default, it is set to "auto," allowing the node to automatically select the most suitable hardware based on availability and performance considerations. This parameter ensures optimal resource utilization and can significantly affect the speed and efficiency of the inference process.
dtype
The dtype parameter specifies the data type used during inference, with the default set to "auto." This allows the node to automatically choose the appropriate precision level, balancing performance and accuracy. Adjusting this parameter can be useful for optimizing the model's performance on different hardware configurations.
flash_attention_2
The flash_attention_2 parameter is a boolean that enables or disables Flash Attention 2 acceleration. When set to True, it can enhance the model's performance by speeding up the attention mechanism, which is particularly beneficial for large-scale inference tasks.
trust_remote_code
The trust_remote_code parameter is a boolean that determines whether to execute custom model code bundled with the downloaded checkpoint (the standard Hugging Face transformers trust_remote_code option). By default, it is set to True, which some model repositories require in order to load correctly; only enable it for checkpoints from sources you trust.
temperature
The temperature parameter controls the randomness of the model's output. It accepts values between 0 and 2, with a default of 0.7. Lower values result in more deterministic outputs, while higher values increase variability, which can be useful for creative tasks requiring diverse outputs.
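Temperature works by dividing the model's logits before they are converted to probabilities. The node's internals are not shown here, but the general effect can be sketched with the standard library (the logit values below are made up for illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, dividing by temperature first."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                          # hypothetical token logits
cool = softmax_with_temperature(logits, 0.2)      # near-deterministic
warm = softmax_with_temperature(logits, 1.5)      # flatter, more varied
# The top token dominates at low temperature and loses mass at high temperature.
```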
top_p
The top_p parameter, also known as nucleus sampling, limits sampling to the smallest set of most probable tokens whose cumulative probability reaches a specified threshold. It ranges from 0 to 1, with a default of 0.8. This parameter helps in generating coherent and contextually relevant outputs by focusing on the most likely options.
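A minimal sketch of the nucleus-sampling filter, assuming the usual definition (keep the smallest top set whose cumulative probability reaches top_p, then renormalize); the probability values are made up for illustration:

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = [0.5, 0.3, 0.15, 0.05]        # hypothetical token probabilities
nucleus = nucleus_filter(probs, 0.8)
# Tokens 0 and 1 (0.5 + 0.3 = 0.8) survive; the low-probability tail is dropped.
```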
max_tokens
The max_tokens parameter sets the maximum number of tokens the model can generate in response to an input. It ranges from 16 to 16384, with a default of 1024. This parameter is crucial for controlling the length of the output, ensuring it is concise or detailed as required by the task.
top_k
The top_k parameter limits the model's output to the top k most probable tokens. It has a default value of 20, which helps in maintaining the quality of the generated text by focusing on the most likely options.
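In contrast to top_p, top_k keeps a fixed number of candidates regardless of their probabilities. An illustrative sketch with made-up values:

```python
def top_k_filter(probs, k):
    """Zero out all but the k highest-probability tokens, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    total = sum(p for i, p in enumerate(probs) if i in kept)
    return [p / total if i in kept else 0.0 for i, p in enumerate(probs)]

probs = [0.4, 0.3, 0.2, 0.1]          # hypothetical token probabilities
filtered = top_k_filter(probs, 2)     # only the two best candidates remain
```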
repetition_penalty
The repetition_penalty parameter discourages the model from repeating the same phrases or words. The default value of 1.0 applies no penalty; values above 1.0 make tokens that have already appeared progressively less likely to be chosen again, which is useful for generating more varied and interesting outputs.
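The node's exact formula is not documented here, but the common transformers-style rule is a reasonable sketch: positive logits of already-generated tokens are divided by the penalty, negative ones multiplied, so both become less attractive.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Transformers-style repetition penalty: divide positive logits of
    already-generated tokens by the penalty, multiply negative ones by it."""
    out = list(logits)
    for i in set(generated_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

logits = [3.0, 1.0, -2.0]             # hypothetical logits for a 3-token vocab
penalized = apply_repetition_penalty(logits, generated_ids=[0, 2], penalty=1.3)
# Token 0 becomes less likely (3.0 -> ~2.31); token 2 even less so (-2.0 -> -2.6).
```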
presence_penalty
The presence_penalty parameter encourages the model to introduce new topics or concepts in its output. It has a default value of 1.5, which can be adjusted to control the novelty of the generated text, making it suitable for tasks requiring creative or exploratory outputs.
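Assuming the usual definition of a presence penalty (a flat deduction from every token that has appeared at least once, regardless of count), the mechanism can be sketched as:

```python
def apply_presence_penalty(logits, generated_ids, penalty):
    """Subtract a flat penalty from every token that has already appeared,
    regardless of how many times (presence, not frequency, penalty)."""
    seen = set(generated_ids)
    return [x - penalty if i in seen else x for i, x in enumerate(logits)]

logits = [2.0, 2.0, 2.0]              # hypothetical logits
adjusted = apply_presence_penalty(logits, generated_ids=[1], penalty=1.5)
# Token 1 drops to 0.5; unseen tokens keep their original logits.
```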
seed
The seed parameter sets the random seed for the model's inference process, with a default value of 42. This ensures reproducibility of results, allowing you to generate consistent outputs across different runs with the same input parameters.
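The reproducibility guarantee boils down to seeding the random number generator used for sampling. An illustrative stdlib sketch (sample_token is a hypothetical stand-in, not the node's API):

```python
import random

def sample_token(probs, seed):
    """Sample a token index from a probability list with a fixed seed."""
    rng = random.Random(seed)         # seed the generator for reproducibility
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = [0.1, 0.6, 0.3]               # hypothetical token probabilities
first = sample_token(probs, seed=42)
second = sample_token(probs, seed=42)
# The same seed with the same inputs yields the same sampled token every run.
```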
enable_cache
The enable_cache parameter is a boolean that enables or disables caching of intermediate results. When set to True, it can improve the efficiency of repeated inference tasks by reusing previously computed results, reducing computation time.
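The node's cache key and storage are internal, but the idea is ordinary memoization: skip the expensive model call when the same inputs recur. A dict-based sketch, where image_hash stands in for a digest of the image payload:

```python
_cache = {}
calls = 0

def run_inference(instruction, image_hash, seed, enable_cache=True):
    """Stand-in for an expensive model call with an optional result cache."""
    global calls
    key = (instruction, image_hash, seed)
    if enable_cache and key in _cache:
        return _cache[key]            # reuse the previously computed result
    calls += 1                        # count actual (uncached) model runs
    result = f"caption for {image_hash}"
    if enable_cache:
        _cache[key] = result
    return result

a = run_inference("Describe this image", "img001", 42)
b = run_inference("Describe this image", "img001", 42)  # served from the cache
```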
Qwen VL Inference Output Parameters:
text
The text output parameter provides the generated text response from the model based on the input instruction and image payloads. This output is crucial as it represents the model's interpretation and understanding of the input, offering insights or answers to the specified task. The text can vary in length and content depending on the input parameters and the complexity of the task.
Qwen VL Inference Usage Tips:
- To optimize performance for image captioning tasks, consider adjusting the temperature and top_p parameters to balance creativity and coherence in the generated descriptions.
- For tasks requiring detailed analysis, such as document understanding, increase the max_tokens parameter to allow for more comprehensive outputs.
- Utilize the device parameter to ensure the model runs on the most suitable hardware, especially when handling large-scale inference tasks that can benefit from GPU acceleration.
Qwen VL Inference Common Errors and Solutions:
Qwen3-VL local inference failed: <error_message>
- Explanation: This error indicates that the local inference process encountered an issue, possibly due to incorrect input parameters or hardware limitations.
- Solution: Verify that all input parameters are correctly set and that your hardware meets the requirements for running the model. Consider adjusting the device parameter to ensure compatibility.
Failed to extract output text from decoded_results: <error_message>
- Explanation: This error occurs when the model's output format is unexpected or incompatible with the expected structure.
- Solution: Check the input parameters and ensure they align with the model's capabilities. If the issue persists, consider adjusting the max_tokens or top_p parameters to influence the output structure.
