RunComfy


ComfyUI Node: Qwen3 VL Caption (Inverse Prompt)

Class Name: Qwen3Caption
Category: image/caption
Author: WingeD123 (Account age: 1221 days)
Extension: ComfyUI_QwenVL_PromptCaption
Latest Updated: 2026-03-23
Github Stars: 0.04K


Table of Content

Description
Qwen3Caption:
Qwen3Caption Input Parameters:
Qwen3Caption Output Parameters:
Qwen3Caption Usage Tips:
Qwen3Caption Common Errors and Solutions:
Related Nodes

How to Install ComfyUI_QwenVL_PromptCaption

Install this extension via the ComfyUI Manager by searching for ComfyUI_QwenVL_PromptCaption:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI_QwenVL_PromptCaption in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


Qwen3 VL Caption (Inverse Prompt) Description

Generates descriptive image captions using the Qwen3 visual language model for automation.

Qwen3 VL Caption (Inverse Prompt):

Qwen3Caption generates descriptive captions for images using the Qwen3 vision-language model. It interprets the content of an input image and returns a text description of it. This lets AI artists and developers automate image captioning for tasks such as content creation, accessibility, and image indexing.

Qwen3 VL Caption (Inverse Prompt) Input Parameters:

model_path

The model_path parameter specifies the location of the text encoder model files. The node loads its model from this location, so the path determines which model configuration is used and directly affects the quality and accuracy of the generated captions.

dtype

The dtype parameter sets the precision used for model processing. The options are "auto", "4bit", and "8bit"; the default is "4bit", which is the recommended setting. Lower-bit settings reduce memory usage and generally speed up processing, at the cost of some precision.
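As a rough illustration of the memory side of this trade-off, the sketch below estimates the space taken by model weights alone at different bit widths. The 4-billion-parameter figure is a hypothetical example, not the size of any specific Qwen3 model, and the estimate ignores activations, caches, and quantization overhead. Treating "auto" as 16-bit is also an assumption.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical parameter count, chosen only for illustration.
NUM_PARAMS = 4e9

# Assumption: "auto" resolves to a 16-bit format; the node may behave differently.
for label, bits in [("4bit", 4), ("8bit", 8), ("auto (assumed 16-bit)", 16)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB of weights")
```

Actual VRAM usage will be higher than these figures, since inference needs working memory beyond the weights.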

keep_model_loaded

The keep_model_loaded parameter is a boolean that controls whether the model stays in memory after processing. The default is False, so the model is unloaded after each run to free resources. Setting it to True avoids the overhead of reloading the model when captioning several images in a row.

lang

The lang parameter sets the language of the generated captions. The options are "中文" (Chinese) and "English"; the default is "中文".

max_side

The max_side parameter sets the maximum side length, in pixels, to which the input image is resized. The default is 512, and the allowed range is 256 to 2240. Limiting the image size keeps processing efficient and helps avoid exceeding memory limits.
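One common way to apply such a limit is to scale the image so its longest side equals max_side while keeping the aspect ratio. The sketch below shows that calculation only. It assumes that images already within the limit are left unchanged, which may not match the node's behavior, and `fit_to_max_side` is a hypothetical helper name.

```python
from typing import Tuple


def fit_to_max_side(width: int, height: int, max_side: int = 512) -> Tuple[int, int]:
    """Return (width, height) scaled so the longest side is at most max_side."""
    if not 256 <= max_side <= 2240:
        raise ValueError("max_side must be between 256 and 2240")
    longest = max(width, height)
    if longest <= max_side:
        # Assumption: smaller images are not upscaled.
        return width, height
    # Scale both sides by the same factor to preserve the aspect ratio.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


print(fit_to_max_side(1920, 1080))  # (512, 288)
```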

image_path

The image_path parameter is a string containing the file path of the image to caption. It supplies the visual input the node needs to generate a caption.

save_path

The save_path parameter is an optional string specifying where to save the generated caption, so the output can be stored for later use or further processing.
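A minimal sketch of what saving a caption could look like is shown below. It assumes save_path names a text file and that an empty value means "do not save"; the node may interpret the value differently (for example, as a directory). `save_caption` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def save_caption(text: str, save_path: str) -> Optional[Path]:
    """Write the caption to save_path as UTF-8, or do nothing if it is empty."""
    if not save_path:
        return None
    target = Path(save_path)
    # Create missing parent folders so the write does not fail.
    target.parent.mkdir(parents=True, exist_ok=True)
    # UTF-8 keeps Chinese captions intact.
    target.write_text(text, encoding="utf-8")
    return target
```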

instruction

The instruction parameter is an optional multiline string that gives the model additional guidance or context. Use it to tailor the style and content of the generated captions to specific requirements.

Qwen3 VL Caption (Inverse Prompt) Output Parameters:

text

The text output is the generated caption as a string: a description of the content of the input image. It is the node's primary result and can be used for accessibility text, content creation, or as metadata for image indexing.
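For workflows driven through ComfyUI's API-format JSON, the inputs described above might be assembled as in the sketch below. The class name Qwen3Caption and the input names come from this page; the example paths and the model name are placeholders, and the exact value types the node accepts (for example, whether model_path is a free string or a dropdown choice) are assumptions.

```python
import json


def build_caption_node(image_path: str, model_path: str, *,
                       dtype: str = "4bit",
                       keep_model_loaded: bool = False,
                       lang: str = "中文",
                       max_side: int = 512,
                       save_path: str = "",
                       instruction: str = "") -> dict:
    """Build one API-format node entry using the defaults listed on this page."""
    return {
        "class_type": "Qwen3Caption",
        "inputs": {
            "model_path": model_path,
            "dtype": dtype,
            "keep_model_loaded": keep_model_loaded,
            "lang": lang,
            "max_side": max_side,
            "image_path": image_path,
            "save_path": save_path,
            "instruction": instruction,
        },
    }


# Placeholder values: replace with a real image and an installed model.
workflow = {
    "1": build_caption_node(
        "input/example.png",
        "placeholder-model-folder",
        lang="English",
        keep_model_loaded=True,
        instruction="Describe the lighting and composition.",
    )
}

# The API expects the node graph under the "prompt" key.
payload = json.dumps({"prompt": workflow}, ensure_ascii=False)
```

The text output would then be read from node "1" by whichever downstream node consumes the caption.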

Qwen3 VL Caption (Inverse Prompt) Usage Tips:

To optimize performance, consider setting keep_model_loaded to True when processing multiple images in succession, as this will reduce the time spent loading and unloading the model.
Use the instruction parameter to provide specific guidance or context for the caption generation, which can help tailor the output to better meet your needs.
Ensure that the model_path is correctly set to the desired model files to maintain the quality and accuracy of the generated captions.

Qwen3 VL Caption (Inverse Prompt) Common Errors and Solutions:

"no image, 无图像"

Explanation: This error occurs when no image is provided to the node for captioning.
Solution: Ensure that the image_path parameter is correctly set to the path of a valid image file.

"Failed to load model, 模型加载失败"

Explanation: This error indicates that the model could not be loaded from the specified path, possibly due to an incorrect model_path or missing files.
Solution: Verify that the model_path is correct and that all necessary model files are present in the specified directory.
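Both errors can often be caught before running the workflow with a simple pre-flight check. The sketch below is one way to do that, under the assumption that image_path and model_path are both filesystem paths; if the node resolves model_path as a model name rather than a path, the second check does not apply. `preflight` is a hypothetical helper, not part of the extension.

```python
from pathlib import Path
from typing import List


def preflight(image_path: str, model_path: str) -> List[str]:
    """Return a list of problems likely to trigger the errors above."""
    problems = []
    if not image_path or not Path(image_path).is_file():
        problems.append(f"image_path does not point to a file: {image_path!r}")
    if not model_path or not Path(model_path).exists():
        problems.append(f"model_path does not exist: {model_path!r}")
    return problems
```

An empty list means both paths exist; it does not guarantee that the image is readable or that the model directory contains every required file.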

Qwen3 VL Caption (Inverse Prompt) Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI_QwenVL_PromptCaption
