Z-Image | Fast Photorealistic Base Model

Super-fast image maker with stunning clarity and total control.

Advance Wan 2.1 video generation with lightweight depth and tile LoRAs for improved structure and detail.

Qwen Image 2512 LoRA Inference | AI Toolkit ComfyUI

Use an AI Toolkit-trained LoRA with Qwen Image 2512 in ComfyUI via one RCQwenImage2512 node for preview-aligned generations.

Consistent Character Creator 3.0 | Easy Consistency, Any Angle

Make characters stay the same, every angle, strong and perfect.

ComfyUI > Nodes > ComfyUI_QwenVL_PromptCaption > Qwen2.5 VL Caption (Inverse Prompt)

ComfyUI Node: Qwen2.5 VL Caption (Inverse Prompt)

Class Name

Qwen25Caption

Category
image/caption

Author
WingeD123 (Account age: 1221days) Extension
ComfyUI_QwenVL_PromptCaption Latest Updated
2026-03-23 Github Stars
0.04K

Github Ask WingeD123 Current Questions Past Questions

Table of Content

Description
Qwen25Caption:
Qwen25Caption Input Parameters:
Qwen25Caption Output Parameters:
Qwen25Caption Usage Tips:
Qwen25Caption Common Errors and Solutions:
Related Nodes

How to Install ComfyUI_QwenVL_PromptCaption

Install this extension via the ComfyUI Manager by searching for ComfyUI_QwenVL_PromptCaption

1. Click the Manager button in the main menu
2. Select Custom Nodes Manager button
3. Enter ComfyUI_QwenVL_PromptCaption in the search bar

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Visit ComfyUI Online for ready-to-use ComfyUI environment

Free trial available
16GB VRAM to 80GB VRAM GPU machines
400+ preloaded models/nodes
Freedom to upload custom models/nodes
200+ ready-to-run workflows
100% private workspace with up to 200GB storage
Dedicated Support

Run ComfyUI Online

Qwen2.5 VL Caption (Inverse Prompt) Description

Generates descriptive image captions using the Qwen2.5 VL model for enhanced storytelling.

Qwen2.5 VL Caption (Inverse Prompt):

Qwen25Caption is a node designed to generate descriptive captions for images using advanced visual-language models. It leverages the Qwen2.5 VL model to interpret visual content and produce text descriptions, making it a powerful tool for AI artists who want to add narrative or context to their visual creations. This node is particularly beneficial for those looking to automate the process of captioning images, thereby saving time and enhancing creativity. By utilizing this node, you can transform images into stories or informative pieces, enriching the viewer's experience and understanding. The node is designed to be user-friendly, allowing you to input images and receive captions without needing deep technical knowledge of the underlying AI models.

Qwen2.5 VL Caption (Inverse Prompt) Input Parameters:

image

This parameter expects a tensor representation of the image you wish to caption. The image is processed by the model to generate a descriptive text. It is crucial for the image to be correctly formatted as a tensor to ensure accurate captioning.

model_path

This parameter specifies the path to the model directory. It is essential for locating the necessary model files required for processing the image. The correct path ensures that the model can be loaded successfully, impacting the accuracy and quality of the generated captions.

lang

This parameter allows you to select the language in which the caption will be generated. Options typically include languages like "中文" (Chinese) and "English". Choosing the appropriate language is important for ensuring that the caption is understandable to your intended audience.

dtype

This parameter determines the data type used for model processing, with options such as "auto", "4bit", and "8bit". The choice of data type can affect the performance and speed of the captioning process, with "4bit" often recommended for optimal balance between speed and accuracy.

max_side

This parameter sets the maximum dimension for the image, with a default value of 512 and a range from 256 to 2240. It ensures that the image is resized appropriately for processing, which can impact the quality and detail of the generated caption.

keep_model_loaded

This boolean parameter indicates whether the model should remain loaded in memory after processing. Keeping the model loaded can speed up subsequent captioning tasks but may consume more memory resources.

instruction

This optional parameter allows you to provide specific instructions or context for the captioning process. It can be used to guide the model in generating captions that align with particular themes or styles.

Qwen2.5 VL Caption (Inverse Prompt) Output Parameters:

text

The output parameter is a string that contains the generated caption for the input image. This text provides a descriptive narrative or context for the image, enhancing its interpretability and value. The quality and relevance of the caption depend on the input parameters and the model's capabilities.

Qwen2.5 VL Caption (Inverse Prompt) Usage Tips:

Ensure that your image is correctly formatted as a tensor to avoid processing errors and to receive accurate captions.
Select the appropriate language for your audience to ensure that the generated captions are understandable and relevant.
Consider keeping the model loaded if you plan to process multiple images in succession, as this can significantly reduce processing time.
Use the instruction parameter to guide the model in generating captions that fit specific themes or styles, enhancing the creative output.

Qwen2.5 VL Caption (Inverse Prompt) Common Errors and Solutions:

"no image, 无图像"

Explanation: This error occurs when no image is provided as input to the node.
Solution: Ensure that you have correctly inputted an image tensor into the node before attempting to generate a caption.

"Failed to load model, 模型加载失败"

Explanation: This error indicates that the model could not be loaded from the specified path, possibly due to an incorrect path or missing files.
Solution: Verify that the model path is correct and that all necessary model files are present in the specified directory.

Qwen2.5 VL Caption (Inverse Prompt) Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI_QwenVL_PromptCaption

Table of Content

Description
Qwen25Caption:
Qwen25Caption Input Parameters:
Qwen25Caption Output Parameters:
Qwen25Caption Usage Tips:
Qwen25Caption Common Errors and Solutions:
Related Nodes

ToonCrafter | Generative Cartoon Interpolation

ToonCrafter can generate cartoon interpolations between two cartoon images.

Omni Kontext | Seamless Scene Integration

Perfect scene fits. Unique style. Identity stays. Kontext keeps it real.

LTX-2 First Last Frame | Key Frames Video Generator

Turn still frames into seamless video and sound transitions fast.

FLUX ControlNet Depth-V3 & Canny-V3

Achieve better control with FLUX-ControlNet-Depth & FLUX-ControlNet-Canny for FLUX.1 [dev].

Support

Resources

Legal

RunComfy

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.