ComfyUI > Nodes > ComfyUI_QwenVL_PromptCaption > Qwen2.5 VL Caption (Inverse Prompt)

ComfyUI Node: Qwen2.5 VL Caption (Inverse Prompt)

Class Name

Qwen25Caption

Category
image/caption
Author
WingeD123 (Account age: 1221days)
Extension
ComfyUI_QwenVL_PromptCaption
Latest Updated
2026-03-23
Github Stars
0.04K

How to Install ComfyUI_QwenVL_PromptCaption

Install this extension via the ComfyUI Manager by searching for ComfyUI_QwenVL_PromptCaption
  • 1. Click the Manager button in the main menu
  • 2. Select Custom Nodes Manager button
  • 3. Enter ComfyUI_QwenVL_PromptCaption in the search bar
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Visit ComfyUI Online for ready-to-use ComfyUI environment

  • Free trial available
  • 16GB VRAM to 80GB VRAM GPU machines
  • 400+ preloaded models/nodes
  • Freedom to upload custom models/nodes
  • 200+ ready-to-run workflows
  • 100% private workspace with up to 200GB storage
  • Dedicated Support

Run ComfyUI Online

Qwen2.5 VL Caption (Inverse Prompt) Description

Generates descriptive image captions using the Qwen2.5 VL model for enhanced storytelling.

Qwen2.5 VL Caption (Inverse Prompt):

Qwen25Caption is a node designed to generate descriptive captions for images using advanced visual-language models. It leverages the Qwen2.5 VL model to interpret visual content and produce text descriptions, making it a powerful tool for AI artists who want to add narrative or context to their visual creations. This node is particularly beneficial for those looking to automate the process of captioning images, thereby saving time and enhancing creativity. By utilizing this node, you can transform images into stories or informative pieces, enriching the viewer's experience and understanding. The node is designed to be user-friendly, allowing you to input images and receive captions without needing deep technical knowledge of the underlying AI models.

Qwen2.5 VL Caption (Inverse Prompt) Input Parameters:

image

This parameter expects a tensor representation of the image you wish to caption. The image is processed by the model to generate a descriptive text. It is crucial for the image to be correctly formatted as a tensor to ensure accurate captioning.

model_path

This parameter specifies the path to the model directory. It is essential for locating the necessary model files required for processing the image. The correct path ensures that the model can be loaded successfully, impacting the accuracy and quality of the generated captions.

lang

This parameter allows you to select the language in which the caption will be generated. Options typically include languages like "中文" (Chinese) and "English". Choosing the appropriate language is important for ensuring that the caption is understandable to your intended audience.

dtype

This parameter determines the data type used for model processing, with options such as "auto", "4bit", and "8bit". The choice of data type can affect the performance and speed of the captioning process, with "4bit" often recommended for optimal balance between speed and accuracy.

max_side

This parameter sets the maximum dimension for the image, with a default value of 512 and a range from 256 to 2240. It ensures that the image is resized appropriately for processing, which can impact the quality and detail of the generated caption.

keep_model_loaded

This boolean parameter indicates whether the model should remain loaded in memory after processing. Keeping the model loaded can speed up subsequent captioning tasks but may consume more memory resources.

instruction

This optional parameter allows you to provide specific instructions or context for the captioning process. It can be used to guide the model in generating captions that align with particular themes or styles.

Qwen2.5 VL Caption (Inverse Prompt) Output Parameters:

text

The output parameter is a string that contains the generated caption for the input image. This text provides a descriptive narrative or context for the image, enhancing its interpretability and value. The quality and relevance of the caption depend on the input parameters and the model's capabilities.

Qwen2.5 VL Caption (Inverse Prompt) Usage Tips:

  • Ensure that your image is correctly formatted as a tensor to avoid processing errors and to receive accurate captions.
  • Select the appropriate language for your audience to ensure that the generated captions are understandable and relevant.
  • Consider keeping the model loaded if you plan to process multiple images in succession, as this can significantly reduce processing time.
  • Use the instruction parameter to guide the model in generating captions that fit specific themes or styles, enhancing the creative output.

Qwen2.5 VL Caption (Inverse Prompt) Common Errors and Solutions:

"no image, 无图像"

  • Explanation: This error occurs when no image is provided as input to the node.
  • Solution: Ensure that you have correctly inputted an image tensor into the node before attempting to generate a caption.

"Failed to load model, 模型加载失败"

  • Explanation: This error indicates that the model could not be loaded from the specified path, possibly due to an incorrect path or missing files.
  • Solution: Verify that the model path is correct and that all necessary model files are present in the specified directory.

Qwen2.5 VL Caption (Inverse Prompt) Related Nodes

Go back to the extension to check out more related nodes.
ComfyUI_QwenVL_PromptCaption
RunComfy
Copyright 2025 RunComfy. All Rights Reserved.

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.

Qwen2.5 VL Caption (Inverse Prompt)