ComfyUI Node: Qwen2.5 VL Batch Caption

Class Name

Qwen25CaptionBatch

Category
image/caption
Author
WingeD123 (Account age: 1221 days)
Extension
ComfyUI_QwenVL_PromptCaption
Latest Updated
2026-03-23
Github Stars
0.04K

How to Install ComfyUI_QwenVL_PromptCaption

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI_QwenVL_PromptCaption in the search bar and install the matching result.
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated list of nodes.

Qwen2.5 VL Batch Caption Description

Batch-captions images with the Qwen2.5 VL model, with output in Chinese or English.

Qwen2.5 VL Batch Caption:

Qwen25CaptionBatch captions a whole directory of images in one run using the Qwen2.5 VL vision-language model. The node loads the model itself and can unload it afterwards to free memory. Captions can be generated in Chinese or English, and the remaining options (dtype, max_side, keep_model_loaded, instruction) let you trade off speed, memory use, and caption content.

Qwen2.5 VL Batch Caption Input Parameters:

model_path

The model_path parameter specifies the location of the text encoder model files used for captioning. It must point to the correct model files; otherwise the model cannot be loaded.

lang

The lang parameter sets the language of the generated captions. The options are "中文" (Chinese) and "English"; the default is Chinese.

dtype

The dtype parameter sets the precision at which the model is loaded. The options are "auto", "4bit", and "8bit". The default, "4bit", is strongly recommended. Lower-bit settings reduce memory use at the cost of some numerical precision.
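To make the memory trade-off concrete, here is a rough back-of-the-envelope estimate of weight memory per option. It is only a sketch: the 7-billion parameter count is a hypothetical example, treating "auto" as 16 bits per parameter is an assumption, and real usage is higher because of activations, quantization overhead, and any layers left unquantized.

```python
# Approximate bits stored per parameter for each dtype option.
# Mapping "auto" to 16 bits is an assumption made for this illustration.
BITS_PER_PARAM = {"auto": 16, "8bit": 8, "4bit": 4}


def estimate_weight_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the model weights in GiB."""
    bits = BITS_PER_PARAM[dtype]
    return num_params * bits / 8 / 1024**3


if __name__ == "__main__":
    hypothetical_params = 7e9  # example size only, not taken from the node
    for option in ("auto", "8bit", "4bit"):
        size = estimate_weight_gib(hypothetical_params, option)
        print(f"{option}: about {size:.1f} GiB of weights")
```

Under these assumptions, "4bit" needs roughly a quarter of the weight memory of a 16-bit load, which is the main reason it is the recommended default on consumer GPUs.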

keep_model_loaded

The keep_model_loaded parameter is a boolean that controls whether the model stays in memory after processing. The default is False, so the model is unloaded after each run to free resources. Set it to True to avoid reloading the model between consecutive runs.

max_side

The max_side parameter sets the maximum side length, in pixels, to which images are resized before processing. The default is 532, the minimum is 252, the maximum is 2240, and the value is adjustable in steps of 28. Smaller values speed up processing and reduce memory use; larger values preserve more image detail.
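As a worked example of what a max_side limit means for image dimensions, the sketch below scales an image so its longer side does not exceed max_side and snaps both sides to multiples of 28. The snapping rule is an assumption inferred from the step size of 28; the node's actual resizing logic may differ, and `fit_to_max_side` is a hypothetical helper.

```python
def fit_to_max_side(width: int, height: int,
                    max_side: int = 532, multiple: int = 28) -> tuple[int, int]:
    """Return (width, height) scaled down to fit max_side.

    The aspect ratio is preserved approximately, and each side is rounded
    to the nearest multiple of `multiple` (never below one multiple).
    Images that already fit are not upscaled.
    """
    scale = min(1.0, max_side / max(width, height))

    def snap(value: int) -> int:
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    print(fit_to_max_side(1920, 1080))  # (532, 308)
```

A 1920x1080 frame at the default setting therefore shrinks to about 532x308, which is a large reduction in pixels to process; raise max_side when fine detail or small text matters for the caption.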

image_path

The image_path parameter specifies the directory that contains the images to caption. The path must exist and be readable by ComfyUI.
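Before running a large batch, it can help to check what the directory actually contains. The following is a minimal sketch of such a check; `list_images` is a hypothetical helper and the extension set is an assumption, since the formats the node accepts are not listed here.

```python
from pathlib import Path

# Assumed set of image extensions; the node's real list may differ.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def list_images(image_path: str) -> list[Path]:
    """Return the image files directly inside image_path, sorted by name.

    An invalid or missing directory yields an empty list.
    """
    folder = Path(image_path)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

An empty result from a check like this corresponds to the situation in which the node has nothing to caption.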

save_path

The save_path parameter is an optional string that sets the directory where the generated captions are saved. If it is left empty, captions are saved in the same directory as the images.
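The fallback rule for save_path can be expressed in a few lines. This sketch assumes each caption is written to a .txt file named after its image, which is a common convention for caption datasets but is not confirmed for this node; `caption_file_for` is a hypothetical helper.

```python
from pathlib import Path


def caption_file_for(image_file: str, save_path: str = "") -> Path:
    """Return the path where the caption for image_file would be written.

    An empty save_path falls back to the image's own directory.
    The ".txt" suffix is an assumption made for this illustration.
    """
    image = Path(image_file)
    target_dir = Path(save_path) if save_path.strip() else image.parent
    return target_dir / (image.stem + ".txt")
```

For example, `caption_file_for("photos/cat.png")` resolves to `photos/cat.txt`, while passing `save_path="captions"` resolves to `captions/cat.txt`.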

instruction

The instruction parameter is an optional multiline string with specific instructions or context for the captioning model. Use it to steer what the captions focus on or how they are written.

Qwen2.5 VL Batch Caption Output Parameters:

summary

The summary output returns the generated captions as a string: the descriptive text for the batch of images that was processed. It is the node's primary result.

Qwen2.5 VL Batch Caption Usage Tips:

  • Ensure that the model_path is correctly set to the directory containing the necessary model files to avoid loading errors.
  • Use the lang parameter to switch between Chinese and English captions, depending on your project requirements.
  • Adjust the max_side parameter to optimize image processing speed and memory usage, especially when dealing with high-resolution images.
  • Consider setting keep_model_loaded to True if you plan to process multiple batches consecutively, as this can reduce loading times.

Qwen2.5 VL Batch Caption Common Errors and Solutions:

"0 image captioned, 共处理0张图片"

  • Explanation: No images were processed (the Chinese part of the message reads "0 images processed in total"). This happens when the specified image_path is invalid or does not contain any images.
  • Solution: Verify that the image_path is correct and that the directory contains images in supported formats.

"Failed to load model, 模型加载失败"

  • Explanation: The model files could not be loaded from the specified model_path (the Chinese part of the message reads "model loading failed").
  • Solution: Ensure that the model_path is set to the correct directory containing the required model files and that the files are not corrupted.
