RunComfy


ComfyUI Extension: ComfyUI_QwenVL_PromptCaption

Repo Name: ComfyUI_QwenVL_PromptCaption
Author: WingeD123 (Account age: 1221 days)
Nodes: 10
Latest Updated: 2026-03-23
GitHub Stars: 0.04K

Table of Contents

Description
ComfyUI_QwenVL_PromptCaption Introduction
How ComfyUI_QwenVL_PromptCaption Works
ComfyUI_QwenVL_PromptCaption Features
ComfyUI_QwenVL_PromptCaption Models
Troubleshooting ComfyUI_QwenVL_PromptCaption
Learn More about ComfyUI_QwenVL_PromptCaption
Related Nodes

How to Install ComfyUI_QwenVL_PromptCaption

Install this extension through the ComfyUI Manager by searching for ComfyUI_QwenVL_PromptCaption:

1. Click the Manager button in the main menu.
2. Click the Custom Nodes Manager button.
3. Enter ComfyUI_QwenVL_PromptCaption in the search bar and install the extension from the results.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


ComfyUI_QwenVL_PromptCaption Description

ComfyUI_QwenVL_PromptCaption uses Qwen 2.5/3 VL models to perform prompt inversion and generate captions for images and videos within ComfyUI.

ComfyUI_QwenVL_PromptCaption Introduction

ComfyUI_QwenVL_PromptCaption is a ComfyUI extension built on the Qwen VL models. It focuses on prompt inversion and caption generation, that is, turning images or videos into descriptive text. This is particularly useful for AI artists who need prompts or captions derived from visual inputs.

How ComfyUI_QwenVL_PromptCaption Works

ComfyUI_QwenVL_PromptCaption uses vision-language models to analyze images or videos and generate corresponding text descriptions. When you input an image or a video, the extension processes the visual data, identifies the key elements, and describes them in text form. This process is known as prompt inversion: the visual content is inverted into a textual prompt. The extension can handle both individual files and batches.

ComfyUI_QwenVL_PromptCaption Features

Qwen XX VL Caption: Performs prompt inversion on a single image or video and generates a caption describing the visual content. XX stands for the model version (2.5, 3, or 3.5).
Qwen XX VL Batch Caption: Processes a folder of images and generates a caption for each one.
Ovis 2.5 Run: Runs the Ovis 2.5 model, which can be used for specific captioning tasks.
ASID_Caption: Uses the ASID Captioner model to generate audio-visual captions.

Each feature can be customized by adjusting node inputs. For example, you can edit prompt templates to influence the style or focus of the generated captions.
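The batch captioning flow and the effect of a prompt template can be sketched in plain Python. This is only an illustration of the idea, not the extension's code: `caption_folder`, `PROMPT_TEMPLATE`, and `fake_captioner` are hypothetical names, writing a `.txt` file next to each image is an assumption, and the model call is replaced by a stand-in function.

```python
from pathlib import Path
from typing import Callable, Dict

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}

# Hypothetical template; the extension's actual default prompt may differ.
PROMPT_TEMPLATE = "Describe this image as a {style} prompt, focusing on {focus}."


def caption_folder(
    folder: Path,
    captioner: Callable[[Path, str], str],
    style: str = "text-to-image",
    focus: str = "subject, lighting, and composition",
) -> Dict[str, str]:
    """Caption every image in `folder` and write a .txt file next to each one."""
    prompt = PROMPT_TEMPLATE.format(style=style, focus=focus)
    captions: Dict[str, str] = {}
    # sorted() materializes the listing first, so files written below are not revisited.
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip subfolders and non-image files
        text = captioner(path, prompt).strip()
        path.with_suffix(".txt").write_text(text, encoding="utf-8")
        captions[path.name] = text
    return captions


def fake_captioner(image_path: Path, prompt: str) -> str:
    """Stand-in for the vision-language model call; returns a placeholder caption."""
    return f"caption for {image_path.stem}"
```

Changing `style` or `focus` changes the prompt sent with every image, which is the same lever the prompt template inputs on the nodes provide. In a real workflow the stand-in would be replaced by the model behind the Batch Caption node.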

ComfyUI_QwenVL_PromptCaption Models

The extension supports various models, each suited for different tasks:

Qwen 2.5 VL 7B: Suitable for systems with 6-8GB VRAM, offering a balance between performance and resource usage.
Qwen 3 VL 8B: Recommended for systems with 10-16GB VRAM, providing enhanced precision.
Qwen 3 VL 4B: Ideal for high-performance systems with 16GB+ VRAM, allowing full precision processing.
Ovis 2.5 Models: Available in different sizes, these models are designed for specific captioning tasks.
ASID Captioner Models: These models are tailored for generating captions that integrate audio and visual elements.

Choosing the right model depends on your system's capabilities and the specific requirements of your project.
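As a rough aid, the VRAM guidance above can be written as a small lookup. This sketch only restates the ranges listed in this section; `suggest_model` is a hypothetical helper, not part of the extension, and treating the unlisted 8-10GB range as belonging to the smaller tier is an assumption.

```python
def suggest_model(vram_gb: float) -> str:
    """Map available VRAM (in GB) to the model tier listed for it above."""
    if vram_gb >= 16:
        return "Qwen 3 VL 4B"    # listed for 16GB+ (full precision)
    if vram_gb >= 10:
        return "Qwen 3 VL 8B"    # listed for 10-16GB
    if vram_gb >= 6:
        return "Qwen 2.5 VL 7B"  # listed for 6-8GB; 8-10GB falls back here (assumption)
    raise ValueError("less than 6GB VRAM: no model in the list above is recommended")
```

For example, `suggest_model(12)` returns `"Qwen 3 VL 8B"`.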

Troubleshooting ComfyUI_QwenVL_PromptCaption

If you encounter issues while using the extension, here are some common solutions:

Model Loading Issues: Ensure that the models are correctly placed in the text_encoders directory and that all necessary configuration files are included.
Performance Problems: Adjust the max_side parameter to optimize processing speed. Larger values may slow down the process.
VRAM Errors: Use the unload_other_models option to free up VRAM before loading new models, preventing loading failures.

For further assistance, consider checking community forums or documentation for additional support.
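To see why max_side affects speed, it helps to look at how much a size cap shrinks an image. The sketch below assumes that max_side caps the longer side of the input while keeping the aspect ratio; that reading, and the helper name `fit_max_side`, are assumptions rather than the extension's documented behavior.

```python
from typing import Tuple


def fit_max_side(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side, keeping aspect ratio."""
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height, and max_side must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    # Multiply before dividing to keep the arithmetic exact for typical sizes.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height
```

Under this assumption, a 4000x3000 photo with max_side set to 1024 becomes 1024x768, roughly 15 times fewer pixels for the model to process, which is why smaller values tend to run faster.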

Learn More about ComfyUI_QwenVL_PromptCaption

To deepen your understanding and make the most of ComfyUI_QwenVL_PromptCaption, explore the following resources:

Qwen 2.5 VL 7B Instruct on Hugging Face
Qwen 3 VL 8B Instruct on Hugging Face
Ovis 2.5 Models on Hugging Face
ASID Captioner Models on Hugging Face

These resources provide detailed information about the models and their capabilities, helping you choose the best options for your projects.

ComfyUI_QwenVL_PromptCaption Related Nodes

ASID Captioner (Inverse Prompt)

Ovis2.5 Run

Qwen2.5 VL Batch Caption

Qwen2.5 VL Caption (Inverse Prompt)

Qwen3 VL Batch Caption

Qwen3 VL Caption (Inverse Prompt)

Qwen3.5 VL Batch Caption

Qwen3.5 VL Caption (Inverse Prompt)

String to BBOX

String to SAM3 Box
