ComfyUI > Nodes > Qwen2.5-VL GGUF Nodes > 🖼️ Image/Video Analysis (Transformers)

ComfyUI Node: 🖼️ Image/Video Analysis (Transformers)

Class Name

MultiImageAnalysis

Category
🤖 GGUF-VLM/🖼️ Vision Models
Author
walke2019 (Account age: 2560days)
Extension
Qwen2.5-VL GGUF Nodes
Latest Updated
2025-12-17
Github Stars
0.03K

How to Install Qwen2.5-VL GGUF Nodes

Install this extension via the ComfyUI Manager by searching for Qwen2.5-VL GGUF Nodes
  • 1. Click the Manager button in the main menu
  • 2. Select Custom Nodes Manager button
  • 3. Enter Qwen2.5-VL GGUF Nodes in the search bar
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Visit ComfyUI Online for ready-to-use ComfyUI environment

  • Free trial available
  • 16GB VRAM to 80GB VRAM GPU machines
  • 400+ preloaded models/nodes
  • Freedom to upload custom models/nodes
  • 200+ ready-to-run workflows
  • 100% private workspace with up to 200GB storage
  • Dedicated Support

Run ComfyUI Online

🖼️ Image/Video Analysis (Transformers) Description

Compares and analyzes images or video with advanced vision models for detailed insights.

🖼️ Image/Video Analysis (Transformers):

The MultiImageAnalysis node is designed to facilitate the comparison and analysis of multiple images, or a combination of a video and up to three images, using advanced vision models. This node leverages the power of transformers to provide detailed insights and descriptions of visual content, making it an invaluable tool for AI artists who wish to explore and understand the nuances of their visual data. By integrating sophisticated image processing capabilities, this node allows you to input multiple visual elements and receive comprehensive analyses, which can be particularly useful for tasks such as content generation, style comparison, or thematic exploration. The node's ability to handle both static images and dynamic video frames enhances its versatility, offering a robust solution for diverse creative and analytical needs.

🖼️ Image/Video Analysis (Transformers) Input Parameters:

model_config

This parameter specifies the configuration of the transformer model to be used for analysis. It determines the model's architecture and capabilities, impacting the quality and type of analysis performed. The configuration should be chosen based on the specific requirements of your task, such as the complexity of the images or the desired level of detail in the analysis.

prompt

The prompt is a string input that guides the analysis process by providing context or specific instructions to the model. It can be used to focus the analysis on particular aspects of the images or to elicit certain types of descriptions. The default value is "Describe these images.", and it can be customized to suit your creative or analytical objectives.

max_tokens

This integer parameter sets the maximum number of tokens that the model can generate in its output. It controls the length of the analysis or description provided by the model. The default value is 512, with a minimum of 128 and a maximum of 256,000 tokens. Adjusting this parameter allows you to balance between concise and detailed outputs.

temperature

A float parameter that influences the randomness of the model's output. A higher temperature results in more diverse and creative responses, while a lower temperature yields more focused and deterministic outputs. The default value is 0.7, with a range from 0.0 to 2.0. This parameter is crucial for tailoring the creativity level of the analysis.

video

This optional parameter accepts a video frame sequence or a single image as input. It allows the node to analyze dynamic content, providing insights into temporal changes or motion within the video. The video input can be used alone or in conjunction with other image inputs for comprehensive analysis.

image_1

An optional parameter for the first image input. It serves as one of the visual elements to be analyzed, and its content will be compared or contrasted with other inputs if provided. This parameter is essential for multi-image analysis tasks.

image_2

Similar to image_1, this optional parameter allows you to input a second image for analysis. It provides additional visual data for the model to process, enabling more complex comparisons and insights.

image_3

This optional parameter is for the third image input, further expanding the node's capability to handle multiple images simultaneously. It allows for a richer analysis by incorporating more visual elements into the process.

system_prompt

An optional string parameter that provides additional instructions or context to the model. It can be used to refine the analysis or to specify particular aspects of the images that should be emphasized. The default is an empty string, and it can be customized to enhance the relevance of the output.

🖼️ Image/Video Analysis (Transformers) Output Parameters:

description

The output parameter is a string that contains the detailed analysis or description of the input images and/or video. This output provides insights into the visual content, highlighting key features, themes, or differences among the inputs. It is the primary result of the node's processing and serves as a valuable resource for understanding and interpreting the visual data.

🖼️ Image/Video Analysis (Transformers) Usage Tips:

  • To achieve more creative and varied analyses, consider increasing the temperature parameter. This can be particularly useful when exploring artistic interpretations or generating novel insights.
  • When working with multiple images, ensure that the prompt is clear and specific to guide the model's focus effectively. This can help in obtaining more relevant and targeted descriptions.
  • Utilize the system_prompt to provide additional context or instructions that can refine the analysis, especially when dealing with complex or abstract visual content.

🖼️ Image/Video Analysis (Transformers) Common Errors and Solutions:

⚠️ Model not loaded, loading now...

  • Explanation: This message indicates that the vision model required for analysis is not currently loaded into memory.
  • Solution: Ensure that the model configuration is correct and that the system has sufficient resources to load the model. If the problem persists, check for any issues with the model files or paths.

❌ Analysis failed: <error_message>

  • Explanation: This error occurs when the analysis process encounters an unexpected issue, which could be due to incorrect input formats or model configuration errors.
  • Solution: Verify that all input parameters are correctly specified and that the input images or video are in the expected format. Review the traceback for more detailed information on the error and address any specific issues mentioned.

🖼️ Image/Video Analysis (Transformers) Related Nodes

Go back to the extension to check out more related nodes.
Qwen2.5-VL GGUF Nodes
RunComfy
Copyright 2025 RunComfy. All Rights Reserved.

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.

🖼️ Image/Video Analysis (Transformers)