RunComfy

Flux Kontext Pulid | Consistent Character Generation

Create consistent characters using FLUX Kontext with a single face reference image.

FLUX Inpainting | Seamless Image Editing

Effortlessly fill, remove, and refine images, seamlessly integrating new content.

Wan 2.2 Low Vram | Kijai Wrapper

Low VRAM. No longer waiting. Kijai wrapper included.

Wan2.2 Fun Camera | Cinematic Motion from Images

Turn still images into lively cinematic shots with smooth camera moves.

ComfyUI > Nodes > Qwen2.5-VL GGUF Nodes > 🖼️ Image/Video Analysis (Transformers)

ComfyUI Node: 🖼️ Image/Video Analysis (Transformers)

Class Name

MultiImageAnalysis

Category
🤖 GGUF-VLM/🖼️ Vision Models

Author
walke2019 (Account age: 2560days) Extension
Qwen2.5-VL GGUF Nodes Latest Updated
2025-12-17 Github Stars
0.03K

Github Ask walke2019 Current Questions Past Questions

Table of Content

Description
MultiImageAnalysis:
MultiImageAnalysis Input Parameters:
MultiImageAnalysis Output Parameters:
MultiImageAnalysis Usage Tips:
MultiImageAnalysis Common Errors and Solutions:
Related Nodes

How to Install Qwen2.5-VL GGUF Nodes

Install this extension via the ComfyUI Manager by searching for Qwen2.5-VL GGUF Nodes

1. Click the Manager button in the main menu
2. Select Custom Nodes Manager button
3. Enter Qwen2.5-VL GGUF Nodes in the search bar

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Visit ComfyUI Online for ready-to-use ComfyUI environment

Free trial available
16GB VRAM to 80GB VRAM GPU machines
400+ preloaded models/nodes
Freedom to upload custom models/nodes
200+ ready-to-run workflows
100% private workspace with up to 200GB storage
Dedicated Support

Run ComfyUI Online

🖼️ Image/Video Analysis (Transformers) Description

Compares and analyzes images or video with advanced vision models for detailed insights.

🖼️ Image/Video Analysis (Transformers):

The MultiImageAnalysis node is designed to facilitate the comparison and analysis of multiple images, or a combination of a video and up to three images, using advanced vision models. This node leverages the power of transformers to provide detailed insights and descriptions of visual content, making it an invaluable tool for AI artists who wish to explore and understand the nuances of their visual data. By integrating sophisticated image processing capabilities, this node allows you to input multiple visual elements and receive comprehensive analyses, which can be particularly useful for tasks such as content generation, style comparison, or thematic exploration. The node's ability to handle both static images and dynamic video frames enhances its versatility, offering a robust solution for diverse creative and analytical needs.

🖼️ Image/Video Analysis (Transformers) Input Parameters:

model_config

This parameter specifies the configuration of the transformer model to be used for analysis. It determines the model's architecture and capabilities, impacting the quality and type of analysis performed. The configuration should be chosen based on the specific requirements of your task, such as the complexity of the images or the desired level of detail in the analysis.

prompt

The prompt is a string input that guides the analysis process by providing context or specific instructions to the model. It can be used to focus the analysis on particular aspects of the images or to elicit certain types of descriptions. The default value is "Describe these images.", and it can be customized to suit your creative or analytical objectives.

max_tokens

This integer parameter sets the maximum number of tokens that the model can generate in its output. It controls the length of the analysis or description provided by the model. The default value is 512, with a minimum of 128 and a maximum of 256,000 tokens. Adjusting this parameter allows you to balance between concise and detailed outputs.

temperature

A float parameter that influences the randomness of the model's output. A higher temperature results in more diverse and creative responses, while a lower temperature yields more focused and deterministic outputs. The default value is 0.7, with a range from 0.0 to 2.0. This parameter is crucial for tailoring the creativity level of the analysis.

video

This optional parameter accepts a video frame sequence or a single image as input. It allows the node to analyze dynamic content, providing insights into temporal changes or motion within the video. The video input can be used alone or in conjunction with other image inputs for comprehensive analysis.

image_1

An optional parameter for the first image input. It serves as one of the visual elements to be analyzed, and its content will be compared or contrasted with other inputs if provided. This parameter is essential for multi-image analysis tasks.

image_2

Similar to image_1, this optional parameter allows you to input a second image for analysis. It provides additional visual data for the model to process, enabling more complex comparisons and insights.

image_3

This optional parameter is for the third image input, further expanding the node's capability to handle multiple images simultaneously. It allows for a richer analysis by incorporating more visual elements into the process.

system_prompt

An optional string parameter that provides additional instructions or context to the model. It can be used to refine the analysis or to specify particular aspects of the images that should be emphasized. The default is an empty string, and it can be customized to enhance the relevance of the output.

🖼️ Image/Video Analysis (Transformers) Output Parameters:

description

The output parameter is a string that contains the detailed analysis or description of the input images and/or video. This output provides insights into the visual content, highlighting key features, themes, or differences among the inputs. It is the primary result of the node's processing and serves as a valuable resource for understanding and interpreting the visual data.

🖼️ Image/Video Analysis (Transformers) Usage Tips:

To achieve more creative and varied analyses, consider increasing the temperature parameter. This can be particularly useful when exploring artistic interpretations or generating novel insights.
When working with multiple images, ensure that the prompt is clear and specific to guide the model's focus effectively. This can help in obtaining more relevant and targeted descriptions.
Utilize the system_prompt to provide additional context or instructions that can refine the analysis, especially when dealing with complex or abstract visual content.

🖼️ Image/Video Analysis (Transformers) Common Errors and Solutions:

⚠️ Model not loaded, loading now...

Explanation: This message indicates that the vision model required for analysis is not currently loaded into memory.
Solution: Ensure that the model configuration is correct and that the system has sufficient resources to load the model. If the problem persists, check for any issues with the model files or paths.

❌ Analysis failed: `<error_message>`

Explanation: This error occurs when the analysis process encounters an unexpected issue, which could be due to incorrect input formats or model configuration errors.
Solution: Verify that all input parameters are correctly specified and that the input images or video are in the expected format. Review the traceback for more detailed information on the error and address any specific issues mentioned.

🖼️ Image/Video Analysis (Transformers) Related Nodes

Go back to the extension to check out more related nodes.

Qwen2.5-VL GGUF Nodes

Table of Content

Description
MultiImageAnalysis:
MultiImageAnalysis Input Parameters:
MultiImageAnalysis Output Parameters:
MultiImageAnalysis Usage Tips:
MultiImageAnalysis Common Errors and Solutions:
Related Nodes

Consistent & Realistic Characters

Create consistent and realistic characters with precise control over facial features, poses, and compositions.

Stable Audio Open 1.0 | Text-to-Music Tool

Turns text prompts into cinematic music seamlessly and fast.

Stable Video Infinity 2.0 | Long-Form Video Generator

Create long, smooth, story-driven AI videos effortlessly.

Wan2.2 S2V | Sound to Video Generator

Turns your audio clip into lifelike, synced video from one image

Support

Resources

Legal

RunComfy

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.

Support

Resources

Legal

RunComfy

Save 4 hours! We auto-setup your workflow! Free!

ComfyUI Node: 🖼️ Image/Video Analysis (Transformers)

MultiImageAnalysis

How to Install Qwen2.5-VL GGUF Nodes

🖼️ Image/Video Analysis (Transformers) Description

🖼️ Image/Video Analysis (Transformers):

🖼️ Image/Video Analysis (Transformers) Input Parameters:

model_config

prompt

max_tokens

temperature

video

image_1

image_2

image_3

system_prompt

🖼️ Image/Video Analysis (Transformers) Output Parameters:

description

🖼️ Image/Video Analysis (Transformers) Usage Tips:

🖼️ Image/Video Analysis (Transformers) Common Errors and Solutions:

⚠️ Model not loaded, loading now...

❌ Analysis failed: <error_message>

🖼️ Image/Video Analysis (Transformers) Related Nodes

❌ Analysis failed: `<error_message>`