ComfyUI Node: VLM Image Processor

Class Name

VLMImageProcessor

Category
VLM/Processing
Author
fblissjr (Account age: 4014 days)
Extension
Shrug-Prompter: Unified VLM Integration for ComfyUI
Last Updated
2025-09-30
GitHub Stars
0.02K

How to Install Shrug-Prompter: Unified VLM Integration for ComfyUI

Install this extension via the ComfyUI Manager by searching for Shrug-Prompter: Unified VLM Integration for ComfyUI
  1. Click the Manager button in the main menu
  2. Select the Custom Nodes Manager button
  3. Enter Shrug-Prompter: Unified VLM Integration for ComfyUI in the search bar
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


VLM Image Processor Description

VLMImageProcessor optimizes images for VLM processing and video generation while managing memory for efficient batch processing.

VLM Image Processor:

The VLMImageProcessor is a versatile node that consolidates several image-processing tasks into a single, efficient implementation. Its primary purpose is to optimize images for Visual Language Model (VLM) processing and to prepare them for video generation, all while keeping memory usage under control. This makes it particularly useful for AI artists who need to process large batches of images without exhausting system resources: images are processed one at a time, and memory is freed immediately after each image, which sustains throughput and prevents bottlenecks. Combined with user-defined resizing and quality settings, the node is a practical tool for preparing images for applications ranging from VLM analysis to video production.
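The one-image-at-a-time strategy described above can be sketched as follows. This is a minimal illustration, not the node's actual internals; `process_batch` and `transform` are hypothetical names standing in for the node's loop and its resize/optimize step:

```python
import gc

def process_batch(images, transform):
    """Process a batch one image at a time so that peak memory stays
    close to a single image's footprint, rather than the whole batch's.
    `transform` stands in for the node's resize/optimize step."""
    results = []
    for image in images:
        results.append(transform(image))
        # Drop the reference promptly so the source image can be
        # reclaimed before the next one is loaded.
        del image
    gc.collect()
    return results
```

The key design point is that only one source image and one result are live at any moment, which is why the node scales to large batches.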

VLM Image Processor Input Parameters:

images

This parameter represents the input images that you want to process. It is expected to be in the form of a tensor, which is a multi-dimensional array commonly used in machine learning and image processing tasks. The images are processed one at a time to ensure efficient memory usage.

mode

The mode parameter determines the processing approach applied to the images. It offers three options: optimize_for_vlm, prepare_for_video, and both. The optimize_for_vlm mode focuses on resizing and optimizing images for VLM processing, while prepare_for_video ensures that image dimensions are suitable for video generation by making them divisible by 8. The both option allows you to perform both optimizations simultaneously. The default value is optimize_for_vlm.
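The divisible-by-8 adjustment that prepare_for_video performs can be sketched like this; the exact rounding direction the node uses is not documented here, so this sketch assumes rounding down:

```python
def snap_to_multiple(value: int, multiple: int = 8) -> int:
    """Round a pixel dimension down to the nearest multiple of `multiple`.
    Many video models require width and height divisible by 8."""
    return max(multiple, (value // multiple) * multiple)

# A 1023x575 frame would become 1016x568; 512x512 is left unchanged.
```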

size

This parameter specifies the target size for resizing images. It provides options such as 256, 384, 512, 768, 1024, and original, with 384 being the default. If a specific size is selected, the images will be resized to fit within the specified dimensions while maintaining their aspect ratio. Choosing original will keep the images at their current size.
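Aspect-ratio-preserving resizing of this kind typically scales the longer side down to the target and the shorter side proportionally. A sketch of that computation (an assumption about the node's behavior, not its verified source):

```python
def fit_within(width: int, height: int, target: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals `target`,
    preserving aspect ratio. Images already within the target are
    returned unchanged."""
    longest = max(width, height)
    if longest <= target:
        return width, height
    scale = target / longest
    return round(width * scale), round(height * scale)

# 1920x1080 at the default target of 384 -> 384x216
```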

quality

The quality parameter defines the JPEG quality level for the processed images. It offers three options: draft, balanced, and high, with balanced as the default. These options correspond to JPEG quality settings of 70, 85, and 95, respectively. Higher quality settings result in better image fidelity but larger file sizes, while lower settings reduce file size at the cost of image quality.
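The preset-to-quality mapping stated above (draft = 70, balanced = 85, high = 95) amounts to a simple lookup:

```python
QUALITY_PRESETS = {"draft": 70, "balanced": 85, "high": 95}

def jpeg_quality(preset: str = "balanced") -> int:
    """Map a named preset to a JPEG quality level on the 0-100 scale."""
    if preset not in QUALITY_PRESETS:
        raise ValueError(f"unknown quality preset: {preset!r}")
    return QUALITY_PRESETS[preset]
```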

VLM Image Processor Output Parameters:

processed

The processed output contains the images that have been optimized according to the selected mode and parameters. These images are ready for further use in VLM processing or video generation, depending on the mode chosen. The processed images are returned as a tensor, maintaining the same data structure as the input.

original

The original output provides a view of the input images. In cases where resizing is not required, this output will be identical to the processed output. It serves as a reference to the original images, allowing you to compare them with the processed versions if needed.

count

The count output indicates the number of images processed. This integer value helps you keep track of the batch size and ensures that all images have been accounted for during processing.

VLM Image Processor Usage Tips:

  • To optimize images for VLM processing, select the optimize_for_vlm mode and choose an appropriate size and quality setting based on your needs. This will ensure that images are resized and compressed efficiently.
  • When preparing images for video generation, use the prepare_for_video mode to automatically adjust image dimensions to be divisible by 8, which is a common requirement for video models.
  • If you need both VLM optimization and video preparation, select the both mode to apply both processes in a single step, saving time and resources.
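The both mode described in the tips above plausibly chains the two operations: resize for the VLM target first, then snap the result to video-safe dimensions. A hedged sketch of that pipeline, assuming downward rounding to multiples of 8:

```python
def prepare_both(width: int, height: int,
                 target: int = 384, multiple: int = 8) -> tuple[int, int]:
    """Resize so the longer side fits within `target` (preserving aspect
    ratio), then snap each dimension down to a multiple of 8 so the
    output is also valid for video models."""
    longest = max(width, height)
    if longest > target:
        scale = target / longest
        width, height = round(width * scale), round(height * scale)
    snap = lambda v: max(multiple, (v // multiple) * multiple)
    return snap(width), snap(height)

# 1920x1080 -> 384x216 (already divisible by 8 after the resize)
```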

VLM Image Processor Common Errors and Solutions:

ValueError: VLM context required. Please connect a VLMProviderConfig node.

  • Explanation: This error occurs when the node is used without a proper VLM context, which is necessary for processing.
  • Solution: Ensure that a VLMProviderConfig node is connected to provide the required context for processing images.

MemoryError: Unable to allocate memory for image processing.

  • Explanation: This error indicates that the system does not have enough memory to process the images.
  • Solution: Try reducing the batch size of images or selecting a smaller size option to decrease memory usage. Additionally, ensure that other applications are not consuming excessive memory resources.

VLM Image Processor Related Nodes

Go back to the extension to check out more related nodes.
Shrug-Prompter: Unified VLM Integration for ComfyUI
Copyright 2025 RunComfy. All Rights Reserved.

