ComfyUI Node: Two-Round VLM Prompter

Class Name

TwoRoundVLMPrompter

Category
VLM/Advanced
Author
fblissjr (Account age: 4,014 days)
Extension
Shrug-Prompter: Unified VLM Integration for ComfyUI
Last Updated
2025-09-30
GitHub Stars
0.02K

How to Install Shrug-Prompter: Unified VLM Integration for ComfyUI

Install this extension via the ComfyUI Manager by searching for Shrug-Prompter: Unified VLM Integration for ComfyUI:
  • 1. Click the Manager button in the main menu.
  • 2. Select the Custom Nodes Manager button.
  • 3. Enter Shrug-Prompter: Unified VLM Integration for ComfyUI in the search bar.
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated list of nodes.


Two-Round VLM Prompter Description

Enhances video prompts by analyzing images with a VLM and crafting cinematic prompts with Qwen2.5.

Two-Round VLM Prompter:

The TwoRoundVLMPrompter is a node designed to generate detailed, contextually rich prompts for video generation models. It operates in two distinct rounds, each serving a unique purpose. In the first round, the node uses a Vision Language Model (VLM) to analyze an image, capturing visual details, colors, and composition elements; this round gathers comprehensive observational data. In the second round, the node uses the Qwen2.5 model to transform the detailed description from the first round into a cinematic prompt tailored for video generation, focusing on movement, atmosphere, and visual style. By employing a specialized model for each task, the TwoRoundVLMPrompter produces output that is both precise and creatively inspiring, making it a valuable tool for AI artists generating high-quality video prompts.
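The two-round flow described above can be sketched in plain Python. This is an illustrative sketch only: the function names, call signatures, and stub backends below are hypothetical stand-ins, not the node's actual implementation or API.

```python
# Hypothetical sketch of the two-round pipeline. `vlm_chat` and `llm_chat`
# stand in for whatever backend the node calls in each round.

def two_round_prompt(vlm_chat, llm_chat, image,
                     round1_system, round1_user,
                     round2_system, round2_user):
    """Round 1: a VLM observes the image.
    Round 2: an LLM rewrites the observation into a cinematic prompt."""
    # Round 1: gather a detailed observation of the image.
    observation = vlm_chat(system=round1_system, user=round1_user, image=image)
    # Round 2: feed the observation to the rewriting model together with
    # the user's cinematic-rewrite instructions.
    final_prompt = llm_chat(system=round2_system,
                            user=f"{round2_user}\n\n{observation}")
    return observation, final_prompt

# Stub backends so the sketch runs without a live model server.
def fake_vlm(system, user, image):
    return f"A detailed description of {image}."

def fake_llm(system, user):
    # Echo the last line (the observation) to show the data flow.
    return f"Cinematic rewrite of: {user.splitlines()[-1]}"

obs, prompt = two_round_prompt(
    fake_vlm, fake_llm, "sunset.png",
    "You are a careful observer.", "Describe the image in detail.",
    "You are an expert prompt engineer.", "Rewrite as a cinematic prompt.")
```

The key design point the sketch captures is that round 2 never sees the image, only round 1's text observation, which is why the quality of the first-round description drives the quality of the final prompt.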

Two-Round VLM Prompter Input Parameters:

round1_context

This parameter specifies the context for the first round of processing, where a Vision Language Model (VLM) analyzes the image. It is crucial for setting the environment in which the model operates, ensuring that the observations are accurate and relevant to the task at hand.

round1_system_prompt

This is a string parameter that provides the system prompt for the first round. It is designed to guide the VLM in its observational task, with a default prompt encouraging detailed and comprehensive descriptions of the image. The prompt is multiline and can be customized to suit specific needs.

round1_user_prompt

Similar to the system prompt, this string parameter allows the user to input a custom prompt for the first round. It defaults to a request for a detailed description of the image, including all visual elements and notable features. This prompt is also multiline, providing flexibility in how the task is framed.

round2_context

This parameter sets the context for the second round, where the Qwen2.5 model rewrites the description into a cinematic prompt. It ensures that the rewriting process is aligned with the intended use case, focusing on video generation.

round2_system_prompt

A string parameter that provides the system prompt for the second round. It defaults to a prompt that positions the model as an expert in prompt engineering for video generation, guiding the transformation of the description into a cinematic format.

round2_user_prompt

This string parameter allows the user to input a custom prompt for the second round. It defaults to a request for rewriting the description as a cinematic prompt, emphasizing movement, atmosphere, and visual style. The prompt is multiline, allowing for detailed instructions.

max_tokens

An integer parameter that defines the maximum number of tokens the model can generate in its output. It ranges from 1 to 32000, with a default value of 512. This parameter controls the length of the generated text, impacting the level of detail and complexity in the output.

temperature

A float parameter that influences the randomness of the model's output. It ranges from 0.0 to 2.0, with a default value of 0.7. A higher temperature results in more creative and diverse outputs, while a lower temperature produces more deterministic results.

top_p

This float parameter, ranging from 0.0 to 1.0 with a default of 0.9, determines the cumulative probability for token selection. It helps in controlling the diversity of the output by limiting the token pool to those with the highest probabilities, ensuring a balance between creativity and coherence.
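The three sampling parameters above might be validated and packaged as shown below. The ranges and defaults come from the descriptions in this section, but the helper function itself is hypothetical, not part of the node's code.

```python
# Illustrative helper: validate the documented ranges and build a
# sampling-parameter dict (defaults match the documentation above).

def sampling_params(max_tokens=512, temperature=0.7, top_p=0.9):
    if not 1 <= max_tokens <= 32000:
        raise ValueError("max_tokens must be between 1 and 32000")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    return {"max_tokens": max_tokens,
            "temperature": temperature,
            "top_p": top_p}
```

For example, `sampling_params(temperature=1.2, top_p=0.95)` would request a more exploratory output, while `sampling_params(temperature=0.2)` keeps the result close to deterministic.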

Two-Round VLM Prompter Output Parameters:

context

This output parameter provides the updated context after both rounds of processing. It includes information about the models used in each round and the lengths of the observation and final prompt, offering insights into the processing workflow.

final_prompt

The final prompt is the result of the second round of processing, where the initial observation is transformed into a cinematic prompt suitable for video generation. It encapsulates the creative and stylistic elements necessary for dynamic video content.

round1_observation

This output contains the detailed description generated in the first round. It serves as the foundational observation that informs the subsequent rewriting process, capturing all relevant visual details of the image.

debug_info

The debug information provides insights into the processing steps, including model details and response lengths. It is particularly useful for troubleshooting and understanding the node's behavior during execution.

Two-Round VLM Prompter Usage Tips:

  • Customize the round1_user_prompt to focus on specific visual elements or themes you want to emphasize in the observation phase.
  • Adjust the temperature and top_p parameters to fine-tune the creativity and coherence of the final prompt, depending on whether you want a more exploratory or focused output.

Two-Round VLM Prompter Common Errors and Solutions:

Model Not Found

  • Explanation: This error occurs when the specified model for either round is not available or incorrectly specified in the context.
  • Solution: Ensure that the model names in round1_context and round2_context are correctly specified and that the models are available in your environment.

Invalid Token Range

  • Explanation: This error arises when the max_tokens parameter is set outside the allowed range.
  • Solution: Verify that the max_tokens value is between 1 and 32000 and adjust it accordingly.
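One simple way to implement this fix is to clamp the value into the allowed range before it reaches the node; the helper name is illustrative, not part of the extension.

```python
# Clamp max_tokens into the documented [1, 32000] range (hypothetical helper).

def clamp_max_tokens(value, lo=1, hi=32000):
    return max(lo, min(hi, value))
```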

Prompt Length Exceeded

  • Explanation: This error occurs when the generated prompt exceeds the maximum token limit.
  • Solution: Reduce the complexity of the prompts or increase the max_tokens parameter to accommodate longer outputs.

Two-Round VLM Prompter Related Nodes

Go back to the extension to check out more related nodes.
Shrug-Prompter: Unified VLM Integration for ComfyUI
