Two-Round VLM Prompter:
The TwoRoundVLMPrompter is a node that generates detailed, contextually rich prompts for video generation models in two distinct rounds. In the first round, a Vision Language Model (VLM) analyzes an image and produces a comprehensive description of its visual details, colors, and composition elements. In the second round, the Qwen2.5 model rewrites that description as a cinematic prompt tailored for video generation, emphasizing movement, atmosphere, and visual style. By using a specialized model for each task, the node produces output that is both precise and creatively inspiring, making it a valuable tool for AI artists generating high-quality video prompts.
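The two-round flow described above can be sketched roughly as follows. This is an illustrative outline only, not the node's actual implementation: run_vlm and run_qwen are hypothetical stand-ins for the model calls the node performs internally, and the prompt strings paraphrase the documented defaults.

```python
def run_vlm(image, system_prompt, user_prompt):
    # Round 1 (stub): the real VLM would return a detailed observation
    # covering every visual detail, color, and composition element.
    return f"A detailed description of {image}."

def run_qwen(system_prompt, user_prompt):
    # Round 2 (stub): Qwen2.5 would rewrite the observation as a
    # cinematic prompt focused on movement, atmosphere, and style.
    return f"Cinematic rewrite: {user_prompt}"

def two_round_prompt(image):
    # Round 1: gather comprehensive observational data from the image.
    observation = run_vlm(
        image,
        system_prompt="Describe the image in exhaustive detail.",
        user_prompt="Describe every visual element and notable feature.",
    )
    # Round 2: transform the observation into a video-generation prompt.
    final_prompt = run_qwen(
        system_prompt="You are an expert prompt engineer for video generation.",
        user_prompt=f"Rewrite as a cinematic video prompt: {observation}",
    )
    return observation, final_prompt
```

Separating the two rounds like this lets each model do what it is best at: the VLM observes, and the language model stylizes.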
Two-Round VLM Prompter Input Parameters:
round1_context
This parameter specifies the context for the first round of processing, where a Vision Language Model (VLM) analyzes the image. It is crucial for setting the environment in which the model operates, ensuring that the observations are accurate and relevant to the task at hand.
round1_system_prompt
This is a string parameter that provides the system prompt for the first round. It is designed to guide the VLM in its observational task, with a default prompt encouraging detailed and comprehensive descriptions of the image. The prompt is multiline and can be customized to suit specific needs.
round1_user_prompt
Similar to the system prompt, this string parameter allows the user to input a custom prompt for the first round. It defaults to a request for a detailed description of the image, including all visual elements and notable features. This prompt is also multiline, providing flexibility in how the task is framed.
round2_context
This parameter sets the context for the second round, where the Qwen2.5 model rewrites the description into a cinematic prompt. It ensures that the rewriting process is aligned with the intended use case, focusing on video generation.
round2_system_prompt
A string parameter that provides the system prompt for the second round. It defaults to a prompt that positions the model as an expert in prompt engineering for video generation, guiding the transformation of the description into a cinematic format.
round2_user_prompt
This string parameter allows the user to input a custom prompt for the second round. It defaults to a request for rewriting the description as a cinematic prompt, emphasizing movement, atmosphere, and visual style. The prompt is multiline, allowing for detailed instructions.
max_tokens
An integer parameter that defines the maximum number of tokens the model can generate in its output. It ranges from 1 to 32000, with a default value of 512. This parameter controls the length of the generated text, impacting the level of detail and complexity in the output.
temperature
A float parameter that influences the randomness of the model's output. It ranges from 0.0 to 2.0, with a default value of 0.7. A higher temperature results in more creative and diverse outputs, while a lower temperature produces more deterministic results.
top_p
This float parameter, ranging from 0.0 to 1.0 with a default of 0.9, determines the cumulative probability for token selection. It helps in controlling the diversity of the output by limiting the token pool to those with the highest probabilities, ensuring a balance between creativity and coherence.
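To make the top_p behavior concrete, here is a minimal sketch of nucleus (top-p) token filtering, the standard technique this parameter controls. The function and token probabilities below are illustrative, not part of the node's API.

```python
def nucleus_filter(probs, top_p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p; sampling is then restricted to this pool."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append(token)
        total += p
        if total >= top_p:
            break  # pool is large enough; discard the long tail
    return kept

# Example: with top_p=0.9, low-probability tail tokens are excluded,
# trading a little diversity for more coherent output.
pool = nucleus_filter({"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}, top_p=0.9)
```

A lower top_p shrinks the pool (more deterministic); a higher top_p admits more of the tail (more diverse).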
Two-Round VLM Prompter Output Parameters:
context
This output parameter provides the updated context after both rounds of processing. It includes information about the models used in each round and the lengths of the observation and final prompt, offering insights into the processing workflow.
final_prompt
The final prompt is the result of the second round of processing, where the initial observation is transformed into a cinematic prompt suitable for video generation. It encapsulates the creative and stylistic elements necessary for dynamic video content.
round1_observation
This output contains the detailed description generated in the first round. It serves as the foundational observation that informs the subsequent rewriting process, capturing all relevant visual details of the image.
debug_info
The debug information provides insights into the processing steps, including model details and response lengths. It is particularly useful for troubleshooting and understanding the node's behavior during execution.
Two-Round VLM Prompter Usage Tips:
- Customize the round1_user_prompt to focus on specific visual elements or themes you want to emphasize in the observation phase.
- Adjust the temperature and top_p parameters to fine-tune the creativity and coherence of the final prompt, depending on whether you want a more exploratory or focused output.
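The tips above might translate into settings like the following. These values are illustrative, not recommendations from the node's authors, and the dict layout is a hypothetical way of grouping the documented parameters.

```python
# Hypothetical parameter settings for a more focused, tightly framed output.
settings = {
    # Steer round 1 toward the visual elements you care about.
    "round1_user_prompt": (
        "Describe the lighting, color palette, and camera framing "
        "of this image in detail."
    ),
    "temperature": 0.4,  # below the 0.7 default: more deterministic
    "top_p": 0.85,       # below the 0.9 default: narrower token pool
    "max_tokens": 512,   # the documented default length cap
}
```

For a more exploratory result, you would instead raise temperature toward 1.0 and leave top_p near its 0.9 default.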
Two-Round VLM Prompter Common Errors and Solutions:
Model Not Found
- Explanation: This error occurs when the specified model for either round is not available or incorrectly specified in the context.
- Solution: Ensure that the model names in round1_context and round2_context are correctly specified and that the models are available in your environment.
Invalid Token Range
- Explanation: This error arises when the max_tokens parameter is set outside the allowed range.
- Solution: Verify that the max_tokens value is between 1 and 32000 and adjust it accordingly.
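A simple guard like the following catches this error before the model runs. The function name is hypothetical; only the 1 to 32000 range and the 512 default come from the parameter documentation above.

```python
def validate_max_tokens(value, low=1, high=32000):
    """Raise ValueError if max_tokens falls outside the documented range."""
    if not isinstance(value, int):
        raise ValueError(f"max_tokens must be an integer, got {type(value).__name__}")
    if not (low <= value <= high):
        raise ValueError(f"max_tokens must be between {low} and {high}, got {value}")
    return value

# The documented default of 512 passes validation unchanged.
checked = validate_max_tokens(512)
```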
Prompt Length Exceeded
- Explanation: This error occurs when the generated prompt exceeds the maximum token limit.
- Solution: Reduce the complexity of the prompts or increase the max_tokens parameter to accommodate longer outputs.
