Versatile video analysis tool with frame-based and direct video modes, generating descriptive prompts and scene analyses for AI artists.
The VideoPromptNode is a versatile tool within the ComfyUI framework designed to analyze video sequences or files using advanced Qwen2.5-VL multimodal models. This node offers two primary modes of operation: frame-based mode, which processes pre-loaded frames from LoadVideo nodes, and direct video mode, which handles video files directly. Its capabilities include generating descriptive prompts for video content, conducting comprehensive scene analyses, breaking down key scenes, and creating negative prompts based on configured templates. By leveraging these features, the node provides AI artists with insightful and detailed textual descriptions of video content, enhancing their ability to understand and creatively engage with video data.
This parameter allows you to select a predefined profile that includes a system prompt and specific rules for video analysis. The default profile is "HyVideoAnalyzer."
This integer parameter sets the maximum number of new tokens the model can generate during analysis. It ranges from 1 to 2048, with a default value of 512. Adjusting this value can impact the length and detail of the generated prompts.
This integer parameter, frame_sample_count, determines the number of frames sampled from the video sequence for analysis. It ranges from 1 to 32, with a default of 16. Increasing the count can provide a more comprehensive analysis but may require more processing power.
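As a rough illustration of what sampling a fixed number of frames can look like, the sketch below picks evenly spaced frame indices from a sequence. The helper name sample_frame_indices and the even-spacing strategy are assumptions made for illustration; the node's actual sampling strategy is not described here.

```python
def sample_frame_indices(total_frames: int, sample_count: int = 16) -> list[int]:
    """Pick up to `sample_count` evenly spaced frame indices (hypothetical helper)."""
    if total_frames <= 0:
        return []
    # Clamp to the documented range of 1 to 32, and never request more frames than exist.
    count = max(1, min(sample_count, 32, total_frames))
    if count == 1:
        return [0]
    # Spread the indices from the first frame to the last frame, inclusive.
    step = (total_frames - 1) / (count - 1)
    return [round(i * step) for i in range(count)]
```

With a 100-frame clip and a sample count of 4, this returns the indices 0, 33, 66, and 99, so the first and last frames are always included.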
A float parameter, temperature, that controls the randomness of the generated text. It ranges from 0.0 to 2.0, with a default of 0.7. Higher values increase creativity but may reduce coherence, while lower values produce more deterministic outputs.
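The effect of this setting can be seen in a generic sketch of temperature scaling, where token scores (logits) are divided by the temperature before being turned into probabilities. This is a standard illustration of the technique, not the node's own code; the function name and the handling of 0.0 as greedy selection are assumptions.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float = 0.7) -> list[float]:
    """Convert logits to probabilities after temperature scaling (illustrative only)."""
    if temperature <= 0.0:
        # Assumption: treat 0.0 as greedy decoding, with all probability on the top logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

A low temperature concentrates probability on the highest-scoring token, while a high temperature flattens the distribution so that less likely tokens are chosen more often.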
This parameter specifies the type of analysis to perform, with options including "Full sequence," "Key scenes," and "Single summary." The default is "Full sequence," which provides a detailed analysis of the entire video.
This parameter sets the language for the output, with options for "English" and "Chinese." The default is "English," allowing you to receive prompts in your preferred language.
An optional parameter that accepts input frames from a LoadVideo node. These frames are used in frame-based mode for analysis.
An optional parameter that allows you to select a video file from the input directory for direct video mode processing.
This float parameter sets the frames per second for video processing, ranging from 0.1 to 60.0, with a default of 8.0. Higher values sample more frames, potentially improving analysis quality.
An integer parameter that defines the maximum number of pixels for video processing, with a default of 512*512 (262,144 pixels). It ranges from 0 to 1280*720 (921,600 pixels), and setting it to 0 uses the default resolution.
This integer parameter specifies the number of frames to use in fallback mode if initial processing fails. It ranges from 1 to 16, with a default of 4. Lower values use less VRAM, which can be beneficial for resource-constrained environments.
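One way to picture a fallback of this kind is a retry with a smaller set of frames when the first attempt fails. The sketch below is a minimal illustration under stated assumptions: analyze_with_fallback and the analyze callable are hypothetical names, and a RuntimeError stands in for whatever failure (such as running out of VRAM) triggers the fallback in the real node.

```python
from typing import Any, Callable, Sequence

def analyze_with_fallback(
    frames: Sequence[Any],
    analyze: Callable[[Sequence[Any]], str],
    fallback_frames: int = 4,
) -> str:
    """Try the full frame set first, then retry once with fewer frames (hypothetical)."""
    try:
        return analyze(frames)
    except RuntimeError:
        if not frames:
            raise  # nothing to retry with
        # Clamp to the documented range of 1 to 16 and to the frames available.
        count = max(1, min(fallback_frames, 16, len(frames)))
        step = len(frames) / count
        # Take evenly spaced frames so the reduced set still spans the clip.
        subset = [frames[int(i * step)] for i in range(count)]
        return analyze(subset)
```

With 16 input frames and a fallback value of 4, the retry uses frames 0, 4, 8, and 12.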
An optional string parameter, custom_system_prompt, that allows you to provide a custom system prompt to override the selected profile. This can be useful for tailoring the analysis to specific requirements.
An optional string parameter that adds text before the generated prompt, allowing for customization of the output.
An optional string parameter that adds text after the generated prompt, providing additional customization options.
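The two text parameters above can be pictured as simple string assembly around the generated prompt. The sketch below is an assumption about how such wrapping might work: the function name assemble_prompt and the choice of a single space as separator are illustrative, not taken from the node.

```python
def assemble_prompt(generated: str, prefix: str = "", suffix: str = "") -> str:
    """Join optional prefix, generated prompt, and optional suffix (illustrative only)."""
    parts = [p.strip() for p in (prefix, generated, suffix)]
    # Skip empty pieces so that unused options do not leave stray spaces.
    return " ".join(p for p in parts if p)
```

For example, a prefix of "cinematic," and a suffix of "4k" around the generated text "a cat runs" gives "cinematic, a cat runs 4k".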
An integer parameter that sets the random seed for generation. The default of -1 picks a random seed on each run, while a fixed value makes results reproducible.
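The convention of using -1 for a random seed can be sketched as follows. The helper name resolve_seed and the 32-bit seed range are assumptions for illustration; the node's own seed handling may differ.

```python
import random

def resolve_seed(seed: int = -1) -> int:
    """Return the given seed, or draw a fresh one when seed is -1 (hypothetical helper)."""
    if seed == -1:
        # Assumption: seeds are drawn from the unsigned 32-bit range.
        return random.randint(0, 2**32 - 1)
    return seed
```

Passing the same fixed seed to a generator twice yields the same sequence of random numbers, which is what makes a run repeatable.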
This parameter allows you to select a predefined negative prompt to use, which can help refine the analysis by excluding certain elements.
A parameter, model_offload, with options "Yes" or "No" that determines whether to offload the model from the GPU when not in use, saving VRAM. The default is "Yes."
This parameter, precision, sets the model precision, with options "float16," "bfloat16," and "float32." The default is "float16," balancing performance and resource usage.
The primary output of the node is a generated text prompt that captures the essence of the analyzed video content. This text can include descriptive prompts, scene analyses, and key scene breakdowns, providing valuable insights into the video.
Increase frame_sample_count to capture more frames from the video sequence.
Use the temperature parameter to adjust the creativity of the generated text. A higher temperature can lead to more imaginative descriptions, while a lower temperature ensures more consistent outputs.
Use custom_system_prompt to tailor the system prompt to your needs.
Set model_offload to "Yes" to save VRAM when the model is not in use.
Set the precision parameter to a lower setting, such as "float16," to reduce VRAM usage.