RunComfy

ComfyUI Node: VidScribe MiniCPM Beta

Class Name: VidScribeMiniCPMBeta
Category: Trent/VLM
Author: TrentHunter82 (Account age: 0 days)
Extension: TrentNodes
Latest Updated: 2026-03-20
Github Stars: 0.03K

Table of Contents

Description
VidScribe MiniCPM Beta:
VidScribe MiniCPM Beta Input Parameters:
VidScribe MiniCPM Beta Output Parameters:
VidScribe MiniCPM Beta Usage Tips:
VidScribe MiniCPM Beta Common Errors and Solutions:
Related Nodes

How to Install TrentNodes

Install this extension via the ComfyUI Manager by searching for TrentNodes:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter TrentNodes in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

VidScribe MiniCPM Beta Description

VidScribeMiniCPMBeta: GPU-accelerated video/image descriptions using MiniCPM-V 4.5 with smart frame sampling.

VidScribe MiniCPM Beta:

VidScribeMiniCPMBeta is a vision-language node for ComfyUI that generates GPU-accelerated descriptions of videos and images using the MiniCPM-V 4.5 model. The model runs with int4 quantization, which keeps VRAM usage to approximately 6-8GB. Smart frame sampling selects which frames to process, which improves both speed and accuracy, and the node automatically unloads after being idle to free resources. It suits AI artists and developers who need detailed, contextually relevant descriptions of visual content.
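This page does not document how the node chooses frames. As a rough illustration of the general idea only, the sketch below spreads a fixed frame budget evenly across a clip. The function name `sample_frame_indices` and the `max_frames` budget are hypothetical and are not taken from TrentNodes.

```python
def sample_frame_indices(total_frames: int, max_frames: int) -> list[int]:
    """Pick up to max_frames indices spread evenly across a clip.

    Assumption: uniform spacing. The real node may use a different strategy.
    """
    if total_frames <= 0 or max_frames <= 0:
        return []
    if total_frames <= max_frames:
        # Short clip: every frame fits in the budget.
        return list(range(total_frames))
    if max_frames == 1:
        # A single frame: take the middle of the clip.
        return [total_frames // 2]
    # Spacing that keeps both the first and the last frame.
    step = (total_frames - 1) / (max_frames - 1)
    return [round(i * step) for i in range(max_frames)]


indices = sample_frame_indices(total_frames=100, max_frames=5)
print(indices)
```

Keeping the first and last frames means the description can cover the start and end of the clip even with a small budget.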

VidScribe MiniCPM Beta Input Parameters:

images

The images parameter is the collection of video frames or images that you want to describe. The node analyzes these frames to generate descriptive text, so output quality depends directly on the input: provide images that are clear and representative of the content you wish to describe.

prompt

The prompt parameter is a text input that guides inference. It gives the model a starting point or context for the description, so a well-crafted prompt lets you steer the style and focus of the output toward your specific needs or artistic vision.

mode

The mode parameter determines the operational mode of the node, affecting how the descriptions are generated. Different modes may prioritize various aspects of the description, such as detail, creativity, or factual accuracy. Selecting the appropriate mode can significantly impact the quality and relevance of the output.

system_prompt

The system_prompt parameter allows you to specify a predefined system prompt from a set of choices. This can help standardize the output or align it with specific requirements or themes. By selecting a suitable system prompt, you can ensure consistency and coherence in the descriptions generated across different runs.

thinking_mode

The thinking_mode parameter influences the depth and complexity of the descriptions. It can be adjusted to balance concise summaries against more elaborate, detailed descriptions, depending on your needs.

max_tokens

The max_tokens parameter sets a limit on the number of tokens (words or word pieces) in the generated description. This helps control the length of the output, ensuring it remains within desired bounds for readability or specific application requirements.
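A token limit of this kind is usually a cap on the generation loop. The sketch below shows that idea with a stand-in model. `generate`, `fake_next_token`, and the `<eos>` marker are hypothetical names for illustration, not part of the node.

```python
def generate(next_token, max_tokens: int, eos: str = "<eos>") -> list[str]:
    """Collect tokens until the model emits eos or the cap is reached."""
    tokens: list[str] = []
    for _ in range(max_tokens):
        token = next_token(tokens)
        if token == eos:
            break  # the model finished on its own
        tokens.append(token)
    return tokens


def fake_next_token(tokens_so_far: list[str]) -> str:
    """Stand-in for a real model: replays a fixed script."""
    script = ["A", "cat", "sits", "on", "a", "mat", "<eos>"]
    return script[len(tokens_so_far)]


print(generate(fake_next_token, max_tokens=3))   # cut off by the cap
print(generate(fake_next_token, max_tokens=50))  # ends at <eos>
```

A low cap can cut a description off mid-sentence, so it is worth leaving headroom when you ask for detailed output.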

temperature

The temperature parameter controls the randomness of the output. A lower temperature results in more deterministic and focused descriptions, while a higher temperature introduces more variability and creativity. Adjusting this parameter allows you to fine-tune the balance between consistency and diversity in the generated text.
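To make the effect concrete, the sketch below applies temperature to a small set of made-up scores in the standard way: each score is divided by the temperature before the softmax. It illustrates the general technique and makes no claim about how the node passes the value to MiniCPM-V. The function name and example scores are hypothetical.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
print(softmax_with_temperature(scores, 0.2))  # low temperature: top token dominates
print(softmax_with_temperature(scores, 2.0))  # high temperature: flatter distribution
```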

seed

The seed parameter is used to initialize the random number generator, ensuring reproducibility of results. By setting a specific seed, you can obtain consistent outputs across multiple runs, which is useful for testing and iterative development.
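The reproducibility that a seed provides can be seen with Python's standard `random` module. This is a generic sketch: `pick_tokens` and the word list are hypothetical, and the node's own sampling runs inside the model rather than through this module.

```python
import random


def pick_tokens(seed: int, vocabulary: list[str], count: int) -> list[str]:
    """Draw count items using a generator initialized from seed."""
    rng = random.Random(seed)  # private generator; global state is untouched
    return [rng.choice(vocabulary) for _ in range(count)]


words = ["bright", "dim", "fast", "slow", "wide", "narrow"]
first = pick_tokens(42, words, 5)
second = pick_tokens(42, words, 5)
print(first == second)  # the same seed replays the same draws
```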

lock_output

The lock_output parameter is a boolean option that, when enabled, returns the last cached output without re-running inference. This is particularly useful for iterating on downstream nodes without the need to reload the model, saving time and computational resources.
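The behavior described above amounts to a one-entry cache in front of inference. The sketch below is a minimal version under stated assumptions: `CachedDescriber` and `run` are hypothetical names, and falling back to inference when nothing has been cached yet is a guess, not documented behavior.

```python
class CachedDescriber:
    """Wrap an inference function and remember its most recent output."""

    def __init__(self, infer_fn):
        self._infer_fn = infer_fn
        self._last_output = None
        self.calls = 0  # how many times real inference ran

    def run(self, prompt: str, lock_output: bool = False) -> str:
        if lock_output and self._last_output is not None:
            # Locked: reuse the cached text, skip the model entirely.
            return self._last_output
        self.calls += 1
        self._last_output = self._infer_fn(prompt)
        return self._last_output


describer = CachedDescriber(lambda prompt: f"description for: {prompt}")
describer.run("a street at night")
describer.run("a different prompt", lock_output=True)  # returns the first result
print(describer.calls)
```

Note that while the output is locked, changes to the prompt or other inputs have no effect on the result.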

VidScribe MiniCPM Beta Output Parameters:

response

The response output is a string containing the generated description of the input images or video frames. This text provides a detailed and contextually relevant narrative of the visual content, which can be used for various applications such as content creation, analysis, or enhancement.

images

The images output returns the processed images, which may include the sampled frames used for generating the description. This allows you to verify the frames that were analyzed and ensure they align with your expectations or requirements.

vram_cleared

The vram_cleared output is a string indicating the status of VRAM clearance after inference. It is useful for resource management, because it lets you check whether the node has released GPU memory before subsequent tasks run.
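For orientation, the inputs and outputs listed on this page can be laid out in the usual ComfyUI custom-node shape (`INPUT_TYPES`, `RETURN_TYPES`, `RETURN_NAMES`, `FUNCTION`, `CATEGORY`). The sketch below is not the TrentNodes source. The class name, widget types, defaults, and placeholder choice lists are all assumptions; only the parameter names, the output names, and the category come from this page.

```python
class VidScribeSketch:
    """Hypothetical skeleton that mirrors the documented interface only."""

    CATEGORY = "Trent/VLM"
    FUNCTION = "describe"
    RETURN_TYPES = ("STRING", "IMAGE", "STRING")
    RETURN_NAMES = ("response", "images", "vram_cleared")

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "images": ("IMAGE",),
                "prompt": ("STRING", {"multiline": True, "default": ""}),
                # The real choices are not listed on this page; placeholders only.
                "mode": (["placeholder_mode"],),
                "system_prompt": (["placeholder_system_prompt"],),
                # Assumed to be a toggle; the real widget type is not documented.
                "thinking_mode": ("BOOLEAN", {"default": False}),
                "max_tokens": ("INT", {"default": 512, "min": 1}),
                "temperature": ("FLOAT", {"default": 0.7, "min": 0.0, "step": 0.05}),
                "seed": ("INT", {"default": 0, "min": 0}),
                "lock_output": ("BOOLEAN", {"default": False}),
            }
        }

    def describe(self, images, prompt, mode, system_prompt, thinking_mode,
                 max_tokens, temperature, seed, lock_output):
        # Stub body: a real node would run the vision-language model here.
        response = f"(stub) would describe {len(images)} frame(s)"
        return (response, images, "not implemented in this sketch")
```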

VidScribe MiniCPM Beta Usage Tips:

Utilize the prompt parameter to guide the style and focus of the descriptions, ensuring they align with your artistic or analytical goals.
Experiment with the temperature parameter to find the right balance between creativity and consistency in the generated descriptions.
Use the lock_output feature to save time and resources when iterating on downstream nodes, especially during the development and testing phases.

VidScribe MiniCPM Beta Common Errors and Solutions:

"Model not available"

Explanation: This error occurs when the MiniCPM model is not available or fails to load.
Solution: Ensure that the model is correctly downloaded and accessible. Check your internet connection and retry loading the model.

"Insufficient VRAM"

Explanation: This error indicates that there is not enough VRAM available to process the input images.
Solution: Reduce the number of input images or lower the resolution. Alternatively, ensure no other processes are consuming excessive GPU resources.
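To judge how much a reduction helps, the sketch below estimates the size of an image batch and computes a downscaled resolution. It assumes ComfyUI's usual IMAGE layout of float32 values with shape [batch, height, width, channels]. The helper names are hypothetical, and the figure covers only the input tensor, not the model weights or the node's internal buffers.

```python
def image_batch_bytes(frames: int, height: int, width: int,
                      channels: int = 3, bytes_per_value: int = 4) -> int:
    """Size in bytes of a float32 image batch shaped [frames, height, width, channels]."""
    return frames * height * width * channels * bytes_per_value


def fit_resolution(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


full = image_batch_bytes(frames=64, height=1080, width=1920)
new_w, new_h = fit_resolution(1920, 1080, max_side=960)
small = image_batch_bytes(frames=16, height=new_h, width=new_w)
print(f"{full / 1e9:.2f} GB -> {small / 1e9:.2f} GB")
```

Halving each side cuts the per-frame size to a quarter, so lowering resolution and frame count together shrinks the batch quickly.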

"Invalid input parameters"

Explanation: This error arises when one or more input parameters are incorrectly specified.
Solution: Double-check all input parameters for correctness, ensuring they meet the expected types and ranges.

VidScribe MiniCPM Beta Related Nodes

Go back to the extension to check out more related nodes.

TrentNodes
