Generate descriptive image captions using advanced AI models for enhanced visual projects.
Florence2DescribeImage | Florence2 Describe Image 🐑 is a node designed to generate descriptive captions for images. It leverages the Florence2 vision-language model to analyze and interpret visual content, producing detailed and contextually relevant descriptions. It is particularly useful for AI artists and creators who want to enrich their visual projects with meaningful textual annotations, turning images into rich narratives that are more accessible and engaging. The node's primary function is to run an image through the model and output a descriptive text based on the visual elements and the specified task, which makes it valuable for digital art, content creation, and any field where image understanding and description are important.
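For readers who want a rough idea of what such a captioning step involves, the following is a minimal standalone sketch using the Hugging Face transformers implementation of Florence-2. The model id, task token, and generation settings here are illustrative assumptions, not the node's internal code.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# Illustrative sketch of a Florence2 "describe image" step (not the node's internals).
device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "microsoft/Florence-2-base"  # assumed checkpoint for illustration

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).to(device)

image = Image.open("input.png").convert("RGB")
task_prompt = "<MORE_DETAILED_CAPTION>"  # Florence2 task token for a detailed caption

inputs = processor(text=task_prompt, images=image, return_tensors="pt").to(device)
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)
raw_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
result = processor.post_process_generation(
    raw_text, task=task_prompt, image_size=(image.width, image.height)
)
print(result[task_prompt])  # the generated caption
```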
The model parameter specifies the Florence2 model to be used for generating image descriptions. It is crucial as it determines the underlying AI capabilities and the quality of the output. The model is pre-loaded and includes both the processor and the model itself, ensuring seamless integration and execution.
The image parameter is the input image that you want to describe. This parameter is essential as it provides the visual content that the model will analyze to generate a description. The image should be in a compatible format for processing.
The task parameter defines the specific type of description you want the model to generate. It influences the style and level of detail of the output. The default task is "more_detailed_caption", but you can choose from a list of predefined tasks to suit your needs.
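As an illustration, task names of this kind typically map onto the task tokens Florence-2 expects in its prompt. The mapping below is a hypothetical sketch using tokens from the Florence-2 model card; the node's actual option list may differ.

```python
# Hypothetical mapping from human-readable task names to Florence2 task tokens.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "more_detailed_caption": "<MORE_DETAILED_CAPTION>",
}

def task_to_prompt(task: str) -> str:
    """Translate a task name into the token the model expects (sketch)."""
    return TASK_PROMPTS.get(task, "<MORE_DETAILED_CAPTION>")  # default task
```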
The seed parameter is an integer used to initialize the random number generator, ensuring reproducibility of results. It allows you to obtain consistent outputs across different runs with the same input. The default value is 42, with a minimum of 1 and a maximum of 0xffffffffffffffff.
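To show how a seed makes sampled captions repeatable, a helper along these lines (hypothetical, not the node's code) seeds the relevant random number generators before generation:

```python
import random
import torch

def set_seed(seed: int = 42) -> None:
    """Seed the RNGs so sampled outputs are reproducible across runs (sketch)."""
    random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
```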
The max_new_tokens parameter sets the maximum number of tokens that the model can generate for the description. It controls the length of the output text, with a default value of 1024, a minimum of 1, and a maximum of 4096.
The num_beams parameter determines the number of beams used in beam search, a technique for generating more accurate and diverse outputs. A higher number of beams can improve the quality of the description but may increase computation time. The default is 3, with a minimum of 1 and a maximum of 64.
The do_sample parameter is a boolean that indicates whether to use sampling during text generation. When set to true, it allows for more varied and creative outputs. The default value is true.
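Continuing the captioning sketch shown earlier (where `model` and `inputs` were defined), these three generation controls correspond directly to arguments of the transformers generate() call; the values shown are the node's documented defaults.

```python
# Continues the earlier sketch: `model` and `inputs` are defined there.
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,  # upper bound on the caption length
    num_beams=3,          # beam search width: higher can improve quality but is slower
    do_sample=True,       # sample tokens for more varied, creative captions
)
```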
The keep_model_loaded parameter is a boolean that specifies whether to keep the model loaded in memory after execution. This can be useful for batch processing multiple images without reloading the model each time. The default value is true.
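The idea behind this option can be sketched as a simple cache that skips reloading the model on repeated calls. The names below, including the load_florence2 loader, are hypothetical and only illustrate the pattern.

```python
# Sketch of the keep_model_loaded idea: cache the loaded model between calls.
_MODEL_CACHE = {}

def get_model(model_id: str, keep_model_loaded: bool = True):
    if model_id in _MODEL_CACHE:
        return _MODEL_CACHE[model_id]          # reuse the already-loaded model
    model = load_florence2(model_id)           # hypothetical loader function
    if keep_model_loaded:
        _MODEL_CACHE[model_id] = model         # keep it resident for later calls
    return model
```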
The text output parameter is the generated description of the input image. It provides a detailed and contextually relevant narrative based on the visual content and the specified task. This output is crucial for enhancing the understanding and accessibility of images in various applications.
Use the same seed value when processing similar images to obtain consistent, reproducible results.
Experiment with the different task options to find the most suitable description style for your project.
Adjust the max_new_tokens and num_beams parameters to balance between description length and quality.
If you run into out-of-memory errors during generation, reduce the max_new_tokens or num_beams parameters, or consider using a device with more memory.