
ComfyUI Node: ✨ Auto-LLM-Vision

Class Name

Auto-LLM-Vision

Category
🧩 Auto-Prompt-LLM
Author
xlinx (Account age: 4822 days)
Extension
ComfyUI-decadetw-auto-prompt-llm
Last Updated
2025-02-01
GitHub Stars
0.02K

How to Install ComfyUI-decadetw-auto-prompt-llm

Install this extension via the ComfyUI Manager by searching for ComfyUI-decadetw-auto-prompt-llm:
  • 1. Click the Manager button in the main menu
  • 2. Select the Custom Nodes Manager button
  • 3. Enter ComfyUI-decadetw-auto-prompt-llm in the search bar
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


✨ Auto-LLM-Vision Description

Enhances AI image processing with language models for prompt generation, bridging visual and textual elements in AI art.

✨ Auto-LLM-Vision:

Auto-LLM-Vision is a node that extends AI-driven image processing within the ComfyUI framework. It passes an image to a vision-capable language model, which interprets the visual content and generates a text prompt from it, combining text- and vision-based AI in a single step. This lets you automate the production of detailed, contextually relevant prompts from images, which can significantly streamline the creative process in AI art generation. The goal of Auto-LLM-Vision is to bridge the gap between visual data and language models, giving artists a practical way to combine visual and textual elements in their work.
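
To make the data flow concrete, here is a minimal sketch of the kind of request such a node sends: the image is base64-encoded and posted, together with a system and a user prompt, to an OpenAI-compatible vision endpoint. The endpoint URL, model name, and payload shape are illustrative assumptions; the node's actual client code may differ.

```python
# Hypothetical sketch: image + prompts -> text via an OpenAI-compatible
# vision endpoint (e.g. a local LM Studio or Ollama server). The URL and
# model name are assumptions for illustration, not the node's hard-coded values.
import base64
import requests

def describe_image(path: str, system_prompt: str, user_prompt: str) -> str:
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    payload = {
        "model": "llava",            # assumed local vision model
        "max_tokens": 50,            # mirrors llm_vision_max_token's default
        "temperature": 0.8,          # mirrors llm_vision_tempture's default
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": [
                {"type": "text", "text": user_prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ]},
        ],
    }
    resp = requests.post("http://localhost:1234/v1/chat/completions",
                         json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```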

✨ Auto-LLM-Vision Input Parameters:

image_to_llm_vision

This parameter accepts the image that serves as the basis for prompt generation. The image is passed to the vision model, which extracts the relevant visual features and translates them into a text prompt describing the content.

llm_vision_max_token

This integer parameter defines the maximum number of tokens that the language model can generate for a given image input. It controls the length of the generated text, with a default value of 50, a minimum of 10, and a maximum of 1024. Adjusting this parameter can impact the detail and complexity of the generated prompts.

llm_vision_tempture

A float parameter that influences the randomness of the language model's output. With a default value of 0.8 and a documented range from -2.0 to 2.0, this parameter controls the creativity of the generated text: lower values produce more deterministic output, while higher values introduce more variability and creativity.
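
For intuition about why this works, here is a toy illustration (not the node's internals) of temperature-scaled sampling: token probabilities are typically computed as softmax(logits / T), so a small T sharpens the distribution and a large T flattens it. Note that this convention assumes a positive temperature; treat the negative end of the documented range as backend-specific.

```python
# Toy sketch of temperature sampling: softmax(logits / T).
import math
import random

def sample_with_temperature(logits: list[float], temperature: float) -> int:
    t = max(temperature, 1e-6)                 # guard against T <= 0
    scaled = [l / t for l in logits]
    m = max(scaled)                            # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    probs = [e / sum(exps) for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

# With T=0.1 the top-logit token wins almost every draw;
# with T=1.5 lower-ranked tokens appear noticeably more often.
```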

llm_vision_system_prompt

This string parameter allows you to set a system-level prompt that guides the language model's interpretation of the image. It supports multiline and dynamic prompts, providing a default template that can be customized to fit specific needs or themes.

llm_vision_ur_prompt

Similar to the system prompt, this string parameter is used to define a user-level prompt that further refines the language model's output. It also supports multiline and dynamic prompts, allowing for personalized and context-specific text generation.
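
In chat-style APIs, the two prompt parameters above typically map onto separate message roles: the system prompt sets the model's persona and output format, while the user prompt asks the concrete question about the attached image. The wording below is a hypothetical example, not the node's default template.

```python
# Assumed mapping of the two prompts onto OpenAI-style chat roles.
messages = [
    {"role": "system",
     "content": "You are a prompt engineer. Describe images as concise, "
                "comma-separated visual tags for a text-to-image model."},
    {"role": "user",
     "content": "Describe this image, emphasizing lighting and mood."},
]
```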

llm_vision_result_append_enabled

A boolean parameter that determines whether the generated text should be appended to existing results. With a default setting of True, this option allows for continuous and cumulative text generation, which can be toggled off if a standalone output is preferred.
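
Conceptually, the append option behaves like the sketch below: the LLM's text is joined onto the prompt already flowing through the graph rather than replacing it. The separator is an assumption; the node may join the strings differently.

```python
def combine(existing_prompt: str, llm_text: str, append_enabled: bool = True) -> str:
    """Sketch of llm_vision_result_append_enabled; separator is assumed."""
    if append_enabled and existing_prompt:
        return f"{existing_prompt}, {llm_text}"
    return llm_text

print(combine("masterpiece, best quality", "a cat on a windowsill at dusk"))
# -> masterpiece, best quality, a cat on a windowsill at dusk
```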

✨ Auto-LLM-Vision Output Parameters:

positive

This output contains the positive conditioning derived from the assembled prompt text, including any LLM-generated additions. It guides the sampler toward the intended interpretation of the visual content during image generation.

negative

The negative output provides conditioning values that steer generation away from irrelevant or undesirable content. By balancing positive and negative conditioning, the node supports a more accurate and focused generation process.

out_latent

This output carries a latent representation of the image for further processing. The latent data can be fed into downstream nodes, such as a sampler, for additional AI-driven tasks.

✨ Auto-LLM-Vision Usage Tips:

  • Experiment with the llm_vision_tempture parameter to find the right balance between creativity and accuracy in the generated prompts. Lower values will produce more predictable results, while higher values can introduce creative variations.
  • Utilize the llm_vision_system_prompt and llm_vision_ur_prompt to tailor the language model's output to specific themes or styles. Customizing these prompts can significantly enhance the relevance and quality of the generated text.
  • Consider the llm_vision_max_token setting when working with complex images that require detailed descriptions. Increasing the token limit can provide more comprehensive prompts, but be mindful of the potential for overly verbose outputs.

✨ Auto-LLM-Vision Common Errors and Solutions:

"Invalid image input"

  • Explanation: This error occurs when the provided image input is not in a supported format or is corrupted.
  • Solution: Ensure that the image is in a compatible format (e.g., JPEG, PNG) and is not damaged. Try re-uploading the image or converting it to a different format.
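
A quick way to rule out a corrupted or unrecognized file before wiring it into the node is a Pillow pre-check; this is a generic utility, not part of the node itself:

```python
from PIL import Image

def is_valid_image(path: str) -> bool:
    try:
        with Image.open(path) as im:
            im.verify()  # raises if the file is truncated or not a recognizable image
        return True
    except Exception:
        return False

# To normalize an unusual format, re-encode as PNG before loading it:
# Image.open(path).convert("RGB").save("input_fixed.png")
```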

"Token limit exceeded"

  • Explanation: The generated text exceeds the maximum token limit set by the llm_vision_max_token parameter.
  • Solution: Increase the llm_vision_max_token value to accommodate longer text outputs, or simplify the image input to reduce the complexity of the generated prompts.
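
If you call an OpenAI-compatible backend directly, a response cut short by the token limit reports finish_reason == "length", which is a reliable signal to raise llm_vision_max_token. The response shape below follows the OpenAI chat format; adapt it if your backend differs.

```python
def was_truncated(response_json: dict) -> bool:
    """True if the model stopped because it hit the max_tokens limit."""
    choice = response_json["choices"][0]
    return choice.get("finish_reason") == "length"
```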

"Temperature value out of range"

  • Explanation: The llm_vision_tempture parameter is set outside the allowable range of -2.0 to 2.0.
  • Solution: Adjust the llm_vision_tempture value to fall within the specified range to ensure proper functioning of the language model.
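
A simple guard keeps the value inside the documented range before it ever reaches the node; the bounds below are taken from this page's parameter description:

```python
def clamp_temperature(value: float, lo: float = -2.0, hi: float = 2.0) -> float:
    """Clamp llm_vision_tempture to the range documented for this node."""
    return max(lo, min(hi, value))

assert clamp_temperature(3.5) == 2.0
assert clamp_temperature(-5.0) == -2.0
```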

✨ Auto-LLM-Vision Related Nodes

Go back to the extension to check out more related nodes.
ComfyUI-decadetw-auto-prompt-llm