ComfyUI > Nodes > ComfyUI-decadetw-auto-prompt-llm > ✨ Auto-LLM-Text-Vision

ComfyUI Node: ✨ Auto-LLM-Text-Vision

Class Name

Auto-LLM-Text-Vision

Category
🧩 Auto-Prompt-LLM
Author
xlinx (Account age: 4822days)
Extension
ComfyUI-decadetw-auto-prompt-llm
Latest Updated
2025-02-01
Github Stars
0.02K

How to Install ComfyUI-decadetw-auto-prompt-llm

Install this extension via the ComfyUI Manager by searching for ComfyUI-decadetw-auto-prompt-llm
  • 1. Click the Manager button in the main menu
  • 2. Select Custom Nodes Manager button
  • 3. Enter ComfyUI-decadetw-auto-prompt-llm in the search bar
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Visit ComfyUI Online for ready-to-use ComfyUI environment

  • Free trial available
  • 16GB VRAM to 80GB VRAM GPU machines
  • 400+ preloaded models/nodes
  • Freedom to upload custom models/nodes
  • 200+ ready-to-run workflows
  • 100% private workspace with up to 200GB storage
  • Dedicated Support

Run ComfyUI Online

✨ Auto-LLM-Text-Vision Description

Enhance AI art creation with integrated text and vision using large language models for detailed prompts and sophisticated outputs.

✨ Auto-LLM-Text-Vision:

The Auto-LLM-Text-Vision node is a powerful extension for ComfyUI designed to seamlessly integrate text and vision capabilities using large language models (LLMs). This node allows you to generate detailed and contextually rich prompts by leveraging both textual and visual inputs, enhancing the creative process in AI art generation. By combining the strengths of text and vision models, it provides a more comprehensive understanding and interpretation of input data, enabling more nuanced and sophisticated outputs. This node is particularly beneficial for artists and creators looking to explore the intersection of language and imagery, offering a versatile tool for generating AI-driven art with enhanced detail and depth.

✨ Auto-LLM-Text-Vision Input Parameters:

image_to_llm_vision

This parameter accepts an image input that the node will process to extract visual features and context. It serves as the visual component of the input, allowing the node to interpret and integrate visual data into the prompt generation process. The image should be provided in a compatible format, and it acts as a crucial element for vision-based analysis.

llm_vision_max_token

This integer parameter defines the maximum number of tokens that the vision model can generate. It controls the length of the output generated from the visual input, with a default value of 50, a minimum of 10, and a maximum of 1024. Adjusting this value can impact the detail and verbosity of the generated output.

llm_vision_tempture

A float parameter that influences the randomness of the vision model's output. With a default value of 0.8, it can be adjusted between -2.0 and 2.0 to control the creativity and variability of the generated content. Higher values result in more diverse outputs, while lower values produce more deterministic results.

llm_vision_system_prompt

This string parameter allows you to set a system-level prompt for the vision model. It supports multiline and dynamic prompts, providing a way to guide the model's interpretation of the visual input. The default prompt can be customized to align with specific creative goals or themes.

llm_vision_ur_prompt

Similar to the system prompt, this string parameter is used to set a user-level prompt for the vision model. It also supports multiline and dynamic prompts, enabling personalized guidance for the model's output based on user preferences or specific project requirements.

llm_vision_result_append_enabled

A boolean parameter that determines whether the results from the vision model should be appended to the final output. With a default setting of True, this option allows you to include or exclude vision-based results in the overall output, providing flexibility in how the node's results are utilized.

text_prompt_postive

This string parameter is used to provide a positive text prompt that guides the language model's output. It supports multiline and dynamic prompts, allowing for detailed and specific input that can shape the generated text in a desired direction.

text_prompt_negative

A string parameter for specifying a negative text prompt, which helps the model understand what to avoid in its output. This can be useful for refining the generated content by excluding certain themes or elements.

llm_keep_your_prompt_ahead

A boolean parameter that ensures the user's prompt remains a priority in the output generation process. With a default value of True, it helps maintain the integrity and focus of the user's input throughout the model's processing.

llm_recursive_use

This boolean parameter, defaulting to False, allows for recursive use of the model's output as input for further processing. It can be toggled to enable iterative refinement of the generated content, which can be useful for complex or evolving projects.

llm_apiurl

A string parameter that specifies the API URL for accessing the language model. It is essential for establishing a connection to the model's service and should be set according to the service provider's specifications.

llm_apikey

This string parameter is used to provide the API key required for authenticating access to the language model. It is a critical security measure and must be kept confidential to prevent unauthorized access.

llm_api_model_name

A string parameter that indicates the name of the language model to be used. The default model is "llama3.1", but it can be changed to other available models depending on the desired capabilities and features.

✨ Auto-LLM-Text-Vision Output Parameters:

Generated Output

The primary output of the Auto-LLM-Text-Vision node is a text-based prompt that integrates both the textual and visual inputs. This output is designed to be rich in detail and context, providing a comprehensive prompt that can be used for further AI art generation or other creative applications. The output reflects the combined interpretation of the input parameters, offering a nuanced and sophisticated result.

✨ Auto-LLM-Text-Vision Usage Tips:

  • Experiment with different values for llm_vision_tempture to find the right balance between creativity and consistency in your outputs.
  • Use the llm_vision_system_prompt and llm_vision_ur_prompt to guide the model's interpretation of visual inputs, tailoring the output to specific themes or styles.
  • Enable llm_recursive_use for projects that require iterative refinement, allowing the model to build upon its previous outputs for more complex results.

✨ Auto-LLM-Text-Vision Common Errors and Solutions:

LLM SERVER not found

  • Explanation: This error indicates that the node is unable to connect to the specified language model server, possibly due to incorrect API URL or network issues.
  • Solution: Verify that the llm_apiurl is correct and that your network connection is stable. Ensure that the server is operational and accessible.

Missing LLM-Text

  • Explanation: This error occurs when the node fails to retrieve text output from the language model, potentially due to an invalid API key or server issues.
  • Solution: Check that the llm_apikey is valid and correctly entered. Confirm that the language model server is running and that your API key has the necessary permissions.

✨ Auto-LLM-Text-Vision Related Nodes

Go back to the extension to check out more related nodes.
ComfyUI-decadetw-auto-prompt-llm
RunComfy
Copyright 2025 RunComfy. All Rights Reserved.

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Playground, enabling artists to harness the latest AI tools to create incredible art.