Enhance AI art creation with integrated text and vision using large language models for detailed prompts and sophisticated outputs.
The Auto-LLM-Text-Vision node is a powerful extension for ComfyUI designed to seamlessly integrate text and vision capabilities using large language models (LLMs). This node allows you to generate detailed and contextually rich prompts by leveraging both textual and visual inputs, enhancing the creative process in AI art generation. By combining the strengths of text and vision models, it provides a more comprehensive understanding and interpretation of input data, enabling more nuanced and sophisticated outputs. This node is particularly beneficial for artists and creators looking to explore the intersection of language and imagery, offering a versatile tool for generating AI-driven art with enhanced detail and depth.
This parameter accepts an image input that the node will process to extract visual features and context. It serves as the visual component of the input, allowing the node to interpret and integrate visual data into the prompt generation process. The image should be provided in a compatible format, and it acts as a crucial element for vision-based analysis.
This integer parameter defines the maximum number of tokens that the vision model can generate. It controls the length of the output generated from the visual input, with a default value of 50, a minimum of 10, and a maximum of 1024. Adjusting this value can impact the detail and verbosity of the generated output.
A float parameter, llm_vision_tempture, that controls the randomness of the vision model's output. With a default value of 0.8, it can be adjusted between -2.0 and 2.0 to tune the creativity and variability of the generated content. Higher values result in more diverse outputs, while lower values produce more deterministic results.
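To illustrate what the temperature value does, here is a minimal sketch of temperature-scaled sampling over token logits. The actual sampling happens inside the LLM backend, not in this node; the function name and logit values below are illustrative only.

```python
import math

def temperature_softmax(logits, temperature):
    """Convert raw logits to sampling probabilities, scaled by temperature.
    Lower temperature sharpens the distribution (more deterministic output);
    higher temperature flattens it (more diverse output)."""
    scaled = [l / max(temperature, 1e-6) for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
low = temperature_softmax(logits, 0.2)   # sharply favors the top token
high = temperature_softmax(logits, 1.5)  # closer to uniform
```

With temperature 0.2 the top token's probability approaches 1, while at 1.5 the distribution spreads out, which is why higher values yield more varied prompts.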
This string parameter, llm_vision_system_prompt, sets a system-level prompt for the vision model. It supports multiline and dynamic prompts, providing a way to guide the model's interpretation of the visual input. The default prompt can be customized to align with specific creative goals or themes.
This string parameter, llm_vision_ur_prompt, sets a user-level prompt for the vision model. Like the system prompt, it supports multiline and dynamic prompts, enabling personalized guidance for the model's output based on user preferences or specific project requirements.
A boolean parameter that determines whether the results from the vision model should be appended to the final output. With a default setting of True, this option allows you to include or exclude vision-based results in the overall output, providing flexibility in how the node's results are utilized.
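The append behavior can be pictured as simple prompt assembly. The sketch below uses a hypothetical helper name (assemble_prompt is not part of the node's API) to show how the flag decides whether the vision result reaches the final prompt.

```python
def assemble_prompt(text_result, vision_result, append_vision=True):
    """Combine the LLM text output with the vision model's description.
    When append_vision is False, the vision result is dropped from the
    final prompt."""
    parts = [text_result]
    if append_vision and vision_result:
        parts.append(vision_result)
    return ", ".join(parts)

combined = assemble_prompt("a misty forest", "soft golden light")
# → "a misty forest, soft golden light"
text_only = assemble_prompt("a misty forest", "soft golden light",
                            append_vision=False)
# → "a misty forest"
```

Excluding the vision result is useful when the image should only steer the LLM's reasoning rather than appear verbatim in the generated prompt.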
This string parameter is used to provide a positive text prompt that guides the language model's output. It supports multiline and dynamic prompts, allowing for detailed and specific input that can shape the generated text in a desired direction.
A string parameter for specifying a negative text prompt, which helps the model understand what to avoid in its output. This can be useful for refining the generated content by excluding certain themes or elements.
A boolean parameter that ensures the user's prompt remains a priority in the output generation process. With a default value of True, it helps maintain the integrity and focus of the user's input throughout the model's processing.
This boolean parameter, llm_recursive_use, defaults to False and allows the model's output to be fed back as input for further processing. It can be toggled to enable iterative refinement of the generated content, which can be useful for complex or evolving projects.
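Recursive use amounts to feeding each output back in as the next input. A sketch with a stand-in model function (a real run would call the configured LLM instead of the lambda below):

```python
def refine(prompt, llm, rounds=3):
    """Iteratively feed the model's output back in as its next input.
    `llm` is any callable mapping a prompt string to a new prompt string."""
    for _ in range(rounds):
        prompt = llm(prompt)
    return prompt

# Stand-in "model" that simply elaborates the prompt each round.
mock_llm = lambda p: p + ", more detail"
result = refine("a castle", mock_llm, rounds=2)
# → "a castle, more detail, more detail"
```

In practice each round costs an extra model call, so the iteration count is a trade-off between refinement depth and latency.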
A string parameter, llm_apiurl, that specifies the API URL for accessing the language model. It is essential for establishing a connection to the model's service and should be set according to the service provider's specifications.
This string parameter, llm_apikey, provides the API key required for authenticating access to the language model. It is a critical security measure and must be kept confidential to prevent unauthorized access.
A string parameter that indicates the name of the language model to be used. The default model is "llama3.1", but it can be changed to other available models depending on the desired capabilities and features.
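Together, the API URL, key, and model name typically form an OpenAI-compatible chat request. The sketch below shows how such a payload might be assembled; the endpoint path, field names, and localhost URL are assumptions based on OpenAI-style APIs (such as Ollama's, which the "llama3.1" default suggests), not a documented part of this node.

```python
import json

def build_chat_request(api_url, api_key, model, system_prompt, user_prompt):
    """Assemble the URL, headers, and JSON body for an OpenAI-compatible
    chat-completions call. No network request is made here."""
    url = api_url.rstrip("/") + "/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # keep the key confidential
    }
    body = json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    })
    return url, headers, body

url, headers, body = build_chat_request(
    "http://localhost:11434/v1", "sk-none", "llama3.1",
    "Describe images vividly.", "A coastal town at dusk.")
```

A connection error at this stage usually means the URL or key is wrong, which is exactly what the troubleshooting notes below suggest checking first.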
The primary output of the Auto-LLM-Text-Vision node is a text-based prompt that integrates both the textual and visual inputs. This output is designed to be rich in detail and context, providing a comprehensive prompt that can be used for further AI art generation or other creative applications. The output reflects the combined interpretation of the input parameters, offering a nuanced and sophisticated result.
Usage tips:
- Adjust llm_vision_tempture to find the right balance between creativity and consistency in your outputs.
- Use llm_vision_system_prompt and llm_vision_ur_prompt to guide the model's interpretation of visual inputs, tailoring the output to specific themes or styles.
- Enable llm_recursive_use for projects that require iterative refinement, allowing the model to build upon its previous outputs for more complex results.

Common errors:
- Check that the llm_apiurl is correct and that your network connection is stable. Ensure that the server is operational and accessible.
- Verify that the llm_apikey is valid and correctly entered. Confirm that the language model server is running and that your API key has the necessary permissions.