Enhance AI art creation with integrated text and vision using large language models for detailed prompts and sophisticated outputs.
The Auto-LLM-Text-Vision node is a powerful extension for ComfyUI designed to seamlessly integrate text and vision capabilities using large language models (LLMs). This node allows you to generate detailed and contextually rich prompts by leveraging both textual and visual inputs, enhancing the creative process in AI art generation. By combining the strengths of text and vision models, it provides a more comprehensive understanding and interpretation of input data, enabling more nuanced and sophisticated outputs. This node is particularly beneficial for artists and creators looking to explore the intersection of language and imagery, offering a versatile tool for generating AI-driven art with enhanced detail and depth.
This parameter accepts an image input that the node will process to extract visual features and context. It serves as the visual component of the input, allowing the node to interpret and integrate visual data into the prompt generation process. The image should be provided in a compatible format, and it acts as a crucial element for vision-based analysis.
This integer parameter defines the maximum number of tokens that the vision model can generate. It controls the length of the output generated from the visual input, with a default value of 50, a minimum of 10, and a maximum of 1024. Adjusting this value can impact the detail and verbosity of the generated output.
A float parameter, llm_vision_tempture, that controls the randomness of the vision model's output. With a default value of 0.8, it can be adjusted between -2.0 and 2.0 to tune the creativity and variability of the generated content. Higher values result in more diverse outputs, while lower values produce more deterministic results.
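To illustrate what the temperature value does, here is a minimal sketch of temperature-scaled sampling over token logits. The actual sampling happens inside the LLM backend, not in this node; the function name and logit values below are illustrative only.

```python
import math

def temperature_softmax(logits, temperature):
    """Convert raw logits to sampling probabilities, scaled by temperature.
    Lower temperature sharpens the distribution (more deterministic output);
    higher temperature flattens it (more diverse output)."""
    scaled = [l / max(temperature, 1e-6) for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
low = temperature_softmax(logits, 0.2)   # sharply favors the top token
high = temperature_softmax(logits, 1.5)  # closer to uniform
```

With temperature 0.2 the top token's probability approaches 1, while at 1.5 the distribution spreads out, which is why higher values yield more varied prompts.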
This string parameter, llm_vision_system_prompt, sets a system-level prompt for the vision model. It supports multiline and dynamic prompts, providing a way to guide the model's interpretation of the visual input. The default prompt can be customized to align with specific creative goals or themes.
This string parameter, llm_vision_ur_prompt, sets a user-level prompt for the vision model. Like the system prompt, it supports multiline and dynamic prompts, enabling personalized guidance for the model's output based on user preferences or specific project requirements.
A boolean parameter that determines whether the results from the vision model should be appended to the final output. With a default setting of True, this option allows you to include or exclude vision-based results in the overall output, providing flexibility in how the node's results are utilized.
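The append behavior can be pictured as simple prompt assembly. The sketch below uses a hypothetical helper name (assemble_prompt is not part of the node's API) to show how the flag decides whether the vision result reaches the final prompt.

```python
def assemble_prompt(text_result, vision_result, append_vision=True):
    """Combine the LLM text output with the vision model's description.
    When append_vision is False, the vision result is dropped from the
    final prompt."""
    parts = [text_result]
    if append_vision and vision_result:
        parts.append(vision_result)
    return ", ".join(parts)

combined = assemble_prompt("a misty forest", "soft golden light")
# → "a misty forest, soft golden light"
text_only = assemble_prompt("a misty forest", "soft golden light",
                            append_vision=False)
# → "a misty forest"
```

Excluding the vision result is useful when the image should only steer the LLM's reasoning rather than appear verbatim in the generated prompt.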
This string parameter is used to provide a positive text prompt that guides the language model's output. It supports multiline and dynamic prompts, allowing for detailed and specific input that can shape the generated text in a desired direction.
A string parameter for specifying a negative text prompt, which helps the model understand what to avoid in its output. This can be useful for refining the generated content by excluding certain themes or elements.
A boolean parameter that ensures the user's prompt remains a priority in the output generation process. With a default value of True, it helps maintain the integrity and focus of the user's input throughout the model's processing.
This boolean parameter, llm_recursive_use, defaults to False and allows the model's output to be fed back as input for further processing. It can be toggled to enable iterative refinement of the generated content, which can be useful for complex or evolving projects.
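Recursive use amounts to feeding each output back in as the next input. A sketch with a stand-in model function (a real run would call the configured LLM instead of the lambda below):

```python
def refine(prompt, llm, rounds=3):
    """Iteratively feed the model's output back in as its next input.
    `llm` is any callable mapping a prompt string to a new prompt string."""
    for _ in range(rounds):
        prompt = llm(prompt)
    return prompt

# Stand-in "model" that simply elaborates the prompt each round.
mock_llm = lambda p: p + ", more detail"
result = refine("a castle", mock_llm, rounds=2)
# → "a castle, more detail, more detail"
```

In practice each round costs an extra model call, so the iteration count is a trade-off between refinement depth and latency.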
A string parameter, llm_apiurl, that specifies the API URL for accessing the language model. It is essential for establishing a connection to the model's service and should be set according to the service provider's specifications.
This string parameter, llm_apikey, provides the API key required for authenticating access to the language model. It is a critical security measure and must be kept confidential to prevent unauthorized access.
A string parameter that indicates the name of the language model to be used. The default model is "llama3.1", but it can be changed to other available models depending on the desired capabilities and features.
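Together, the API URL, key, and model name typically form an OpenAI-compatible chat request. The sketch below shows how such a payload might be assembled; the endpoint path, field names, and localhost URL are assumptions based on OpenAI-style APIs (such as Ollama's, which the "llama3.1" default suggests), not a documented part of this node.

```python
import json

def build_chat_request(api_url, api_key, model, system_prompt, user_prompt):
    """Assemble the URL, headers, and JSON body for an OpenAI-compatible
    chat-completions call. No network request is made here."""
    url = api_url.rstrip("/") + "/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # keep the key confidential
    }
    body = json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    })
    return url, headers, body

url, headers, body = build_chat_request(
    "http://localhost:11434/v1", "sk-none", "llama3.1",
    "Describe images vividly.", "A coastal town at dusk.")
```

A connection error at this stage usually means the URL or key is wrong, which is exactly what the troubleshooting notes below suggest checking first.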
The primary output of the Auto-LLM-Text-Vision node is a text-based prompt that integrates both the textual and visual inputs. This output is designed to be rich in detail and context, providing a comprehensive prompt that can be used for further AI art generation or other creative applications. The output reflects the combined interpretation of the input parameters, offering a nuanced and sophisticated result.
Usage tips:
- Adjust llm_vision_tempture to find the right balance between creativity and consistency in your outputs.
- Use llm_vision_system_prompt and llm_vision_ur_prompt to guide the model's interpretation of visual inputs, tailoring the output to specific themes or styles.
- Enable llm_recursive_use for projects that require iterative refinement, allowing the model to build upon its previous outputs for more complex results.

Common errors:
- Check that the llm_apiurl is correct and that your network connection is stable. Ensure that the server is operational and accessible.
- Verify that the llm_apikey is valid and correctly entered. Confirm that the language model server is running and that your API key has the necessary permissions.