Ollama Vision Style Planner:
The OllamaVisionStylePlanner node uses a local Ollama vision model to analyze an image and plan a subsequent image generation task. It is aimed at AI artists who want new images to stay stylistically consistent with a reference image. A vision prompt directs the model to describe the style, composition, lighting, and key elements of the image. The node then combines that analysis with a separate, user-provided generation prompt to build the generation plan. Keeping the two prompts separate lets you change the subject of the new image while preserving the style extracted from the reference.
Ollama Vision Style Planner Input Parameters:
image
This parameter accepts an image input that the Ollama vision model will analyze. The image serves as the basis for style extraction, which influences the subsequent generation process.
vision_prompt
A string input that guides the vision model in analyzing the image. It typically includes instructions to describe the style, composition, lighting, and key elements of the image. This prompt is crucial for extracting the desired stylistic features from the image.
generation_prompt
This string input is used to guide the generation of new images. It allows users to specify the creative direction for the new image, separate from the style extracted from the original image.
ollama_model
Specifies the model to be used for vision analysis. The default value is "llava:7b". This parameter determines the capabilities and performance of the vision analysis.
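As a rough illustration of how the image, vision_prompt, and ollama_model inputs could come together, the sketch below builds a request body for Ollama's /api/generate endpoint, which accepts base64-encoded images for multimodal models such as llava. The helper name build_vision_payload is hypothetical, and the node's actual request code may differ.

```python
import base64


def build_vision_payload(image_bytes, vision_prompt, model="llava:7b"):
    """Build a JSON-serializable body for Ollama's /api/generate endpoint.

    build_vision_payload is an illustrative helper, not part of the node.
    """
    return {
        "model": model,
        "prompt": vision_prompt,
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete response instead of a stream of chunks.
        "stream": False,
    }
```

The resulting dictionary can be serialized with json.dumps and sent as a POST request to the server's /api/generate endpoint.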
registry_path
A string that indicates the path to the model registry file, with a default value of "model_registry.json". This file contains information about available models and their configurations.
task_hint
This parameter provides a hint about the type of task to be performed, with options including "auto", "img2img", and "text2img". It helps the node determine the appropriate processing pathway.
user_negative
A string input for specifying negative prompts, which are used to exclude certain elements or styles from the generated image. This can help refine the output by avoiding unwanted features.
aspect_ratio
Defines the aspect ratio of the generated image. Options include "1:1", "3:2", "2:3", "4:3", "3:4", "16:9", "9:16", "21:9", "9:21", "2:1", "1:2", "5:3", "3:5", "4:5", and "5:4". This parameter affects the composition and framing of the output image.
base_size
An integer that sets the base size for the generated image, with a default of 1024 and a range from 256 to 2048, adjustable in steps of 64. This parameter influences the resolution and detail of the output.
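The page does not say how aspect_ratio and base_size are combined into a final resolution. One plausible scheme, shown here purely as an assumption, keeps the pixel count close to base_size squared and snaps each side to a multiple of 64 (matching the step size of base_size). The function name plan_dimensions and the rounding rule are hypothetical.

```python
import math

# Ratios listed for the aspect_ratio parameter.
SUPPORTED_RATIOS = {
    "1:1", "3:2", "2:3", "4:3", "3:4", "16:9", "9:16", "21:9",
    "9:21", "2:1", "1:2", "5:3", "3:5", "4:5", "5:4",
}


def plan_dimensions(aspect_ratio, base_size=1024, multiple=64):
    """Return (width, height) with roughly base_size**2 pixels.

    Assumed behavior for illustration; the node may compute sizes differently.
    """
    if aspect_ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"Invalid aspect ratio: {aspect_ratio!r}")
    w, h = (int(part) for part in aspect_ratio.split(":"))
    # Scale each side so the area stays close to base_size * base_size.
    width = base_size * math.sqrt(w / h)
    height = base_size * math.sqrt(h / w)

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)
```

Under this assumed scheme, a "1:1" ratio at the default base size gives 1024 by 1024, and "16:9" gives 1344 by 768.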
ollama_host
Specifies the host address of the Ollama server, defaulting to "localhost". The node uses it, together with ollama_port, to reach the server that runs the vision model.
ollama_port
An integer that sets the port of the Ollama server, with a default of 11434 and a range from 1 to 65535. It must match the port the server is listening on for the connection to succeed.
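To check the ollama_host and ollama_port values before running the node, a small standalone script can probe the server. The sketch below assumes plain HTTP and uses Ollama's /api/tags endpoint, which lists locally available models. The helper names ollama_base_url and is_ollama_reachable are hypothetical.

```python
import urllib.error
import urllib.request


def ollama_base_url(host="localhost", port=11434):
    """Combine host and port into a base URL (plain HTTP is assumed)."""
    if not 1 <= port <= 65535:
        raise ValueError(f"Port out of range: {port}")
    return f"http://{host}:{port}"


def is_ollama_reachable(base_url, timeout=3.0):
    """Return True if GET /api/tags answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar problems.
        return False
```

Calling is_ollama_reachable(ollama_base_url()) returns False when nothing is listening on the default address, which usually means the Ollama server is not running or the host and port are wrong.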
max_vram
Indicates the maximum VRAM available for processing, with options of 24, 16, 12, 8, and 6, and a default of 24. This parameter affects the model's performance and the complexity of tasks it can handle.
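The schema of the registry file is not documented here, so the following is only a sketch of how registry_path and max_vram might interact. It assumes the file holds a JSON list of model entries and that each entry carries a min_vram field; both the layout and the field name are hypothetical.

```python
import json
from pathlib import Path


def load_registry(path="model_registry.json"):
    """Read the registry file; a JSON list of entries is assumed."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def models_within_vram(entries, max_vram):
    """Keep entries whose (hypothetical) min_vram fits the budget."""
    # Entries without a min_vram field are treated as always fitting.
    return [entry for entry in entries if entry.get("min_vram", 0) <= max_vram]
```

With a filter like this, choosing a smaller max_vram value narrows the set of candidate models before the plan is built.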
Ollama Vision Style Planner Output Parameters:
plan
The output is a detailed plan that includes the selected model, LoRAs, and parameters for generating the new image. This plan is based on the analysis of the input image and the specified prompts, ensuring that the generated image aligns with the desired style and content.
Ollama Vision Style Planner Usage Tips:
- Use a detailed vision prompt to ensure the model accurately captures the style and key elements of the input image, which will enhance the quality of the generated image.
- Adjust the aspect ratio and base size parameters to match the intended use of the generated image, whether for print, digital display, or other purposes.
- Experiment with different generation prompts to explore various creative directions while maintaining the stylistic coherence provided by the vision analysis.
Ollama Vision Style Planner Common Errors and Solutions:
ConnectionError: Unable to connect to Ollama model
- Explanation: This error occurs when the node cannot establish a connection to the Ollama model server, possibly due to incorrect host or port settings.
- Solution: Verify that the ollama_host and ollama_port parameters are correctly set and that the server is running and accessible.
ValueError: Invalid aspect ratio
- Explanation: This error indicates that an unsupported aspect ratio was provided.
- Solution: Ensure that the aspect_ratio parameter is set to one of the supported options listed in the input parameters section.
MemoryError: Insufficient VRAM
- Explanation: This error occurs when the task exceeds the available VRAM, often due to high-resolution settings or complex models.
- Solution: Reduce the base_size or select a lower max_vram option to fit within the available memory resources.
