RunComfy

FlashVSR | Real-Time Video Upscaler

Upscale videos fast, smooth, and super clear—no detail lost.

Qwen Edit 2509 MultipleAngles | Multi-View Image Creator

Turn one photo into complete multi-angle visuals instantly.

Sonic | Lip-Sync Portrait Animation

Sonic delivers advanced audio-driven lip-sync for portraits with high-quality animation.

Consistent Character Creator 3.0 | Easy Consistency, Any Angle

Make characters stay the same, every angle, strong and perfect.

ComfyUI > Nodes > VLM_nodes

ComfyUI Extension: VLM_nodes

Repo Name

ComfyUI_VLM_nodes

Author
gokayfem (Account age: 1342 days) Nodes
View all nodes(28) Latest Updated
2025-02-13 Github Stars
0.48K

Github Ask gokayfem Current Questions Past Questions

Table of Content

Description
How VLM_nodes Works
VLM_nodes Features
VLM_nodes Models
Troubleshooting VLM_nodes
Learn More about VLM_nodes
Related Nodes

How to Install VLM_nodes

Install this extension via the ComfyUI Manager by searching for VLM_nodes

1. Click the Manager button in the main menu
2. Select Custom Nodes Manager button
3. Enter VLM_nodes in the search bar

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Visit ComfyUI Online for ready-to-use ComfyUI environment

Free trial available
16GB VRAM to 80GB VRAM GPU machines
400+ preloaded models/nodes
Freedom to upload custom models/nodes
200+ ready-to-run workflows
100% private workspace with up to 200GB storage
Dedicated Support

Run ComfyUI Online

VLM_nodes Description

VLM_nodes offers custom nodes for Vision Language Models (VLM) and Large Language Models (LLM), enabling image captioning, automatic prompt generation, creative and consistent prompt suggestions, and keyword extraction.

VLM_nodes Introduction

ComfyUI_VLM_nodes is an extension designed to enhance the capabilities of AI artists by integrating Vision Language Models (VLMs) into the ComfyUI framework. This extension allows you to load and use various VLMs, enabling advanced functionalities such as structured output generation, image-to-music conversion, and automatic prompt generation. By leveraging models like LLaVa, ChatMusician, and InternLM-XComposer2-VL, ComfyUI_VLM_nodes provides a powerful toolset for creating and manipulating AI-generated content, making it easier for artists to achieve their creative goals.

How VLM_nodes Works

ComfyUI_VLM_nodes operates by integrating VLMs into the ComfyUI environment using the llama-cpp-python library. This integration allows the extension to load and utilize models in GGUF format, which are specifically designed for vision-language tasks. The extension works by downloading the necessary model files and clip projectors, placing them in the appropriate directories, and then using these models to process and generate content based on user inputs. The structured output node, for example, can extract entities, numbers, and classify prompts, while the image-to-music feature uses VLMs and LLMs to create music from images.

VLM_nodes Features

Structured Output

The Structured Output node simplifies the process of obtaining reliable answers from VLMs. It can extract entities, numbers, classify prompts, and generate specific prompts. You can customize the output by adding descriptions to fields and selecting the attributes you want to return.

structured

Image to Music

This feature uses VLMs, LLMs, and AudioLDM-2 to create music from images. The SaveAudioNode allows you to save the generated music in the output folder. The necessary files are automatically downloaded into the models/LLavacheckpoints/files_for_audioldm2 directory.

image to music

LLM to Music

Utilizes Chat Musician, an open-source LLM with intrinsic musical abilities, to generate music from text prompts. You can try prompts from the ChatMusician Demo Page. Recommended GGUF files are ChatMusician.Q5_K_M.gguf or ChatMusician.Q5_K_S.gguf.

LLM to music

InternLM-XComposer2-VL Node

This node integrates the InternLM-XComposer2-VL Model using AutoGPTQ. It automatically downloads the necessary files into the models/LLavacheckpoints/files_for_internlm directory. This model is known for its excellent visual perception capabilities.

InternLM-XComposer2

Automatic Prompt Generation and Suggestion Nodes

Get Keyword node: Extracts keywords from LLava outputs.
LLava PromptGenerator node: Creates prompts based on descriptions or keywords.
Suggester node: Generates multiple prompts based on the original prompt, with options for consistent or random results. Automatic Prompt Generation

VLM_nodes Models

Available Models

LlaVa 1.6 Mistral 7B: Model Link
Nous Hermes 2 Vision: Model Link
LlaVa 1.5 7B: Model Link
LlaVa 1.5 13B: Model Link
BakLLaVa: Model Link Each model has its unique capabilities and is suited for different tasks. For example, LlaVa models are excellent for visual question answering and image captioning, while ChatMusician is tailored for generating music from text prompts.

Troubleshooting VLM_nodes

Common Issues and Solutions

Model Loading Errors: Ensure that all model files and clip projectors are correctly placed in the models/LLavacheckpoints directory.
Python Version: Make sure you are using Python 3.9, as this is a requirement for the extension.
File Not Found: Verify that the necessary files are downloaded and placed in the correct directories.

Frequently Asked Questions

Q: What should I do if the music generation fails?
A: Check if the necessary files for AudioLDM-2 are correctly downloaded into the models/LLavacheckpoints/files_for_audioldm2 directory.
Q: How can I improve the creativity of the generated prompts?
A: Adjust the temperature setting in the prompt generation nodes. Higher temperatures result in more creative outputs.

Learn More about VLM_nodes

For additional resources, tutorials, and community support, you can visit the following links:

Awesome VLM Architectures
Prompting Guide for LLM Settings (https://www.promptingguide.ai/introduction/settings) These resources provide in-depth information on Vision Language Models, their architectures, and how to effectively use them within the ComfyUI framework.

VLM_nodes Related Nodes

AudioLDM-2 Node

ChatMusician

Creative Art PromptGenerator

Internlm Node

JsonToText

Get Keywords

Kosmos-2 Node

LLMLoader

LLM PromptGenerator

LLMSampler

LLava Loader Simple

LLava Optional Memory Free Advanced

LLava Optional Memory Free Simple

LLava PromptGenerator

LLava Sampler Advanced

LLava Sampler Simple

Llava Clip Loader

MC-LLaVA Node

MoonDream Node

Moondream-2 Node

PlayMusic Node

API PromptGenerator

Save Audio Node

SimpleText

Structured Output

Suggester

UformGen2 Qwen Node

ViewText

Table of Content

Description
How VLM_nodes Works
VLM_nodes Features
VLM_nodes Models
Troubleshooting VLM_nodes
Learn More about VLM_nodes
Related Nodes

MimicMotion | Human Motion Video Generation

Generate high-quality human motion videos with MimicMotion, using a reference image and motion sequence.

Reallusion AI Render | 3D to ComfyUI Workflows Collection

ComfyUI + Reallusion = Speed, Accessibility, and Ease for 3D visuals

ComfyUI Grounding | Object Tracking Workflow

Track any subject with pixel-perfect accuracy for stunning VFX results.

Z-Image Finetuned Models Collection | Multi-Style Generator

Create stunning, detailed images across multiple styles and moods easily.

Support

Resources

Legal

RunComfy

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.

Support

Resources

Legal

RunComfy

Save 4 hours! We auto-setup your workflow! Free!

ComfyUI Extension: VLM_nodes

ComfyUI_VLM_nodes

How to Install VLM_nodes

VLM_nodes Description

VLM_nodes Introduction

How VLM_nodes Works

VLM_nodes Features

Structured Output

Image to Music

LLM to Music

InternLM-XComposer2-VL Node

Automatic Prompt Generation and Suggestion Nodes

VLM_nodes Models

Available Models

Troubleshooting VLM_nodes

Common Issues and Solutions

Frequently Asked Questions

Learn More about VLM_nodes

VLM_nodes Related Nodes