ComfyUI Node: HunyuanVideo TextImageEncode (IP2V)

Class Name: HyVideoTextImageEncode
Category: HunyuanVideoWrapper
Author: kijai (Account age: 2,506 days)
Extension: ComfyUI-HunyuanVideoWrapper
Last Updated: 2025-05-12
GitHub Stars: 2.4K

How to Install ComfyUI-HunyuanVideoWrapper

Install this extension via the ComfyUI Manager by searching for ComfyUI-HunyuanVideoWrapper:
  • 1. Click the Manager button in the main menu
  • 2. Select the Custom Nodes Manager button
  • 3. Enter ComfyUI-HunyuanVideoWrapper in the search bar
  • 4. Click Install on the search result
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated list of nodes.

HunyuanVideo TextImageEncode (IP2V) Description

Converts image and text prompts into video conditioning using a Vision-Language Model (VLM), bridging the gap between static images and dynamic video generation for AI artists.

HunyuanVideo TextImageEncode (IP2V):

The HyVideoTextImageEncode node converts image and text prompts into conditioning for video generation using a Vision-Language Model (VLM). This experimental feature, developed by @Dango233, combines text and image inputs to produce video embeddings, making it a powerful asset for AI artists who want to create dynamic visual content from static images and text descriptions. Its primary goal is to bridge the gap between image prompts and video generation, integrating visual and textual data so you can turn a static image and a description into an animated sequence with ease.

HunyuanVideo TextImageEncode (IP2V) Input Parameters:

text_encoders

This parameter specifies the text encoders to be used in the process. It is crucial for interpreting the textual input and converting it into a format that can be integrated with image data to generate video content. The text encoders play a vital role in ensuring that the semantic meaning of the text is accurately captured and represented in the video output.

prompt

The prompt parameter is a string input that allows you to provide a detailed description or narrative that guides the video generation process. This input can be multiline, enabling you to craft complex and nuanced prompts that influence the final video output. The prompt serves as the foundation for the video's thematic and narrative elements.

force_offload

This optional boolean parameter, with a default value of True, determines whether certain models should be offloaded to a different device before encoding. This can help manage computational resources more efficiently, especially when working with large models or limited hardware capabilities.

prompt_template

The prompt_template parameter offers a selection of predefined templates, such as I2V_video, I2V_image, or disabled, with I2V_video as the default. These templates provide a structured framework for the text encoder, ensuring consistency and coherence in the video generation process. The tooltip suggests using these templates to optimize the integration of text and image data.
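The actual template strings live in the wrapper's code and are not reproduced in this page. As a purely hypothetical sketch of what template selection looks like (the option names match the dropdown, but the wording inside each template is illustrative, not the wrapper's real text):

```python
# Hypothetical illustration of prompt templating; the wrapper's real
# template strings differ and are not shown here.
TEMPLATES = {
    "I2V_video": "Describe the video that this image and instruction "
                 "imply: {prompt}",
    "I2V_image": "Describe the image, then follow the instruction: "
                 "{prompt}",
    "disabled": "{prompt}",
}

def apply_template(prompt: str, name: str = "I2V_video") -> str:
    """Wrap the user prompt in the selected template before encoding."""
    return TEMPLATES[name].format(prompt=prompt)

print(apply_template("a cat leaps over a fence", "disabled"))
# a cat leaps over a fence
```

With "disabled", the prompt passes through unchanged; the other options give the text encoder a consistent framing around your prompt.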

clip_l

This parameter lets you supply a ComfyUI CLIP model in place of the default text encoder's clip_l. It is particularly useful when you want a specific CLIP model to handle text processing. When using this option, disable clip_l in the text encoder loader to avoid conflicts.

image

The image parameter is an optional input that accepts an image file to be used as a prompt for video generation. By incorporating an image, you can enhance the visual richness of the video output, providing a concrete visual reference that complements the textual prompt.

hyvid_cfg

This parameter is used to specify the configuration settings for the HunyuanVideo model. It allows you to customize various aspects of the video generation process, tailoring the output to meet your specific creative needs.

image_embed_interleave

The image_embed_interleave parameter, with a default value of 2, controls the degree of interleaving between image and text embeddings. This setting influences how much the image impacts the video generation compared to the text prompt. Adjusting this value can help you achieve the desired balance between visual and textual elements in the final video.
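Conceptually (a minimal sketch, not the wrapper's actual implementation), a larger interleave factor spaces image tokens further apart in the combined embedding sequence, so the text dominates; a smaller factor packs them more densely:

```python
# Toy sketch of interleaving: insert one image token after every
# `interleave` text tokens. Not the wrapper's real code.
def interleave_embeds(text_tokens, image_tokens, interleave=2):
    out, img_iter = [], iter(image_tokens)
    for i, tok in enumerate(text_tokens, start=1):
        out.append(tok)
        if i % interleave == 0:
            nxt = next(img_iter, None)  # stop once image tokens run out
            if nxt is not None:
                out.append(nxt)
    return out

text = ["t1", "t2", "t3", "t4"]
img = ["i1", "i2"]
print(interleave_embeds(text, img, interleave=2))
# ['t1', 't2', 'i1', 't3', 't4', 'i2']
```

With interleave=4 only one image token would land in the same sequence, illustrating why a higher value shifts influence toward the text.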

model_to_offload

This parameter specifies the model to be moved to an offload device before encoding. It is particularly useful for managing computational resources and ensuring efficient processing, especially when working with large models or limited hardware capabilities.
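Conceptually, offloading just moves a model's weights to another device (typically the CPU) to free VRAM before the encoders run. A toy sketch of the idea, using a stand-in class rather than a real PyTorch module:

```python
# Conceptual sketch only: real code would call model.to(offload_device)
# on a torch.nn.Module and then empty the CUDA cache to release VRAM.
class DummyModel:
    def __init__(self, device: str = "cuda:0"):
        self.device = device

    def to(self, device: str):
        self.device = device  # pretend to move the weights
        return self

def offload_before_encode(model, offload_device: str = "cpu"):
    """Move the model off the GPU so the text/image encoders have room."""
    return model.to(offload_device)

m = offload_before_encode(DummyModel())
print(m.device)  # cpu
```

This is what force_offload and model_to_offload control: whether, and which, model gets moved aside before encoding begins.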

HunyuanVideo TextImageEncode (IP2V) Output Parameters:

hyvid_embeds

The hyvid_embeds output parameter represents the encoded video embeddings generated by the node. These embeddings are a crucial component of the video generation process, encapsulating the combined information from both the text and image inputs. The embeddings serve as the foundation for creating the final video output, ensuring that the semantic and visual elements are accurately represented.

HunyuanVideo TextImageEncode (IP2V) Usage Tips:

  • Experiment with different prompt_template options to see how they affect the integration of text and image data in the video output. This can help you find the best template for your specific creative vision.
  • Adjust the image_embed_interleave parameter to fine-tune the balance between the influence of the image and the text prompt on the final video. A higher value will give more weight to the text, while a lower value will emphasize the image.

HunyuanVideo TextImageEncode (IP2V) Common Errors and Solutions:

Model not found for offloading

  • Explanation: This error occurs when the specified model for offloading is not available or incorrectly configured.
  • Solution: Ensure that the model specified in the model_to_offload parameter is correctly installed and accessible. Check the configuration settings to verify that the model path and device settings are correct.

Incompatible text encoder

  • Explanation: This error arises when there is a mismatch between the selected text encoder and the input data or configuration.
  • Solution: Verify that the text_encoders parameter is set to a compatible encoder for your input data. If using a custom clip model, ensure that the clip_l parameter is correctly configured and that the default text encoder is disabled if necessary.
